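Using `markdown_it`

This document can be opened and executed with Jupytext!

markdown-it-py can be used as an API through the markdown_it package.

The raw text is first parsed into syntax "tokens", and these are then converted to other formats using "renderers".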
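As a minimal sketch of this two-stage flow, the parse and render steps can be run separately. It assumes the default commonmark preset and calls the renderer's render method directly with the parser's options and an empty environment dict; md.render normally performs both steps in one call.

```python
from markdown_it import MarkdownIt

md = MarkdownIt()

# Stage 1: parse the raw text into a list of Token objects.
tokens = md.parse("some *text*")

# Stage 2: hand the tokens to the renderer (HTML by default).
html = md.renderer.render(tokens, md.options, {})

print(html)
```

Splitting the stages like this is mainly useful when you want to inspect or modify the tokens before they are rendered.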

Quick-Start

The simplest way to understand how text will be parsed is to use:

from pprint import pprint
from markdown_it import MarkdownIt

md = MarkdownIt()
md.render("some *text*")

'<p>some <em>text</em></p>\n'

for token in md.parse("some *text*"):
    print(token)
    print()

Token(type='paragraph_open', tag='p', nesting=1, attrs={}, map=[0, 1], level=0, children=None, content='', markup='', info='', meta={}, block=True, hidden=False)

Token(type='inline', tag='', nesting=0, attrs={}, map=[0, 1], level=1, children=[Token(type='text', tag='', nesting=0, attrs={}, map=None, level=0, children=None, content='some ', markup='', info='', meta={}, block=False, hidden=False), Token(type='em_open', tag='em', nesting=1, attrs={}, map=None, level=0, children=None, content='', markup='*', info='', meta={}, block=False, hidden=False), Token(type='text', tag='', nesting=0, attrs={}, map=None, level=1, children=None, content='text', markup='', info='', meta={}, block=False, hidden=False), Token(type='em_close', tag='em', nesting=-1, attrs={}, map=None, level=0, children=None, content='', markup='*', info='', meta={}, block=False, hidden=False)], content='some *text*', markup='', info='', meta={}, block=True, hidden=False)

Token(type='paragraph_close', tag='p', nesting=-1, attrs={}, map=None, level=0, children=None, content='', markup='', info='', meta={}, block=True, hidden=False)

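The Parser

The MarkdownIt class is instantiated with parsing configuration options, which dictate the syntax rules and additional options for the parser and renderer. You can define this configuration by directly supplying a dictionary or a preset name:

zero: This configures the minimum components to parse text (i.e. just paragraphs and text).
commonmark (default): This configures the parser to strictly comply with the CommonMark specification.
js-default: This is the default in the JavaScript version. Compared to commonmark, it disables HTML parsing and enables the table and strikethrough components.
gfm-like: This configures the parser to approximately comply with the GitHub Flavored Markdown specification. Compared to commonmark, it enables the table, strikethrough and linkify components. Important: to use this configuration you must have linkify-it-py installed.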
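To make the difference between presets concrete, the sketch below renders the same strikethrough input with two of the presets listed above. The sample string is an assumption chosen for illustration.

```python
from markdown_it import MarkdownIt

source = "~~gone~~"

# commonmark has no strikethrough rule enabled, so the tildes stay literal.
strict = MarkdownIt("commonmark").render(source)

# js-default enables strikethrough (and tables), but turns HTML parsing off.
relaxed = MarkdownIt("js-default").render(source)

print(strict)
print(relaxed)
```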

from markdown_it.presets import zero
zero.make()

{'options': {'maxNesting': 20,
  'html': False,
  'linkify': False,
  'typographer': False,
  'quotes': '“”‘’',
  'xhtmlOut': False,
  'breaks': False,
  'langPrefix': 'language-',
  'highlight': None},
 'components': {'core': {'rules': ['normalize', 'block', 'inline']},
  'block': {'rules': ['paragraph']},
  'inline': {'rules': ['text'], 'rules2': ['balance_pairs', 'text_collapse']}}}

md = MarkdownIt("zero")
md.options

{'maxNesting': 20,
 'html': False,
 'linkify': False,
 'typographer': False,
 'quotes': '“”‘’',
 'xhtmlOut': False,
 'breaks': False,
 'langPrefix': 'language-',
 'highlight': None}

You can also override specific options:

md = MarkdownIt("zero", {"maxNesting": 99})
md.options

{'maxNesting': 99,
 'html': False,
 'linkify': False,
 'typographer': False,
 'quotes': '“”‘’',
 'xhtmlOut': False,
 'breaks': False,
 'langPrefix': 'language-',
 'highlight': None}

pprint(md.get_active_rules())

{'block': ['paragraph'],
 'core': ['normalize', 'block', 'inline'],
 'inline': ['text'],
 'inline2': ['balance_pairs', 'text_collapse']}

You can find all the parsing rules in the source code: parser_core.py, parser_block.py, parser_inline.py.

pprint(md.get_all_rules())

{'block': ['table',
           'code',
           'fence',
           'blockquote',
           'hr',
           'list',
           'reference',
           'html_block',
           'heading',
           'lheading',
           'paragraph'],
 'core': ['normalize',
          'block',
          'inline',
          'linkify',
          'replacements',
          'smartquotes'],
 'inline': ['text',
            'newline',
            'escape',
            'backticks',
            'strikethrough',
            'emphasis',
            'link',
            'image',
            'autolink',
            'html_inline',
            'entity'],
 'inline2': ['balance_pairs', 'strikethrough', 'emphasis', 'text_collapse']}
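The two listings can be combined to see which rules exist but are currently switched off. This is only a sketch built on get_active_rules and get_all_rules; the inactive dictionary is a name introduced here for illustration.

```python
from markdown_it import MarkdownIt

md = MarkdownIt("zero")
active = md.get_active_rules()
available = md.get_all_rules()

# For each rule chain, keep the rules that exist but are not active.
inactive = {
    chain: [rule for rule in rules if rule not in active.get(chain, [])]
    for chain, rules in available.items()
}

print(inactive["block"])
```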

Any of the parsing rules can be enabled or disabled, and these methods are "chainable":

md.render("- __*emphasise this*__")

'<p>- __*emphasise this*__</p>\n'

md.enable(["list", "emphasis"]).render("- __*emphasise this*__")

'<ul>\n<li><strong><em>emphasise this</em></strong></li>\n</ul>\n'
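The same chaining works in the other direction. As a small sketch, assuming the commonmark preset, disabling the emphasis rule leaves the asterisks as literal text:

```python
from markdown_it import MarkdownIt

# enable() and disable() return the parser itself, so calls can be chained.
md = MarkdownIt("commonmark").disable("emphasis")

print(md.render("*plain*"))
print("emphasis" in md.get_active_rules()["inline"])
```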

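You can temporarily modify rules with the reset_rules context manager: changes made inside the with block are undone when it exits.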

with md.reset_rules():
    md.disable("emphasis")
    print(md.render("__*emphasise this*__"))
md.render("__*emphasise this*__")

<p>__*emphasise this*__</p>

'<p><strong><em>emphasise this</em></strong></p>\n'
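One way to convince yourself that the rules are restored is to compare the active rule sets before, inside and after the context manager. The variable names here are illustrative only.

```python
from markdown_it import MarkdownIt

md = MarkdownIt("commonmark")
before = md.get_active_rules()

with md.reset_rules():
    md.disable("emphasis")
    inside = md.get_active_rules()

after = md.get_active_rules()

print("emphasis" in inside["inline"])  # the rule is off inside the block
print(after == before)                 # the original rule set is back
```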

Additionally, renderInline runs the parser with all block syntax rules disabled.

md.renderInline("__*emphasise this*__")

'<strong><em>emphasise this</em></strong>'
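The effect is easiest to see on input that would otherwise trigger a block rule. In this sketch the sample string starts with a heading marker, which only the full render call interprets:

```python
from markdown_it import MarkdownIt

md = MarkdownIt("commonmark")
source = "# not a *heading*"

block_html = md.render(source)         # block rules apply: an <h1> is produced
inline_html = md.renderInline(source)  # block rules skipped: the '#' stays literal

print(block_html)
print(inline_html)
```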

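Typographic components

The smartquotes and replacements components are intended to improve typography:

smartquotes converts basic quote marks to their opening and closing variants:

'single quotes' -> ‘single quotes’
"double quotes" -> “double quotes”

replacements replaces particular text constructs:

(c), (C) → ©
(tm), (TM) → ™
(r), (R) → ®
(p), (P) → §
+- → ±
... → …
?.... → ?..
!.... → !..
???????? → ???
!!!!! → !!!
,,, → ,
-- → &ndash;
--- → &mdash;

Both of these components require the typographer option to be turned on, as well as the components themselves to be enabled: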

md = MarkdownIt("commonmark", {"typographer": True})
md.enable(["replacements", "smartquotes"])
md.render("'single quotes' (c)")

'<p>‘single quotes’ ©</p>\n'
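The sketch below illustrates why both steps are needed: with the rules enabled but the typographer option left off, the text passes through unchanged. The sample string is an assumption chosen to exercise two of the replacements listed above.

```python
from markdown_it import MarkdownIt

source = "(c) 2024 +- 1"

# Rules enabled, but the typographer option is left off: nothing changes.
off = MarkdownIt("commonmark").enable(["replacements", "smartquotes"])

# Rules enabled and the typographer option switched on.
on = MarkdownIt("commonmark", {"typographer": True}).enable(
    ["replacements", "smartquotes"]
)

off_html = off.render(source)
on_html = on.render(source)
print(off_html)
print(on_html)
```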

Linkify

The linkify component requires linkify-it-py to be installed (e.g. via pip install markdown-it-py[linkify]). It allows URI autolinks to be identified without being enclosed in <> brackets:

md = MarkdownIt("commonmark", {"linkify": True})
md.enable(["linkify"])
md.render("github.com")

'<p><a href="http://github.com">github.com</a></p>\n'
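For comparison, the bracketed autolink form works without any extra package. The sketch below shows it, and only attempts the bare-URL case when linkify-it-py appears to be importable (checked here through its linkify_it module name).

```python
from importlib.util import find_spec

from markdown_it import MarkdownIt

# Bracketed autolinks are part of CommonMark and need no extra package.
plain = MarkdownIt("commonmark")
bracketed = plain.render("<http://github.com>")
print(bracketed)

# Bare URLs need the linkify rule, which depends on linkify-it-py.
if find_spec("linkify_it") is not None:
    md = MarkdownIt("commonmark", {"linkify": True}).enable("linkify")
    print(md.render("github.com"))
else:
    print("linkify-it-py is not installed; skipping the linkify example")
```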

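Loading plugins

Plugins load collections of additional syntax rules and render methods into the parser. A number of useful plugins are available in mdit_py_plugins (see the plugin list), or you can create your own (following the markdown-it design principles).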

from markdown_it import MarkdownIt
import mdit_py_plugins
from mdit_py_plugins.front_matter import front_matter_plugin
from mdit_py_plugins.footnote import footnote_plugin

md = (
    MarkdownIt()
    .use(front_matter_plugin)
    .use(footnote_plugin)
    .enable('table')
)
text = ("""
---
a: 1
---

a | b
- | -
1 | 2

A footnote [^1]

[^1]: some details
""")
md.render(text)

'<hr />\n<h2>a: 1</h2>\n<p>a | b</p>\n<ul>\n<li>| -\n1 | 2</li>\n</ul>\n<p>A footnote <sup class="footnote-ref"><a href="#fn1" id="fnref1">[1]</a></sup></p>\n<hr class="footnotes-sep" />\n<section class="footnotes">\n<ol class="footnotes-list">\n<li id="fn1" class="footnote-item"><p>some details <a href="#fnref1" class="footnote-backref">↩︎</a></p>\n</li>\n</ol>\n</section>\n'
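Writing your own plugin can be as small as a function that receives the parser instance. The following is a hypothetical plugin, not one shipped with mdit_py_plugins; heading_class_plugin and its css_class parameter are names invented for this sketch.

```python
from markdown_it import MarkdownIt


def heading_class_plugin(md, css_class="title"):
    """Hypothetical plugin: add a CSS class to every heading."""

    def render_heading_open(self, tokens, idx, options, env):
        tokens[idx].attrSet("class", css_class)
        # Delegate the actual tag rendering to the default token renderer.
        return self.renderToken(tokens, idx, options, env)

    md.add_render_rule("heading_open", render_heading_open)


# Extra arguments given to use() are forwarded to the plugin function.
md = MarkdownIt("commonmark").use(heading_class_plugin, css_class="doc-title")
html = md.render("# Hello")
print(html)
```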

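The Token Stream

Before rendering, the text is parsed into a flat token stream of block-level syntax elements. Nesting is defined by each token's nesting attribute: 1 for an opening token and -1 for a closing token: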

md = MarkdownIt("commonmark")
tokens = md.parse("""
Here's some *text*

1. a list

> a *quote*""")
[(t.type, t.nesting) for t in tokens]

[('paragraph_open', 1),
 ('inline', 0),
 ('paragraph_close', -1),
 ('ordered_list_open', 1),
 ('list_item_open', 1),
 ('paragraph_open', 1),
 ('inline', 0),
 ('paragraph_close', -1),
 ('list_item_close', -1),
 ('ordered_list_close', -1),
 ('blockquote_open', 1),
 ('paragraph_open', 1),
 ('inline', 0),
 ('paragraph_close', -1),
 ('blockquote_close', -1)]

Naturally, every opening should eventually be closed, such that:

sum([t.nesting for t in tokens]) == 0

True
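The same nesting values can be used to recover the depth of each token in the flat stream. This sketch keeps a running counter and prints an indented outline; the input text is a shortened version of the example above.

```python
from markdown_it import MarkdownIt

md = MarkdownIt("commonmark")
tokens = md.parse("1. a list\n\n> a *quote*")

depth = 0
for token in tokens:
    if token.nesting == -1:
        depth -= 1  # a closing token sits at the same depth as its opener
    print("  " * depth + token.type)
    if token.nesting == 1:
        depth += 1  # everything up to the matching close is nested deeper
```

If the stream is balanced, the counter returns to zero after the last token.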

All tokens are instances of the same class, which can also be created outside the parser:

tokens[0]

Token(type='paragraph_open', tag='p', nesting=1, attrs={}, map=[1, 2], level=0, children=None, content='', markup='', info='', meta={}, block=True, hidden=False)

from markdown_it.token import Token
token = Token("paragraph_open", "p", 1, block=True, map=[1, 2])
token == tokens[0]

True
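Because tokens are plain objects, a small stream can be assembled by hand and passed to the renderer. This is a sketch that assumes the default HTML renderer and mirrors the three tokens the parser produces for a one-word paragraph.

```python
from markdown_it import MarkdownIt
from markdown_it.token import Token

md = MarkdownIt("commonmark")

# Hand-built equivalent of parsing the single paragraph "hello".
tokens = [
    Token("paragraph_open", "p", 1, block=True),
    Token(
        "inline", "", 0, block=True, content="hello",
        children=[Token("text", "", 0, content="hello")],
    ),
    Token("paragraph_close", "p", -1, block=True),
]

html = md.renderer.render(tokens, md.options, {})
print(html)
```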

Tokens of type 'inline' contain the inline tokens as children:

tokens[1]

Token(type='inline', tag='', nesting=0, attrs={}, map=[1, 2], level=1, children=[Token(type='text', tag='', nesting=0, attrs={}, map=None, level=0, children=None, content="Here's some ", markup='', info='', meta={}, block=False, hidden=False), Token(type='em_open', tag='em', nesting=1, attrs={}, map=None, level=0, children=None, content='', markup='*', info='', meta={}, block=False, hidden=False), Token(type='text', tag='', nesting=0, attrs={}, map=None, level=1, children=None, content='text', markup='', info='', meta={}, block=False, hidden=False), Token(type='em_close', tag='em', nesting=-1, attrs={}, map=None, level=0, children=None, content='', markup='*', info='', meta={}, block=False, hidden=False)], content="Here's some *text*", markup='', info='', meta={}, block=True, hidden=False)

You can serialize a token (and its children) to a JSON-compatible dictionary using:

print(tokens[1].as_dict())

{'type': 'inline', 'tag': '', 'nesting': 0, 'attrs': None, 'map': [1, 2], 'level': 1, 'children': [{'type': 'text', 'tag': '', 'nesting': 0, 'attrs': None, 'map': None, 'level': 0, 'children': None, 'content': "Here's some ", 'markup': '', 'info': '', 'meta': {}, 'block': False, 'hidden': False}, {'type': 'em_open', 'tag': 'em', 'nesting': 1, 'attrs': None, 'map': None, 'level': 0, 'children': None, 'content': '', 'markup': '*', 'info': '', 'meta': {}, 'block': False, 'hidden': False}, {'type': 'text', 'tag': '', 'nesting': 0, 'attrs': None, 'map': None, 'level': 1, 'children': None, 'content': 'text', 'markup': '', 'info': '', 'meta': {}, 'block': False, 'hidden': False}, {'type': 'em_close', 'tag': 'em', 'nesting': -1, 'attrs': None, 'map': None, 'level': 0, 'children': None, 'content': '', 'markup': '*', 'info': '', 'meta': {}, 'block': False, 'hidden': False}], 'content': "Here's some *text*", 'markup': '', 'info': '', 'meta': {}, 'block': True, 'hidden': False}

This dictionary can also be deserialized:

Token.from_dict(tokens[1].as_dict())

Token(type='inline', tag='', nesting=0, attrs={}, map=[1, 2], level=1, children=[Token(type='text', tag='', nesting=0, attrs={}, map=None, level=0, children=None, content="Here's some ", markup='', info='', meta={}, block=False, hidden=False), Token(type='em_open', tag='em', nesting=1, attrs={}, map=None, level=0, children=None, content='', markup='*', info='', meta={}, block=False, hidden=False), Token(type='text', tag='', nesting=0, attrs={}, map=None, level=1, children=None, content='text', markup='', info='', meta={}, block=False, hidden=False), Token(type='em_close', tag='em', nesting=-1, attrs={}, map=None, level=0, children=None, content='', markup='*', info='', meta={}, block=False, hidden=False)], content="Here's some *text*", markup='', info='', meta={}, block=True, hidden=False)
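Since the dictionary only holds basic types, it should survive a trip through the standard json module. The sketch below assumes a token without custom meta values, which might need their own serializer.

```python
import json

from markdown_it import MarkdownIt
from markdown_it.token import Token

md = MarkdownIt("commonmark")
token = md.parse("Here's some *text*")[1]  # the 'inline' token

# as_dict() yields dicts, lists, strings and numbers, so json can store it.
payload = json.dumps(token.as_dict())

restored = Token.from_dict(json.loads(payload))
child_types = [child.type for child in restored.children]
print(restored.type, child_types)
```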

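Creating a syntax tree

Changed in version 0.7.0: nest_tokens and NestedTokens are deprecated and replaced by SyntaxTreeNode.

In some use cases it may be useful to convert the token stream into a syntax tree, in which each opening/closing token pair is collapsed into a single node that contains its children.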

from markdown_it.tree import SyntaxTreeNode

md = MarkdownIt("commonmark")
tokens = md.parse("""
# Header

Here's some text and an image ![title](image.png)

1. a **list**

> a *quote*
""")

node = SyntaxTreeNode(tokens)
print(node.pretty(indent=2, show_text=True))

<root>
  <heading>
    <inline>
      <text>
        Header
  <paragraph>
    <inline>
      <text>
        Here's some text and an image 
      <image src='image.png' alt=''>
        <text>
          title
  <ordered_list>
    <list_item>
      <paragraph>
        <inline>
          <text>
            a 
          <strong>
            <text>
              list
          <text>
  <blockquote>
    <paragraph>
      <inline>
        <text>
          a 
        <em>
          <text>
            quote
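The tree is another view of the same tokens, so it can be flattened again. This sketch assumes a markdown-it-py release in which SyntaxTreeNode provides to_tokens; very old releases may not have it.

```python
from markdown_it import MarkdownIt
from markdown_it.tree import SyntaxTreeNode

md = MarkdownIt("commonmark")
tokens = md.parse("> a *quote*")

root = SyntaxTreeNode(tokens)
top_level = [child.type for child in root.children]
print(top_level)

# to_tokens() flattens the tree back into the linear token stream.
round_trip = root.to_tokens()
print(round_trip == tokens)
```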

You can then use methods and attributes to traverse the tree:

node.children

[SyntaxTreeNode(heading),
 SyntaxTreeNode(paragraph),
 SyntaxTreeNode(ordered_list),
 SyntaxTreeNode(blockquote)]

print(node[0])
node[0].next_sibling

SyntaxTreeNode(heading)

SyntaxTreeNode(paragraph)
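For a full traversal, a short recursive helper over the children attribute is often enough. collect_text below is a hypothetical helper written for this sketch; it gathers the content of every text node in document order.

```python
from markdown_it import MarkdownIt
from markdown_it.tree import SyntaxTreeNode

md = MarkdownIt("commonmark")
root = SyntaxTreeNode(md.parse("# Title\n\nsome *text*"))


def collect_text(node):
    """Return the text of all 'text' nodes at or below the given node."""
    if node.type == "text":
        return node.content
    return "".join(collect_text(child) for child in node.children)


print(collect_text(root))
```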

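Renderers

After the token stream is generated, it is passed to a renderer. The renderer then iterates over all the tokens, passing each one to a rule with the same name as the token type.

Renderer rules are located in md.renderer.rules and are simple functions with the same signature: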

def function(renderer, tokens, idx, options, env):
  return htmlResult

You can inject render methods into the instantiated renderer class.

md = MarkdownIt("commonmark")

def render_em_open(self, tokens, idx, options, env):
    return '<em class="myclass">'

md.add_render_rule("em_open", render_em_open)
md.render("*a*")

'<p><em class="myclass">a</em></p>\n'

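This differs slightly from the JS version, where the renderer argument comes last. Also, the add_render_rule method is specific to Python; using it rather than adding directly to md.renderer.rules ensures the method is bound to the renderer.

You can also subclass a renderer and add the method there: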

from markdown_it.renderer import RendererHTML

class MyRenderer(RendererHTML):
    def em_open(self, tokens, idx, options, env):
        return '<em class="myclass">'

md = MarkdownIt("commonmark", renderer_cls=MyRenderer)
md.render("*a*")

'<p><em class="myclass">a</em></p>\n'

Plugins can support multiple render types by using the __output__ attribute (this is currently a Python-only feature).

from markdown_it.renderer import RendererHTML

class MyRenderer1(RendererHTML):
    __output__ = "html1"

class MyRenderer2(RendererHTML):
    __output__ = "html2"

def plugin(md):
    def render_em_open1(self, tokens, idx, options, env):
        return '<em class="myclass1">'
    def render_em_open2(self, tokens, idx, options, env):
        return '<em class="myclass2">'
    md.add_render_rule("em_open", render_em_open1, fmt="html1")
    md.add_render_rule("em_open", render_em_open2, fmt="html2")

md = MarkdownIt("commonmark", renderer_cls=MyRenderer1).use(plugin)
print(md.render("*a*"))

md = MarkdownIt("commonmark", renderer_cls=MyRenderer2).use(plugin)
print(md.render("*a*"))

<p><em class="myclass1">a</em></p>

<p><em class="myclass2">a</em></p>

Here is a more concrete example: let's replace images whose source is a Vimeo link with the player's iframe:

import re
from markdown_it import MarkdownIt

vimeoRE = re.compile(r'^https?://(www\.)?vimeo\.com/(\d+)($|/)')

def render_vimeo(self, tokens, idx, options, env):
    token = tokens[idx]

    # Match once and reuse the result; group 2 is the numeric video id.
    match = vimeoRE.match(token.attrs["src"])
    if match:
        ident = match[2]

        return ('<div class="embed-responsive embed-responsive-16by9">\n' +
               '  <iframe class="embed-responsive-item" src="//player.vimeo.com/video/' +
                ident + '"></iframe>\n' +
               '</div>\n')
    # Not a Vimeo URL: fall back to the default image rule.
    return self.image(tokens, idx, options, env)

md = MarkdownIt("commonmark")
md.add_render_rule("image", render_vimeo)
print(md.render("![](https://www.vimeo.com/123)"))

<p><div class="embed-responsive embed-responsive-16by9">
  <iframe class="embed-responsive-item" src="//player.vimeo.com/video/123"></iframe>
</div>
</p>
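The URL pattern can be checked on its own before wiring it into a render rule. The sketch below uses the pattern from the example above, written with the dot in vimeo.com escaped, and runs it against a few made-up URLs.

```python
import re

# Pattern from the example above, with the dot in "vimeo.com" escaped.
vimeo_re = re.compile(r"^https?://(www\.)?vimeo\.com/(\d+)($|/)")

results = {}
for url in [
    "https://www.vimeo.com/123",
    "http://vimeo.com/456/",
    "https://example.com/123",
    "https://vimeo.com/about",
]:
    match = vimeo_re.match(url)
    # Group 2 holds the numeric video id when the URL matches.
    results[url] = match[2] if match else None
    print(url, "->", results[url])
```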

Here is another example, showing how to add target="_blank" to all links:

from markdown_it import MarkdownIt

def render_blank_link(self, tokens, idx, options, env):
    tokens[idx].attrSet("target", "_blank")

    # pass token to default renderer.
    return self.renderToken(tokens, idx, options, env)

md = MarkdownIt("commonmark")
md.add_render_rule("link_open", render_blank_link)
print(md.render("[a]\n\n[a]: b"))

<p><a href="b" target="_blank">a</a></p>
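The attribute helpers used in the rule above can be tried on a standalone token. This is a minimal sketch with a hand-made link_open token rather than one produced by the parser.

```python
from markdown_it.token import Token

token = Token("link_open", "a", 1)
token.attrSet("href", "b")
token.attrSet("target", "_blank")

target = token.attrGet("target")
missing = token.attrGet("rel")  # an attribute that was never set

print(target)
print(missing)
```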
