Extending the build process
The objective of this tutorial is to create a more comprehensive extension than the one created in Extending syntax with roles and directives. Whereas that guide only covered writing a custom role and directive, this guide covers a more complex extension to the Sphinx build process: adding multiple directives, along with custom nodes, additional config values and custom event handlers.
To this end, we will cover a todo extension that adds the ability to include todo entries in the documentation and to collect them in a central place. This is similar to the sphinx.ext.todo extension distributed with Sphinx.
Overview
Note
To understand the design of this extension, refer to Important objects and Build phases.
We want the extension to add the following to Sphinx:
- A todo directive, containing some content that is marked with "TODO" and only shown in the output if a new config value is set. Todo entries should not be in the output by default.
- A todolist directive that creates a list of all todo entries throughout the documentation.
For that, we will need to add the following elements to Sphinx:
- New directives, called todo and todolist.
- New document tree nodes to represent these directives, conventionally also called todo and todolist. We wouldn't need new nodes if the new directives only produced some content representable by existing nodes.
- A new config value todo_include_todos (config value names should start with the extension name, in order to stay unique) that controls whether todo entries make it into the output.
- New event handlers: one for the doctree-resolved event, to replace the todo and todolist nodes; one for env-merge-info, to merge intermediate results from parallel builds; and one for env-purge-doc (the reason for that will be covered later).
Prerequisites
As with Extending syntax with roles and directives, we will not be distributing this plugin via PyPI, so once again we need a Sphinx project to call this from. You can use an existing project or create a new one using sphinx-quickstart.
We assume you are using separate source (source) and build (build) folders. Your extension file could be in any folder of your project. In our case, let's do the following:
- Create an _ext folder in source.
- Create a new Python file in the _ext folder called todo.py.
Here is an example of the folder structure you might obtain:
└── source
├── _ext
│ └── todo.py
├── _static
├── conf.py
├── somefolder
├── index.rst
├── somefile.rst
└── someotherfile.rst
Writing the extension
Open todo.py and paste the following code in it, all of which we will explain in detail shortly:
from docutils import nodes
from docutils.parsers.rst import Directive

from sphinx.application import Sphinx
from sphinx.locale import _
from sphinx.util.docutils import SphinxDirective
from sphinx.util.typing import ExtensionMetadata


class todo(nodes.Admonition, nodes.Element):
    pass


class todolist(nodes.General, nodes.Element):
    pass


def visit_todo_node(self, node):
    self.visit_admonition(node)


def depart_todo_node(self, node):
    self.depart_admonition(node)


class TodolistDirective(Directive):
    def run(self):
        return [todolist('')]


class TodoDirective(SphinxDirective):
    # this enables content in the directive
    has_content = True

    def run(self):
        targetid = 'todo-%d' % self.env.new_serialno('todo')
        targetnode = nodes.target('', '', ids=[targetid])

        todo_node = todo('\n'.join(self.content))
        todo_node += nodes.title(_('Todo'), _('Todo'))
        todo_node += self.parse_content_to_nodes()

        if not hasattr(self.env, 'todo_all_todos'):
            self.env.todo_all_todos = []

        self.env.todo_all_todos.append({
            'docname': self.env.docname,
            'lineno': self.lineno,
            'todo': todo_node.deepcopy(),
            'target': targetnode,
        })

        return [targetnode, todo_node]


def purge_todos(app, env, docname):
    if not hasattr(env, 'todo_all_todos'):
        return

    env.todo_all_todos = [
        todo for todo in env.todo_all_todos if todo['docname'] != docname
    ]


def merge_todos(app, env, docnames, other):
    if not hasattr(env, 'todo_all_todos'):
        env.todo_all_todos = []
    if hasattr(other, 'todo_all_todos'):
        env.todo_all_todos.extend(other.todo_all_todos)


def process_todo_nodes(app, doctree, fromdocname):
    if not app.config.todo_include_todos:
        # Collect matches into a list first: removing nodes while iterating
        # over the findall() generator could skip some of them.
        for node in list(doctree.findall(todo)):
            node.parent.remove(node)

    # Replace all todolist nodes with a list of the collected todos.
    # Augment each todo with a backlink to the original location.
    env = app.builder.env

    if not hasattr(env, 'todo_all_todos'):
        env.todo_all_todos = []

    for node in list(doctree.findall(todolist)):
        if not app.config.todo_include_todos:
            node.replace_self([])
            continue

        content = []

        for todo_info in env.todo_all_todos:
            para = nodes.paragraph()
            filename = env.doc2path(todo_info['docname'], base=None)
            description = _(
                '(The original entry is located in %s, line %d and can be found '
            ) % (filename, todo_info['lineno'])
            para += nodes.Text(description)

            # Create a reference
            newnode = nodes.reference('', '')
            innernode = nodes.emphasis(_('here'), _('here'))
            newnode['refdocname'] = todo_info['docname']
            newnode['refuri'] = app.builder.get_relative_uri(
                fromdocname, todo_info['docname']
            )
            newnode['refuri'] += '#' + todo_info['target']['refid']
            newnode.append(innernode)
            para += newnode
            para += nodes.Text('.)')

            # Insert into the todolist
            content.extend((
                todo_info['todo'],
                para,
            ))

        node.replace_self(content)


def setup(app: Sphinx) -> ExtensionMetadata:
    app.add_config_value('todo_include_todos', False, 'html')

    app.add_node(todolist)
    app.add_node(
        todo,
        html=(visit_todo_node, depart_todo_node),
        latex=(visit_todo_node, depart_todo_node),
        text=(visit_todo_node, depart_todo_node),
    )

    app.add_directive('todo', TodoDirective)
    app.add_directive('todolist', TodolistDirective)
    app.connect('doctree-resolved', process_todo_nodes)
    app.connect('env-purge-doc', purge_todos)
    app.connect('env-merge-info', merge_todos)

    return {
        'version': '0.1',
        'env_version': 1,
        'parallel_read_safe': True,
        'parallel_write_safe': True,
    }
This is a far more extensive extension than the one detailed in Extending syntax with roles and directives; however, we will look at each piece step by step to explain what's happening.
The node classes
Let's start with the node classes:
class todo(nodes.Admonition, nodes.Element):
    pass


class todolist(nodes.General, nodes.Element):
    pass
Node classes usually don't have to do anything except inherit from the standard docutils classes defined in docutils.nodes. todo inherits from Admonition because it should be handled like a note or warning; todolist is just a "general" node.
Note
Many extensions will not have to create their own node classes and work fine with the nodes already provided by docutils and Sphinx.
Attention
It is important to know that while you can extend Sphinx without leaving your conf.py, if you declare an inherited node right there, you'll hit a non-obvious PickleError. So if something goes wrong, please make sure that you put inherited nodes into a separate Python module.
For more details, see:
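Sphinx caches doctrees and the build environment with Python's pickle module, and pickle stores a class only as a reference to its module and name, which it must be able to import again later. A class that lives in a namespace pickle cannot re-import therefore cannot be pickled, which is one plausible explanation for the error described above. The standard-library sketch below simulates both cases; make_node_class and the module names are made up for illustration and are not Sphinx APIs.

```python
import pickle
import sys
import types


def make_node_class(module_name: str, importable: bool) -> type:
    """Hypothetical helper: create a class named TodoNode in a fresh module.

    importable=True mimics a class defined in a normal extension module;
    importable=False mimics a namespace that pickle cannot import by name.
    """
    module = types.ModuleType(module_name)
    exec("class TodoNode:\n    pass\n", module.__dict__)
    if importable:
        # Registering the module is what a regular import would have done.
        sys.modules[module_name] = module
    return module.TodoNode


def survives_pickle(obj: object) -> bool:
    """Return True if obj can be pickled and unpickled again."""
    try:
        pickle.loads(pickle.dumps(obj))
    except (pickle.PicklingError, AttributeError):
        return False
    return True


ExtensionNode = make_node_class("todo_nodes_demo", importable=True)
ConfNode = make_node_class("conf_namespace_demo", importable=False)

print(survives_pickle(ExtensionNode()))  # True
print(survives_pickle(ConfNode()))       # False
```

Putting node classes in their own module gives them a stable import path, which is exactly what pickle needs.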
The directive classes
A directive class is a class deriving usually from docutils.parsers.rst.Directive. The directive interface is also covered in detail in the docutils documentation; the important thing is that the class should have attributes that configure the allowed markup, and a run method that returns a list of nodes.
Looking first at the TodolistDirective directive:
class TodolistDirective(Directive):
    def run(self):
        return [todolist('')]
It's very simple, creating and returning an instance of our todolist node class. The TodolistDirective directive itself has neither content nor arguments that need to be handled. That brings us to the TodoDirective directive:
class TodoDirective(SphinxDirective):
    # this enables content in the directive
    has_content = True

    def run(self):
        targetid = 'todo-%d' % self.env.new_serialno('todo')
        targetnode = nodes.target('', '', ids=[targetid])

        todo_node = todo('\n'.join(self.content))
        todo_node += nodes.title(_('Todo'), _('Todo'))
        todo_node += self.parse_content_to_nodes()

        if not hasattr(self.env, 'todo_all_todos'):
            self.env.todo_all_todos = []

        self.env.todo_all_todos.append({
            'docname': self.env.docname,
            'lineno': self.lineno,
            'todo': todo_node.deepcopy(),
            'target': targetnode,
        })

        return [targetnode, todo_node]
Several important things are covered here. First, as you can see, we're now subclassing the SphinxDirective helper class instead of the usual Directive class. This gives us access to the build environment instance using the self.env property. Without this, we'd have to use the rather convoluted self.state.document.settings.env. Then, to act as a link target (from TodolistDirective), the TodoDirective directive needs to return a target node in addition to the todo node. The target ID (in HTML, this will be the anchor name) is generated by using env.new_serialno, which returns a new integer on each call (unique within the current document) and therefore leads to unique target names. The target node is instantiated without any text (the first two arguments).
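The idea behind env.new_serialno can be pictured with a small stand-in. The SerialNumbers class below is hypothetical: it only mimics the behaviour described above (one counter per category, starting at zero in this sketch) and is not Sphinx's implementation, where the numbering is scoped to the document currently being read.

```python
from collections import defaultdict


class SerialNumbers:
    """Hypothetical stand-in for per-document serial numbers."""

    def __init__(self) -> None:
        # One independent counter per category, e.g. 'todo' or 'index'.
        self._counters = defaultdict(int)

    def new_serialno(self, category: str = "") -> int:
        """Return the next number for this category and advance the counter."""
        current = self._counters[category]
        self._counters[category] = current + 1
        return current


serials = SerialNumbers()
target_ids = ["todo-%d" % serials.new_serialno("todo") for _ in range(3)]
print(target_ids)                                  # ['todo-0', 'todo-1', 'todo-2']
print("index-%d" % serials.new_serialno("index"))  # index-0
```

Keeping a separate counter per category means that todo anchors and, say, index anchors never compete for the same numbers.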
On creating the admonition node, the content body of the directive is parsed using self.parse_content_to_nodes(), a SphinxDirective helper that parses the directive's content and returns the resulting nodes; these are appended to the todo node. Following this, the todo node is added to the environment. This is needed to be able to create a list of all todo entries throughout the documentation, in the place where the author puts a todolist directive. For this case, the environment attribute todo_all_todos is used (again, the name should be unique, so it is prefixed by the extension name). It does not exist when a new environment is created, so the directive must check and create it if necessary. Various information about the todo entry's location is stored along with a copy of the node.
In the last line, the nodes that should be put into the doctree are returned: the target node and the admonition node.
The node structure that the directive returns looks like this:
+--------------------+
| target node        |
+--------------------+
+--------------------+
| todo node          |
+--------------------+
  \__+--------------------+
     | admonition title   |
     +--------------------+
     | paragraph          |
     +--------------------+
     | ...                |
     +--------------------+
The event handlers
Event handlers are one of Sphinx's most powerful features, providing a way to hook into any part of the documentation process. There are many events provided by Sphinx itself, as detailed in the API guide, and we're going to use a subset of them here.
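To make the connect-and-emit model concrete, here is a deliberately tiny event registry. EventBus, connect and emit are hypothetical names for this sketch; Sphinx's real event manager does more (for example, it passes the application object as the first argument to every handler), so treat this only as a picture of the pattern.

```python
from collections import defaultdict
from typing import Any, Callable


class EventBus:
    """Hypothetical, minimal event registry illustrating connect/emit."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def connect(self, event: str, handler: Callable[..., Any]) -> None:
        """Register handler to be called whenever event is emitted."""
        self._handlers[event].append(handler)

    def emit(self, event: str, *args: Any) -> list:
        """Call every handler for event in registration order; collect results."""
        return [handler(*args) for handler in self._handlers[event]]


log = []
bus = EventBus()
bus.connect("env-purge-doc", lambda docname: log.append(f"purge {docname}"))
bus.connect("doctree-resolved", lambda docname: log.append(f"resolve {docname}"))

bus.emit("env-purge-doc", "somefile")
bus.emit("doctree-resolved", "index")
bus.emit("unknown-event", "index")  # no handlers registered, so nothing happens
print(log)  # ['purge somefile', 'resolve index']
```

The extension never calls its handlers directly; it only registers them, and the build process decides when each event fires.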
Let's look at the event handlers used in the above example. First, the one for the env-purge-doc event:
def purge_todos(app, env, docname):
    if not hasattr(env, 'todo_all_todos'):
        return

    env.todo_all_todos = [
        todo for todo in env.todo_all_todos if todo['docname'] != docname
    ]
Since we store information from source files in the environment, which is persistent, it may become out of date when the source file changes. Therefore, before each source file is read, the environment's records of it are cleared, and the env-purge-doc event gives extensions a chance to do the same.
Here we clear out all todos whose docname matches the given one from the todo_all_todos list. If there are todos left in the document, they will be added again during parsing.
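The purge-then-reread cycle can be simulated without Sphinx. The sketch below uses types.SimpleNamespace as a stand-in for the build environment and plain strings in place of todo nodes; purge_todos mirrors the handler above, while add_todo is a hypothetical helper playing the role of TodoDirective.run.

```python
from types import SimpleNamespace


def add_todo(env, docname: str, text: str) -> None:
    """Hypothetical stand-in for TodoDirective.run: record one todo."""
    if not hasattr(env, "todo_all_todos"):
        env.todo_all_todos = []
    env.todo_all_todos.append({"docname": docname, "todo": text})


def purge_todos(app, env, docname: str) -> None:
    """Same logic as the env-purge-doc handler above."""
    if not hasattr(env, "todo_all_todos"):
        return
    env.todo_all_todos = [
        todo for todo in env.todo_all_todos if todo["docname"] != docname
    ]


env = SimpleNamespace()  # stand-in for the persistent build environment
add_todo(env, "somefile", "Fix this")
add_todo(env, "someotherfile", "Fix that")

# somefile.rst was edited: its records are purged, then it is read again.
purge_todos(None, env, "somefile")
add_todo(env, "somefile", "Fix this properly")

print([t["todo"] for t in env.todo_all_todos])
# ['Fix that', 'Fix this properly']
```

Without the purge step, the stale "Fix this" entry would survive alongside the new one.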
The next handler, for the env-merge-info event, is used during parallel builds. During a parallel build each worker process has its own env, so there are multiple todo_all_todos lists that need to be merged:
def merge_todos(app, env, docnames, other):
    if not hasattr(env, 'todo_all_todos'):
        env.todo_all_todos = []
    if hasattr(other, 'todo_all_todos'):
        env.todo_all_todos.extend(other.todo_all_todos)
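A small simulation of the merge step, assuming SimpleNamespace objects can stand in for the main environment and for the environments handed back by parallel workers. merge_todos is copied from the handler above; the worker data is hypothetical.

```python
from types import SimpleNamespace


def merge_todos(app, env, docnames, other) -> None:
    """Same logic as the env-merge-info handler above."""
    if not hasattr(env, "todo_all_todos"):
        env.todo_all_todos = []
    if hasattr(other, "todo_all_todos"):
        env.todo_all_todos.extend(other.todo_all_todos)


main_env = SimpleNamespace()  # the main process has not recorded any todo yet

# Environments as two parallel workers might return them (hypothetical data).
worker_a = SimpleNamespace(todo_all_todos=[{"docname": "somefile", "todo": "Fix this"}])
worker_b = SimpleNamespace()  # this worker read documents without any todo

merge_todos(None, main_env, ["somefile"], worker_a)
merge_todos(None, main_env, ["index"], worker_b)

print(main_env.todo_all_todos)
# [{'docname': 'somefile', 'todo': 'Fix this'}]
```

Both hasattr checks matter: either side of the merge may never have seen a todo directive.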
The other handler belongs to the doctree-resolved event:
def process_todo_nodes(app, doctree, fromdocname):
    if not app.config.todo_include_todos:
        # Collect matches into a list first: removing nodes while iterating
        # over the findall() generator could skip some of them.
        for node in list(doctree.findall(todo)):
            node.parent.remove(node)

    # Replace all todolist nodes with a list of the collected todos.
    # Augment each todo with a backlink to the original location.
    env = app.builder.env

    if not hasattr(env, 'todo_all_todos'):
        env.todo_all_todos = []

    for node in list(doctree.findall(todolist)):
        if not app.config.todo_include_todos:
            node.replace_self([])
            continue

        content = []

        for todo_info in env.todo_all_todos:
            para = nodes.paragraph()
            filename = env.doc2path(todo_info['docname'], base=None)
            description = _(
                '(The original entry is located in %s, line %d and can be found '
            ) % (filename, todo_info['lineno'])
            para += nodes.Text(description)

            # Create a reference
            newnode = nodes.reference('', '')
            innernode = nodes.emphasis(_('here'), _('here'))
            newnode['refdocname'] = todo_info['docname']
            newnode['refuri'] = app.builder.get_relative_uri(
                fromdocname, todo_info['docname']
            )
            newnode['refuri'] += '#' + todo_info['target']['refid']
            newnode.append(innernode)
            para += newnode
            para += nodes.Text('.)')

            # Insert into the todolist
            content.extend((
                todo_info['todo'],
                para,
            ))

        node.replace_self(content)
The doctree-resolved event is emitted at the end of phase 3 (resolving) and allows custom resolving to be done. The handler we have written for this event is a bit more involved. If the todo_include_todos config value (which we'll describe shortly) is false, all todo and todolist nodes are removed from the documents. If not, todo nodes just stay where and how they are. todolist nodes are replaced by a list of todo entries, complete with backlinks to the location where they come from. The list items are composed of the nodes from the todo entry and docutils nodes created on the fly: a paragraph for each entry, containing text that gives the location, and a link (reference node containing an italic node) with the backreference. The reference URI is built by sphinx.builders.Builder.get_relative_uri(), which creates a suitable URI depending on the builder used, with the todo node's (the target's) ID appended as the anchor name.
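The backlink is just a relative URI to the target document plus "#" and the target ID. For HTML output that URI is, roughly, a path from the current page to the other page; the sketch below approximates it with posixpath. relative_html_uri is a hypothetical simplification (it assumes a plain .html suffix and ignores other builders and layouts), not a reimplementation of get_relative_uri().

```python
import posixpath


def relative_html_uri(from_docname: str, to_docname: str, suffix: str = ".html") -> str:
    """Hypothetical simplification: relative path from one output page to another."""
    from_dir = posixpath.dirname(from_docname) or "."
    return posixpath.relpath(to_docname + suffix, start=from_dir)


def backlink(from_docname: str, todo_docname: str, target_id: str) -> str:
    """Build the refuri the way process_todo_nodes does: URI + '#' + anchor."""
    return relative_html_uri(from_docname, todo_docname) + "#" + target_id


print(backlink("index", "somefile", "todo-0"))         # somefile.html#todo-0
print(backlink("somefolder/page", "index", "todo-1"))  # ../index.html#todo-1
```

Delegating this to the builder is what lets the same handler produce correct links for every output format.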
The setup function
As noted previously, the setup function is a requirement and is used to plug directives into Sphinx. However, we also use it to hook up the other parts of our extension. Let's look at our setup function:
def setup(app: Sphinx) -> ExtensionMetadata:
    app.add_config_value('todo_include_todos', False, 'html')

    app.add_node(todolist)
    app.add_node(
        todo,
        html=(visit_todo_node, depart_todo_node),
        latex=(visit_todo_node, depart_todo_node),
        text=(visit_todo_node, depart_todo_node),
    )

    app.add_directive('todo', TodoDirective)
    app.add_directive('todolist', TodolistDirective)
    app.connect('doctree-resolved', process_todo_nodes)
    app.connect('env-purge-doc', purge_todos)
    app.connect('env-merge-info', merge_todos)

    return {
        'version': '0.1',
        'env_version': 1,
        'parallel_read_safe': True,
        'parallel_write_safe': True,
    }
The calls in this function refer to the classes and functions we added earlier. What the individual calls do is the following:
- add_config_value() lets Sphinx know that it should recognize the new config value todo_include_todos, whose default value should be False (this also tells Sphinx that it is a boolean value). The third argument, 'html', means that HTML documents are fully rebuilt if the config value changes. Config values that influence reading (build phase 1 (reading)) should use 'env' instead, so that the whole environment is rebuilt.
- add_node() adds a new node class to the build system. It also can specify visitor functions for each supported output format. These visitor functions are needed when the new nodes stay until phase 4 (writing). Since the todolist node is always replaced in phase 3 (resolving), it doesn't need any.
- add_directive() adds a new directive, given by name and class.
- Finally, connect() adds an event handler to the event whose name is given by the first argument. The event handler function is called with several arguments which are documented with the event.
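Why can visit_todo_node simply call visit_admonition? docutils translators dispatch on the node's class name, looking up a method called visit_ followed by that name, and add_node() installs the functions you pass under those names for the todo class. The standalone sketch below imitates this name-based dispatch with hypothetical classes (admonition, todo, Translator) instead of the real docutils ones.

```python
from types import MethodType


class admonition:
    """Hypothetical stand-in for an existing node class."""


class todo(admonition):
    """Stand-in for the extension's todo node."""


class Translator:
    """Hypothetical translator that dispatches on the node's class name."""

    def __init__(self) -> None:
        self.body = []

    def dispatch_visit(self, node) -> None:
        # Look up visit_<classname>; fall back to unknown_visit.
        method = getattr(self, "visit_" + type(node).__name__, self.unknown_visit)
        method(node)

    def visit_admonition(self, node) -> None:
        self.body.append('<div class="admonition">')

    def unknown_visit(self, node) -> None:
        raise NotImplementedError(f"no visitor for {type(node).__name__}")


def visit_todo_node(self, node) -> None:
    # As in the extension: render a todo exactly like an admonition.
    self.visit_admonition(node)


translator = Translator()
# Roughly what registering visit_todo_node for the todo class arranges:
translator.visit_todo = MethodType(visit_todo_node, translator)
translator.dispatch_visit(todo())
print(translator.body)  # ['<div class="admonition">']
```

In this sketch, a translator without the registered method would fail on a todo node, which mirrors why nodes that survive until the writing phase need visitor functions.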
With this, our extension is complete.
Using the extension
As before, we need to enable the extension by declaring it in our conf.py file. There are two steps necessary here:
- Add the _ext directory to the Python path using sys.path.append. This should be placed at the top of the file.
- Update or create the extensions list and add the extension file name (without the .py suffix) to the list.
In addition, we may wish to set the todo_include_todos config value. As noted above, this defaults to False but we can set it explicitly.
For example:
import sys
from pathlib import Path
sys.path.append(str(Path('_ext').resolve()))
extensions = ['todo']
todo_include_todos = False
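Sphinx imports each entry of the extensions list as a Python module, which is why the _ext directory has to be on sys.path. The standard-library sketch below reproduces that lookup in a temporary project: it writes a throwaway extension module (todo_demo_ext is a made-up name, chosen so it cannot clash with a real todo module) and imports it by name with importlib.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as project_dir:
    ext_dir = Path(project_dir) / "_ext"
    ext_dir.mkdir()
    # A throwaway extension module with a minimal setup() function.
    (ext_dir / "todo_demo_ext.py").write_text(
        "def setup(app):\n    return {'version': '0.1'}\n"
    )

    # The same step as in conf.py: make the _ext directory importable.
    sys.path.append(str(ext_dir.resolve()))
    # Recommended when a module file is created after the interpreter started.
    importlib.invalidate_caches()

    module = importlib.import_module("todo_demo_ext")
    metadata = module.setup(None)

print(metadata)  # {'version': '0.1'}
```

If the sys.path line is left out, import_module raises ModuleNotFoundError, which is the usual symptom of a misconfigured local extension.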
You can now use the extension throughout your project. For example:
index.rst:

Hello, world
============

.. toctree::

   somefile.rst
   someotherfile.rst

Hello world. Below is the list of TODOs.

.. todolist::

somefile.rst:

foo
===

Some intro text here...

.. todo:: Fix this

someotherfile.rst:

bar
===

Some more text here...

.. todo:: Fix that
Because we have configured todo_include_todos to False, we won't actually see anything rendered for the todo and todolist directives. However, if we set it to True, we will see the output described previously.
Further reading
For more information, refer to the docutils documentation and Sphinx API.
If you wish to share your extension across multiple projects or with others, check out the Third-party extensions section.