PEP 517 – A build-system independent format for source trees
Author: Nathaniel J. Smith <njs at pobox.com>, Thomas Kluyver <thomas at kluyver.me.uk>
BDFL-Delegate: Nick Coghlan <ncoghlan at gmail.com>
Discussions-To: Distutils-SIG list
Status: Final
Type: Standards Track
Topic: Packaging
Created: 30-Sep-2015
Post-History: 01-Oct-2015, 25-Oct-2015, 19-May-2017, 11-Sep-2017
Resolution: Distutils-SIG message
Abstract

While distutils / setuptools have taken us a long way, they suffer from three serious problems: (a) they're missing important features like usable build-time dependency declarations, autoconfiguration, and even basic ergonomic niceties like DRY-compliant version number management; (b) extending them is difficult, so while there do exist various solutions to the above problems, they're often quirky, fragile, and expensive to maintain; and yet (c) it's very difficult to use anything else, because distutils/setuptools provide the standard interface for installing packages expected by both users and installation tools like pip.

Previous efforts (e.g. distutils2 or setuptools itself) have attempted to solve problems (a) and/or (b). This proposal aims to solve (c).

The goal of this PEP is to get distutils-sig out of the business of being a gatekeeper for Python build systems. If you want to use distutils, great; if you want to use something else, that should be easy to do using standardized methods. The difficulty of interfacing with distutils means that there aren't many such systems right now, but to give a sense of what we're thinking about, see flit or bento. Fortunately, wheels have now solved many of the hard problems here (for example, a build system no longer needs to know about every possible installation configuration), so pretty much all we really need from a build system is that it have some way to spit out standard-compliant wheels and sdists.

We therefore propose a new, relatively minimal interface for installation tools like pip to interact with package source trees and source distributions.
Terminology and goals

A source tree is something like a VCS checkout. We need a standard interface for installing from this format, to support usages like pip install some-directory/.

A source distribution is a static snapshot representing a particular release of some source code, like lxml-3.4.4.tar.gz. Source distributions serve many purposes: they form an archival record of releases, they provide a dead-simple de facto standard for tools that want to ingest and process large corpora of code (possibly written in many languages), they act as the input to downstream packaging systems like Debian/Fedora/Conda/…, and so forth. In the Python ecosystem they additionally have a particularly important role to play, because packaging tools like pip are able to use source distributions to fulfill binary dependencies: e.g. if there is a distribution foo.whl which declares a dependency on bar, then we need to support the case where pip install bar or pip install foo automatically locates the sdist for bar, downloads it, builds it, and installs the resulting package.

Source distributions are also known as sdists for short.

A build frontend is a tool that users might run that takes arbitrary source trees or source distributions and builds wheels from them. The actual building is done by each source tree's build backend. In a command like pip wheel some-directory/, pip is acting as a build frontend.

An integration frontend is a tool that users might run that takes a set of package requirements (e.g. a requirements.txt file) and attempts to update a working environment to satisfy those requirements. This may require locating, building, and installing a combination of wheels and sdists. In a command like pip install lxml==2.4.0, pip is acting as an integration frontend.
Source trees

There is an existing, legacy source tree format involving setup.py. We don't try to specify it further; its de facto specification is encoded in the source code and documentation of distutils, setuptools, pip, and other tools. We'll refer to it as the setup.py-style.

Here we define a new style of source tree based on the pyproject.toml file defined in PEP 518, extending the [build-system] table in that file with one additional key, build-backend. Here's an example:
[build-system]
# Defined by PEP 518:
requires = ["flit"]
# Defined by this PEP:
build-backend = "flit.api:main"
build-backend is a string naming a Python object that will be used to perform the build (see below for details). It is formatted following the same module:object syntax as a setuptools entry point. For instance, if the string is "flit.api:main" as in the example above, this object would be looked up by executing the equivalent of:
import flit.api
backend = flit.api.main
It's also legal to leave out the :object part, e.g.

build-backend = "flit.api"

which acts like:
import flit.api
backend = flit.api
Formally, the string should satisfy this grammar:
identifier = (letter | '_') (letter | '_' | digit)*
module_path = identifier ('.' identifier)*
object_path = identifier ('.' identifier)*
entry_point = module_path (':' object_path)?
We import module_path and then look up module_path.object_path (or just module_path if object_path is missing).
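As a concrete illustration, here is a minimal sketch of how a frontend might implement this lookup; the helper name resolve_backend is illustrative, not part of this PEP:

import importlib

def resolve_backend(entry_point):
    module_path, _, object_path = entry_point.partition(":")
    backend = importlib.import_module(module_path)
    # If an object path was given, look up each dotted attribute in turn
    for attr in filter(None, object_path.split(".")):
        backend = getattr(backend, attr)
    return backend

backend = resolve_backend("flit.api:main")  # equivalent to flit.api.main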
When importing the module path, we do not look in the directory containing the source tree, unless that would be on sys.path anyway (e.g. because it is specified in PYTHONPATH). Although Python automatically adds the working directory to sys.path in some situations, code to resolve the backend should not be affected by this.
If the pyproject.toml file is absent, or the build-backend key is missing, the source tree is not using this specification, and tools should revert to the legacy behaviour of running setup.py (either directly, or by implicitly invoking the setuptools.build_meta:__legacy__ backend).

Where the build-backend key exists, it takes precedence and the source tree follows the format and conventions of the specified backend (as such, no setup.py is needed unless the backend requires it). Projects may still wish to include a setup.py for compatibility with tools that do not use this spec. A sketch of this selection logic follows.
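Below is a non-normative sketch of that selection logic. It uses tomllib (Python 3.11+) purely for illustration (earlier Pythons would use a third-party TOML parser), and the helper name choose_backend_spec is not defined by this PEP:

import os
import tomllib  # Python 3.11+; purely illustrative here

LEGACY = "setuptools.build_meta:__legacy__"

def choose_backend_spec(source_tree):
    pyproject = os.path.join(source_tree, "pyproject.toml")
    if not os.path.exists(pyproject):
        return LEGACY  # no pyproject.toml: legacy setup.py behaviour
    with open(pyproject, "rb") as f:
        data = tomllib.load(f)
    # A missing build-backend key also means the legacy behaviour
    return data.get("build-system", {}).get("build-backend", LEGACY)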
This PEP also defines a backend-path key for use in pyproject.toml; see the “In-tree build backends” section below. This key would be used as follows:
[build-system]
# Defined by PEP 518:
requires = ["flit"]
# Defined by this PEP:
build-backend = "local_backend"
backend-path = ["backend"]
Build requirements

This PEP places a number of additional requirements on the “build requirements” section of pyproject.toml. These are intended to ensure that projects do not create impossible-to-satisfy conditions with their build requirements.

- Project build requirements will define a directed graph of requirements (project A needs B to build, B needs C and D, etc.). This graph MUST NOT contain cycles. If (due to lack of co-ordination between projects, for example) a cycle is present, frontends MAY refuse to build the project.
- Where build requirements are available as wheels, frontends SHOULD use these where practical, to avoid deeply nested builds. However, frontends MAY have modes where they do not consider wheels when locating build requirements, so projects MUST NOT assume that publishing wheels is sufficient to break a requirement cycle.
- Frontends SHOULD check explicitly for requirement cycles, and terminate the build with an informative message if one is found, as sketched below.
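Here is a non-normative sketch of such a cycle check, using a depth-first search; how the frontend assembles the "graph" mapping (project name to the names of its build requirements) is up to the frontend:

def find_cycle(graph, start):
    """Return a cycle reachable from start as a list of names, or None."""
    def visit(node, path):
        if node in path:
            return path[path.index(node):] + [node]  # the cycle itself
        for dep in graph.get(node, ()):
            cycle = visit(dep, path + [node])
            if cycle:
                return cycle
        return None
    return visit(start, [])

# find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}, "A")
# -> ["A", "B", "C", "A"]: abort with an informative message.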
Note in particular that the requirement for no cycles means that backends wishing to self-host (i.e., building a wheel for a backend uses that backend for the build) need to make special provision to avoid causing cycles. Typically this will involve specifying themselves as an in-tree backend, and avoiding external build dependencies (usually by vendoring them).
Build backend interface

The build backend object is expected to have attributes which provide some or all of the following hooks. The common config_settings argument is described after the individual hooks.

Mandatory hooks

build_wheel
def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
...
Must build a .whl file, and place it in the specified wheel_directory. It must return the basename (not the full path) of the .whl file it creates, as a unicode string.
If the build frontend has previously called prepare_metadata_for_build_wheel and depends on the wheel resulting from this call to have metadata matching this earlier call, then it should provide the path to the created .dist-info directory as the metadata_directory argument. If this argument is provided, then build_wheel MUST produce a wheel with identical metadata. The directory passed in by the build frontend MUST be identical to the directory created by prepare_metadata_for_build_wheel, including any unrecognized files it created.

Backends which do not provide the prepare_metadata_for_build_wheel hook may either silently ignore the metadata_directory parameter to build_wheel, or else raise an exception when it is set to anything other than None.
To ensure that wheels from different sources are built the same way, frontends may call build_sdist first, and then call build_wheel in the unpacked sdist. But if the backend indicates that it is missing some requirements for creating an sdist (see below), the frontend will fall back to calling build_wheel in the source directory, as sketched below.
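The following non-normative sketch shows this via-sdist strategy with its fallback; it runs the hooks in-process for brevity (a real frontend would call each hook in a fresh subprocess, as described below), and the unpack_sdist helper is hypothetical:

import os
import tempfile

def build_wheel_via_sdist(backend, output_dir, source_dir):
    os.chdir(source_dir)
    try:
        sdist_dir = tempfile.mkdtemp()
        sdist_name = backend.build_sdist(sdist_dir)
    except getattr(backend, "UnsupportedOperation", ()):
        # The backend cannot build an sdist here (see build_sdist below),
        # so fall back to building a wheel from the source directory.
        return backend.build_wheel(output_dir)
    # unpack_sdist (hypothetical) returns the unpacked root directory
    os.chdir(unpack_sdist(os.path.join(sdist_dir, sdist_name)))
    return backend.build_wheel(output_dir)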
The source directory may be read-only. Backends should therefore be prepared to build without creating or modifying any files in the source directory, but they may opt not to handle this case, in which case failures will be visible to the user. Frontends are not responsible for any special handling of read-only source directories.
The backend may store intermediate artifacts in cache locations or temporary directories. The presence or absence of any caches should not make a material difference to the final result of the build.
build_sdist
def build_sdist(sdist_directory, config_settings=None):
...
Must build a .tar.gz source distribution and place it in the specified sdist_directory. It must return the basename (not the full path) of the .tar.gz file it creates, as a unicode string.
A .tar.gz source distribution (sdist) contains a single top-level directory called {name}-{version} (e.g. foo-1.0), containing the source files of the package. This directory must also contain the pyproject.toml from the build directory, and a PKG-INFO file containing metadata in the format described in PEP 345. Although historically zip files have also been used as sdists, this hook should produce a gzipped tarball. This is already the more common format for sdists, and having a consistent format makes for simpler tooling.

The generated tarball should use the modern POSIX.1-2001 pax tar format, which specifies UTF-8 based file names. This is not yet the default for the tarfile module shipped with Python 3.6, so backends using the tarfile module need to explicitly pass format=tarfile.PAX_FORMAT.
Some backends may have extra requirements for creating sdists, such as version
control tools. However, some frontends may prefer to make intermediate sdists
when producing wheels, to ensure consistency.
If the backend cannot produce an sdist because a dependency is missing, or for another well understood reason, it should raise an exception of a specific type which it makes available as UnsupportedOperation on the backend object. If the frontend gets this exception while building an sdist as an intermediate for a wheel, it should fall back to building a wheel directly. The backend does not need to define this exception type if it would never raise it.
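A minimal, non-normative backend-side sketch of this convention; the condition used to decide that an sdist cannot be built (requiring a git checkout) is purely illustrative:

import os

class UnsupportedOperation(Exception):
    """Raised when this backend cannot build an sdist from this tree."""

def build_sdist(sdist_directory, config_settings=None):
    if not os.path.isdir(".git"):  # illustrative condition only
        raise UnsupportedOperation("sdist requires a git checkout")
    ...  # create the .tar.gz in sdist_directory and return its basename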
Optional hooks
get_requires_for_build_wheel
def get_requires_for_build_wheel(config_settings=None):
...
This hook MUST return an additional list of strings containing PEP 508 dependency specifications, above and beyond those specified in the pyproject.toml file, to be installed when calling the build_wheel or prepare_metadata_for_build_wheel hooks.
Example:
def get_requires_for_build_wheel(config_settings):
return ["wheel >= 0.25", "setuptools"]
If not defined, the default implementation is equivalent to return [].
prepare_metadata_for_build_wheel
def prepare_metadata_for_build_wheel(metadata_directory, config_settings=None):
...
Must create a .dist-info directory containing wheel metadata inside the specified metadata_directory (i.e., creates a directory like {metadata_directory}/{package}-{version}.dist-info/). This directory MUST be a valid .dist-info directory as defined in the wheel specification, except that it need not contain RECORD or signatures. The hook MAY also create other files inside this directory, and a build frontend MUST preserve, but otherwise ignore, such files; the intention here is that in cases where the metadata depends on build-time decisions, the build backend may need to record these decisions in some convenient format for re-use by the actual wheel-building step.

This must return the basename (not the full path) of the .dist-info directory it creates, as a unicode string.
If a build frontend needs this information and the method is not defined, it should call build_wheel and look at the resulting metadata directly.
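For concreteness, here is a non-normative sketch of a minimal implementation; the package name, version, and metadata fields are placeholders that a real backend would compute from its own configuration:

import os

def prepare_metadata_for_build_wheel(metadata_directory, config_settings=None):
    distinfo = "mypackage-0.1.dist-info"  # {package}-{version}.dist-info
    path = os.path.join(metadata_directory, distinfo)
    os.makedirs(path)
    with open(os.path.join(path, "METADATA"), "w", encoding="utf-8") as f:
        f.write("Metadata-Version: 2.1\n"
                "Name: mypackage\n"
                "Version: 0.1\n")
    # RECORD and signatures may be omitted (see above)
    return distinfo  # the basename, not the full path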
get_requires_for_build_sdist
def get_requires_for_build_sdist(config_settings=None):
...
This hook MUST return an additional list of strings containing PEP 508 dependency specifications, above and beyond those specified in the pyproject.toml file. These dependencies will be installed when calling the build_sdist hook.

If not defined, the default implementation is equivalent to return [].
Note
Editable installs
This PEP originally specified another hook, install_editable, to do an editable install (as with pip install -e). It was removed due to the complexity of the topic, but may be specified in a later PEP.

Briefly, the questions to be answered include: what reasonable ways exist of implementing an ‘editable install’? Should the backend or the frontend pick how to make an editable install? And if the frontend does, what does it need from the backend to do so?
Config settings
config_settings
This argument, which is passed to all hooks, is an arbitrary dictionary provided as an “escape hatch” for users to pass ad-hoc configuration into individual package builds. Build backends MAY assign any semantics they like to this dictionary. Build frontends SHOULD provide some mechanism for users to specify arbitrary string-key/string-value pairs to be placed in this dictionary. For example, they might support some syntax like --package-config CC=gcc. Build frontends MAY also provide arbitrary other mechanisms for users to place entries in this dictionary. For example, pip might choose to map a mix of modern and legacy command line arguments like:
pip install \
--package-config CC=gcc \
--global-option="--some-global-option" \
--build-option="--build-option1" \
--build-option="--build-option2"
into a config_settings dictionary like:
{
"CC": "gcc",
"--global-option": ["--some-global-option"],
"--build-option": ["--build-option1", "--build-option2"],
}
Of course, it’s up to users to make sure that they pass options which make sense for the particular build backend and package that they are building.
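A non-normative sketch of how a backend might consume such a dictionary; the CC key reuses the ad-hoc example above, and nothing about it is mandated by this PEP:

import os

def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    config_settings = config_settings or {}
    # e.g. populated by: --package-config CC=gcc
    os.environ["CC"] = config_settings.get("CC", "cc")
    ...  # run the actual build and return the wheel's basename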
The hooks may be called with positional or keyword arguments, so backends implementing them should be careful to make sure that their signatures match both the order and the names of the arguments above.
All hooks are run with working directory set to the root of the source tree, and MAY print arbitrary informational text on stdout and stderr. They MUST NOT read from stdin, and the build frontend MAY close stdin before invoking the hooks.
The build frontend may capture stdout and/or stderr from the backend. If the backend detects that an output stream is not a terminal/console (e.g. not sys.stdout.isatty()), it SHOULD ensure that any output it writes to that stream is UTF-8 encoded. The build frontend MUST NOT fail if captured output is not valid UTF-8, but it MAY not preserve all the information in that case (e.g. it may decode using the replace error handler in Python). If the output stream is a terminal, the build backend is responsible for presenting its output accurately, as for any program running in a terminal.
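A small, non-normative sketch of how a backend could follow that recommendation (sys.stdout.reconfigure requires Python 3.7+):

import sys

for stream in (sys.stdout, sys.stderr):
    if not stream.isatty():
        # Output is being captured: make sure it is UTF-8 encoded
        stream.reconfigure(encoding="utf-8")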
If a hook raises an exception, or causes the process to terminate, then this indicates an error.
Build environment
One of the responsibilities of a build frontend is to set up the Python environment in which the build backend will run.
We do not require that any particular “virtual environment” mechanism be used; a build frontend might use virtualenv, or venv, or no special mechanism at all. But whatever mechanism is used MUST meet the following criteria:
- All requirements specified by the project’s build-requirements must be available for import from Python. In particular:
  - The get_requires_for_build_wheel and get_requires_for_build_sdist hooks are executed in an environment which contains the bootstrap requirements specified in the pyproject.toml file.
  - The prepare_metadata_for_build_wheel and build_wheel hooks are executed in an environment which contains the bootstrap requirements from pyproject.toml and those specified by the get_requires_for_build_wheel hook.
  - The build_sdist hook is executed in an environment which contains the bootstrap requirements from pyproject.toml and those specified by the get_requires_for_build_sdist hook.
- This must remain true even for new Python subprocesses spawned by the build environment, e.g. code like:

      import sys, subprocess
      subprocess.check_call([sys.executable, ...])

  must spawn a Python process which has access to all the project’s build-requirements. This is necessary e.g. for build backends that want to run legacy setup.py scripts in a subprocess.
scripts in a subprocess. - All command-line scripts provided by the build-required packages
must be present in the build environment’s PATH. For example, if a
project declares a build-requirement on flit, then the following must
work as a mechanism for running the flit command-line tool:
import subprocess import shutil subprocess.check_call([shutil.which("flit"), ...])
A build backend MUST be prepared to function in any environment which meets the above criteria. In particular, it MUST NOT assume that it has access to any packages except those that are present in the stdlib, or that are explicitly declared as build-requirements.
Frontends should call each hook in a fresh subprocess, so that backends are free to change process global state (such as environment variables or the working directory). A Python library will be provided which frontends can use to easily call hooks this way.
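To make the fresh-subprocess pattern concrete, here is a non-normative sketch; the inline runner script and the use of JSON to pass arguments and results are purely illustrative, and a real frontend would use the helper library mentioned above:

import json
import os
import subprocess
import sys
import tempfile

RUNNER = """
import importlib, json, sys
backend = importlib.import_module(sys.argv[1])  # module-only backends, for brevity
hook = getattr(backend, sys.argv[2])
result = hook(*json.loads(sys.argv[3]))
with open(sys.argv[4], "w") as f:
    json.dump(result, f)
"""

def call_hook(backend_module, hook_name, args, source_tree):
    with tempfile.TemporaryDirectory() as td:
        out = os.path.join(td, "result.json")
        subprocess.check_call(
            [sys.executable, "-c", RUNNER, backend_module,
             hook_name, json.dumps(args), out],
            cwd=source_tree)  # hooks run with cwd at the source tree root
        with open(out) as f:
            return json.load(f)

# e.g. call_hook("flit.api", "build_wheel", ["/tmp/wheels"], "/path/to/src")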
Recommendations for build frontends (non-normative)
A build frontend MAY use any mechanism for setting up a build environment that meets the above criteria. For example, simply installing all build-requirements into the global environment would be sufficient to build any compliant package – but this would be sub-optimal for a number of reasons. This section contains non-normative advice to frontend implementors.
A build frontend SHOULD, by default, create an isolated environment for each build, containing only the standard library and any explicitly requested build-dependencies. This has two benefits:
- It allows for a single installation run to build multiple packages that have contradictory build-requirements. E.g. if package1 build-requires pbr==1.8.1, and package2 build-requires pbr==1.7.2, then these cannot both be installed simultaneously into the global environment – which is a problem when the user requests pip install package1 package2. Or if the user already has pbr==1.8.1 installed in their global environment, and a package build-requires pbr==1.7.2, then downgrading the user’s version would be rather rude.
- It acts as a kind of public health measure to maximize the number of packages that actually do declare accurate build-dependencies. We can write all the strongly worded admonitions to package authors we want, but if build frontends don’t enforce isolation by default, then we’ll inevitably end up with lots of packages on PyPI that build fine on the original author’s machine and nowhere else, which is a headache that no-one needs.
However, there will also be situations where build-requirements are problematic in various ways. For example, a package author might accidentally leave off some crucial requirement despite our best efforts; or, a package might declare a build-requirement on foo >= 1.0 which worked great when 1.0 was the latest version, but now 1.1 is out and it has a showstopper bug; or, the user might decide to build a package against numpy==1.7 – overriding the package’s preferred numpy==1.8 – to guarantee that the resulting build will be compatible at the C ABI level with an older version of numpy (even if this means the resulting build is unsupported upstream). Therefore, build frontends SHOULD provide some mechanism for users to override the above defaults. For example, a build frontend could have a --build-with-system-site-packages option that causes the --system-site-packages option to be passed to virtualenv-or-equivalent when creating build environments, or a --build-requirements-override=my-requirements.txt option that overrides the project’s normal build-requirements.
The general principle here is that we want to enforce hygiene on package authors, while still allowing end-users to open up the hood and apply duct tape when necessary.
In-tree build backends
In certain circumstances, projects may wish to include the source code for the build backend directly in the source tree, rather than referencing the backend via the requires key. Two specific situations where this would be expected are:
- Backends themselves, which want to use their own features for building themselves (“self-hosting backends”)
- Project-specific backends, typically consisting of a custom wrapper around a standard backend, where the wrapper is too project-specific to be worth distributing independently (“in-tree backends”)
Projects can specify that their backend code is hosted in-tree by including the backend-path key in pyproject.toml. This key contains a list of directories, which the frontend will add to the start of sys.path when loading the backend, and running the backend hooks.
There are two restrictions on the content of the backend-path key:

- Directories in backend-path are interpreted as relative to the project root, and MUST refer to a location within the source tree (after relative paths and symbolic links have been resolved).
- The backend code MUST be loaded from one of the directories specified in backend-path (i.e., it is not permitted to specify backend-path and not have in-tree backend code).
The first restriction is to ensure that source trees remain self-contained, and cannot refer to locations outside of the source tree. Frontends SHOULD check this condition (typically by resolving the location to an absolute path and resolving symbolic links, and then checking it against the project root), and fail with an error message if it is violated.
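A non-normative sketch of that check; the helper name validate_backend_path is illustrative:

import os

def validate_backend_path(project_root, backend_path):
    root = os.path.realpath(project_root)  # resolves symbolic links
    for entry in backend_path:
        location = os.path.realpath(os.path.join(root, entry))
        if os.path.commonpath([root, location]) != root:
            raise ValueError(
                "backend-path entry %r is outside the source tree" % entry)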
The backend-path feature is intended to support the implementation of in-tree backends, and not to allow configuration of existing backends. The second restriction above is specifically to ensure that this is how the feature is used. Frontends MAY enforce this check, but are not required to. Doing so would typically involve checking the backend’s __file__ attribute against the locations in backend-path.
Source distributions
We continue with the legacy sdist format, adding some new restrictions. This format is mostly undefined, but basically comes down to: a file named {NAME}-{VERSION}.{EXT}, which unpacks into a buildable source tree called {NAME}-{VERSION}/. Traditionally these have always contained setup.py-style source trees; we now allow them to also contain pyproject.toml-style source trees.
Integration frontends require that an sdist named {NAME}-{VERSION}.{EXT} will generate a wheel named {NAME}-{VERSION}-{COMPAT-INFO}.whl.
The new restrictions for sdists built by PEP 517 backends are:

- They will be gzipped tar archives, with the .tar.gz extension. Zip archives, or other compression formats for tarballs, are not allowed at present.
- Tar archives must be created in the modern POSIX.1-2001 pax tar format, which uses UTF-8 for file names.
- The source tree contained in an sdist is expected to include the pyproject.toml file.
Evolutionary notes
A goal here is to make it as simple as possible to convert old-style sdists to new-style sdists. (E.g., this is one motivation for supporting dynamic build requirements.) The ideal would be that there would be a single static pyproject.toml that could be dropped into any “version 0” VCS checkout to convert it to the new shiny. This is probably not 100% possible, but we can get close, and it’s important to keep track of how close we are… hence this section.
A rough plan would be: Create a build system package (setuptools_pypackage or whatever) that knows how to speak whatever hook language we come up with, and convert them into calls to setup.py. This will probably require some sort of hooking or monkeypatching to setuptools to provide a way to extract the setup_requires= argument when needed, and to provide a new version of the sdist command that generates the new-style format. This all seems doable and sufficient for a large proportion of packages (though obviously we’ll want to prototype such a system before we finalize anything here). (Alternatively, these changes could be made to setuptools itself rather than going into a separate package.)
But there remain two obstacles that mean we probably won’t be able to automatically upgrade packages to the new format:
- There currently exist packages which insist on particular packages being available in their environment before setup.py is executed. This means that if we decide to execute build scripts in an isolated virtualenv-like environment, then projects will need to check whether they do this, and if so then when upgrading to the new system they will have to start explicitly declaring these dependencies (either via setup_requires= or via static declaration in pyproject.toml).
- There currently exist packages which do not declare consistent metadata (e.g. egg_info and bdist_wheel might get different install_requires=). When upgrading to the new system, projects will have to evaluate whether this applies to them, and if so they will need to stop doing that.
Rejected options
- We discussed making the wheel and sdist hooks build unpacked directories containing the same contents as their respective archives. In some cases this could avoid the need to pack and unpack an archive, but this seems like premature optimisation. It’s advantageous for tools to work with archives as the canonical interchange formats (especially for wheels, where the archive format is already standardised). Close control of archive creation is important for reproducible builds. And it’s not clear that tasks requiring an unpacked distribution will be more common than those requiring an archive.
- We considered an extra hook to copy files to a build directory before invoking build_wheel. Looking at existing build systems, we found that passing a build directory into build_wheel made more sense for many tools than pre-emptively copying files into a build directory.
- The idea of passing build_wheel a build directory was then also deemed an unnecessary complication. Build tools can use a temporary directory or a cache directory to store intermediate files while building. If there is a need, a frontend-controlled cache directory could be added in the future.
- For build_sdist to signal a failure for an expected reason, various options were debated at great length, including raising NotImplementedError and returning either NotImplemented or None. Please do not attempt to reopen this discussion without an extremely good reason, because we are quite tired of it.
- Allowing the backend to be imported from files in the source tree would be more consistent with the way Python imports often work. However, not allowing this prevents confusing errors from clashing module names. The initial version of this PEP did not provide a means to allow backends to be imported from files within the source tree, but the backend-path key was added in the next revision to allow projects to opt into this behaviour if needed.
Summary of changes to PEP 517

The following changes were made to this PEP after the initial reference implementation was released in pip 19.0.

- Cycles in build requirements were explicitly prohibited.
- Support for in-tree backends and self-hosting of backends was added by the introduction of the backend-path key in the [build-system] table.
- Clarified that the setuptools.build_meta:__legacy__ PEP 517 backend is an acceptable alternative to directly invoking setup.py for source trees that don’t specify build-backend explicitly.
Appendix A: Comparison to PEP 516
PEP 516 is a competing proposal to specify a build system interface, which has now been rejected in favour of this PEP. The primary difference is that our build backend is defined via a Python hook-based interface rather than a command-line based interface.
This appendix documents the arguments advanced for this PEP over PEP 516.
We do not expect that specifying Python hooks rather than command line interfaces will, by itself, reduce the complexity of calling into the backend, because build frontends will in any case want to run hooks inside a child process – this is important to isolate the build frontend itself from the backend code and to better control the build backend’s execution environment. So under both proposals, there will need to be some code in pip to spawn a subprocess and talk to some kind of command-line/IPC interface, and there will need to be some code in the subprocess that knows how to parse these command line arguments and call the actual build backend implementation. So this diagram applies to all proposals equally:
+-----------+ +---------------+ +----------------+
| frontend | -spawn-> | child cmdline | -Python-> | backend |
| (pip) | | interface | | implementation |
+-----------+ +---------------+ +----------------+
The key difference between the two approaches is how these interface boundaries map onto project structure:
.-= This PEP =-.
+-----------+ +---------------+ | +----------------+
| frontend | -spawn-> | child cmdline | -Python-> | backend |
| (pip) | | interface | | | implementation |
+-----------+ +---------------+ | +----------------+
|
|______________________________________| |
Owned by pip, updated in lockstep |
|
|
PEP-defined interface boundary
Changes here require distutils-sig
.-= Alternative =-.
+-----------+ | +---------------+ +----------------+
| frontend | -spawn-> | child cmdline | -Python-> | backend |
| (pip) | | | interface | | implementation |
+-----------+ | +---------------+ +----------------+
|
| |____________________________________________|
| Owned by build backend, updated in lockstep
|
PEP-defined interface boundary
Changes here require distutils-sig
By moving the PEP-defined interface boundary into Python code, we gain three key advantages.
First, because there will likely be only a small number of build frontends (pip, and… maybe a few others?), while there will likely be a long tail of custom build backends (since these are chosen separately by each package to match their particular build requirements), the actual diagrams probably look more like:
.-= This PEP =-.
+-----------+ +---------------+ +----------------+
| frontend | -spawn-> | child cmdline | -Python+> | backend |
| (pip) | | interface | | | implementation |
+-----------+ +---------------+ | +----------------+
|
| +----------------+
+> | backend |
| | implementation |
| +----------------+
:
:
.-= Alternative =-.
+-----------+ +---------------+ +----------------+
| frontend | -spawn+> | child cmdline | -Python-> | backend |
| (pip) | | | interface | | implementation |
+-----------+ | +---------------+ +----------------+
|
| +---------------+ +----------------+
+> | child cmdline | -Python-> | backend |
| | interface | | implementation |
| +---------------+ +----------------+
:
:
That is, this PEP leads to less total code in the overall ecosystem. And in particular, it reduces the barrier to entry of making a new build system. For example, this is a complete, working build backend:
# mypackage_custom_build_backend.py
import os.path
import pathlib
import shutil
import tarfile
SDIST_NAME = "mypackage-0.1"
SDIST_FILENAME = SDIST_NAME + ".tar.gz"
WHEEL_FILENAME = "mypackage-0.1-py2.py3-none-any.whl"
#################
# sdist creation
#################
def _exclude_hidden_and_special_files(archive_entry):
"""Tarfile filter to exclude hidden and special files from the archive"""
if archive_entry.isfile() or archive_entry.isdir():
if not os.path.basename(archive_entry.name).startswith("."):
return archive_entry
def _make_sdist(sdist_dir):
    """Make an sdist and return its filename"""
    sdist_path = pathlib.Path(sdist_dir) / SDIST_FILENAME
    # Tar up the whole directory, minus hidden and special files.
    # Closing the tarfile (here via the context manager) flushes the
    # gzip stream, so the archive is complete on disk.
    with tarfile.open(sdist_path, "w:gz", format=tarfile.PAX_FORMAT) as sdist:
        sdist.add(os.getcwd(), arcname=SDIST_NAME,
                  filter=_exclude_hidden_and_special_files)
    return SDIST_FILENAME

def build_sdist(sdist_dir, config_settings=None):
    """PEP 517 sdist creation hook"""
    return _make_sdist(sdist_dir)
#################
# wheel creation
#################
def get_requires_for_build_wheel(config_settings):
"""PEP 517 wheel building dependency definition hook"""
# As a simple static requirement, this could also just be
# listed in the project's build system dependencies instead
return ["wheel"]
def build_wheel(wheel_directory, config_settings=None,
                metadata_directory=None):
    """PEP 517 wheel creation hook"""
    # Note: the argument order matches the mandatory hook signature above
    from wheel.archive import archive_wheelfile
    path = os.path.join(wheel_directory, WHEEL_FILENAME)
    archive_wheelfile(path, "src/")
    return WHEEL_FILENAME
Of course, this is a terrible build backend: it requires the user to have manually set up the wheel metadata in src/mypackage-0.1.dist-info/; when the version number changes it must be manually updated in multiple places… but it works, and more features could be added incrementally. Much experience suggests that large successful projects often originate as quick hacks (e.g., Linux – “just a hobby, won’t be big and professional”; IPython/Jupyter – a grad student’s $PYTHONSTARTUP file), so if our goal is to encourage the growth of a vibrant ecosystem of good build tools, it’s important to minimize the barrier to entry.
Second, because Python provides a simpler yet richer structure for describing interfaces, we remove unnecessary complexity from the specification – and specifications are the worst place for complexity, because changing specifications requires painful consensus-building across many stakeholders. In the command-line interface approach, we have to come up with ad hoc ways to map multiple different kinds of inputs into a single linear command line (e.g. how do we avoid collisions between user-specified configuration arguments and PEP-defined arguments? how do we specify optional arguments? when working with a Python interface these questions have simple, obvious answers). When spawning and managing subprocesses, there are many fiddly details that must be gotten right, subtle cross-platform differences, and some of the most obvious approaches – e.g., using stdout to return data for the build_requires operation – can create unexpected pitfalls (e.g., what happens when computing the build requirements requires spawning some child processes, and these children occasionally print an error message to stdout? obviously a careful build backend author can avoid this problem, but the most obvious way of defining a Python interface removes this possibility entirely, because the hook return value is clearly demarcated).
In general, the need to isolate build backends into their own process means that we can’t remove IPC complexity entirely – but by placing both sides of the IPC channel under the control of a single project, we make it much cheaper to fix bugs in the IPC interface than if fixing bugs requires coordinated agreement and coordinated changes across the ecosystem.
Third, and most crucially, the Python hook approach gives us much more powerful options for evolving this specification in the future.
For concreteness, imagine that next year we add a new build_sdist_from_vcs hook, which provides an alternative to the current build_sdist hook where the frontend is responsible for passing version control tracking metadata to backends (including indicating when all on disk files are tracked), rather than individual backends having to query that information themselves. In order to manage the transition, we’d want it to be possible for build frontends to transparently use build_sdist_from_vcs when available and fall back onto build_sdist otherwise; and we’d want it to be possible for build backends to define both methods, for compatibility with both old and new build frontends.
Furthermore, our mechanism should also fulfill two more goals: (a) If new versions of e.g. pip and flit are both updated to support the new interface, then this should be sufficient for it to be used; in particular, it should not be necessary for every project that uses flit to update its individual pyproject.toml file. (b) We do not want to have to spawn extra processes just to perform this negotiation, because process spawns can easily become a bottleneck when deploying large multi-package stacks on some platforms (Windows).
In the interface described here, all of these goals are easy to achieve. Because pip controls the code that runs inside the child process, it can easily write it to do something like:
command, backend, args = parse_command_line_args(...)
if command == "build_sdist":
if hasattr(backend, "build_sdist_from_vcs"):
backend.build_sdist_from_vcs(...)
elif hasattr(backend, "build_sdist"):
backend.build_sdist(...)
else:
# error handling
In the alternative where the public interface boundary is placed at the subprocess call, this is not possible – either we need to spawn an extra process just to query what interfaces are supported (as was included in an earlier draft of PEP 516, an alternative to this), or else we give up on autonegotiation entirely (as in the current version of that PEP), meaning that any changes in the interface will require N individual packages to update their pyproject.toml files before any change can go live, and that any changes will necessarily be restricted to new releases.
One specific consequence of this is that in this PEP, we’re able to make the prepare_metadata_for_build_wheel command optional. In our design, this can be readily handled by build frontends, which can put code in their subprocess runner like:
def dump_wheel_metadata(backend, working_dir):
"""Dumps wheel metadata to working directory.
Returns absolute path to resulting metadata directory
"""
if hasattr(backend, "prepare_metadata_for_build_wheel"):
subdir = backend.prepare_metadata_for_build_wheel(working_dir)
else:
wheel_fname = backend.build_wheel(working_dir)
already_built = os.path.join(working_dir, "ALREADY_BUILT_WHEEL")
with open(already_built, "w") as f:
f.write(wheel_fname)
subdir = unzip_metadata(os.path.join(working_dir, wheel_fname))
return os.path.join(working_dir, subdir)
def ensure_wheel_is_built(backend, output_dir, working_dir, metadata_dir):
"""Ensures built wheel is available in output directory
Returns absolute path to resulting wheel file
"""
already_built = os.path.join(working_dir, "ALREADY_BUILT_WHEEL")
if os.path.exists(already_built):
with open(already_built, "r") as f:
wheel_fname = f.read().strip()
working_path = os.path.join(working_dir, wheel_fname)
final_path = os.path.join(output_dir, wheel_fname)
os.rename(working_path, final_path)
os.remove(already_built)
else:
        wheel_fname = backend.build_wheel(output_dir, metadata_directory=metadata_dir)
return os.path.join(output_dir, wheel_fname)
and thus expose a totally uniform interface to the rest of the frontend, with no extra subprocess calls, no duplicated builds, etc. But obviously this is the kind of code that you only want to write as part of a private, within-project interface (e.g. the given example requires that the working directory be shared between the two calls, but not with any other wheel builds, and that the return value from the metadata helper function will be passed back in to the wheel building one).
(And, of course, making the metadata command optional is one piece of lowering the barrier to entry for developing new backends, as discussed above.)
Other differences
Besides the key command line versus Python hook difference described above, there are a few other differences in this proposal:
- Metadata command is optional (as described above).
- We return metadata as a directory, rather than a single METADATA file. This aligns better with the way that in practice wheel metadata is distributed across multiple files (e.g. entry points), and gives us more options in the future. (For example, instead of following the PEP 426 proposal of switching the format of METADATA to JSON, we might decide to keep the existing METADATA the way it is for backcompat, while adding new extensions as JSON “sidecar” files inside the same directory. Or maybe not; the point is it keeps our options more open.)
- We provide a mechanism for passing information between the metadata step and the wheel building step. I guess everyone probably will agree this is a good idea?
- We provide more detailed recommendations about the build environment, but these aren’t normative anyway.
Copyright
This document has been placed in the public domain.