PEP 518 – Specifying Minimum Build System Requirements for Python Projects
- Author: Brett Cannon <brett at python.org>, Nathaniel Smith <njs at pobox.com>, Donald Stufft <donald at stufft.io>
- BDFL-Delegate: Nick Coghlan
- Discussions-To: Distutils-SIG list
- Status: Final
- Type: Standards Track
- Topic: Packaging
- Created: 10-May-2016
- Post-History: 10-May-2016, 11-May-2016, 13-May-2016
- Resolution: Distutils-SIG message
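Abstract
This PEP specifies how Python software packages should declare the build dependencies they need in order to execute their chosen build system. As part of this specification, a new configuration file is introduced for software packages to use to specify their build dependencies (with the expectation that the same configuration file will be used for future configuration details).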
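Rationale
When Python first developed its tooling for building distributions of software for projects, distutils [1] was the chosen solution. As time went on, setuptools [2] gained popularity by adding features on top of distutils. Both used the concept of a setup.py file that project maintainers executed to build distributions of their software (and that users executed to install said distribution).
Using an executable file to specify build requirements under distutils isn't an issue, as distutils is part of Python's standard library. Having the build tool as part of Python means that a setup.py has no external dependency that a project maintainer needs to worry about to build a distribution of their project. There was no need to specify any dependency information as the only dependency is Python.
But when a project chooses to use setuptools, the use of an executable file like setup.py becomes an issue. You can't execute a setup.py file without knowing its dependencies, but currently there is no standard way to know what those dependencies are in an automated fashion without executing the setup.py file where that information is stored. It's a catch-22: the file cannot be run without knowing its own contents, and its contents can't be known programmatically unless you run the file.
Setuptools tried to solve this with a setup_requires argument to its setup() function [3]. This solution has a number of issues, such as: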
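- No tooling (besides setuptools itself) can access this information without executing the setup.py, but setup.py can't be executed without having these items installed.
- While setuptools itself will install anything listed in setup_requires, those items won't be installed until during the execution of the setup() function. This means that the only way to actually use anything added is through increasingly complex machinations that delay the import and usage of these modules until later on in the execution of the setup() function.
- This cannot include setuptools itself, nor can it include a replacement for setuptools. This means that projects such as numpy.distutils are largely incapable of utilizing it, and projects cannot take advantage of newer setuptools features until their users naturally upgrade setuptools to a newer version.
- The items listed in setup_requires get implicitly installed whenever you execute the setup.py, but one of the common ways that the setup.py is executed is via another tool, such as pip, which is already managing dependencies. This means that a command like pip install spam might end up having both pip and setuptools downloading and installing packages, and end users needing to configure both tools (and, for setuptools, without being in control of the invocation) to change settings like which repository it installs from. It also means that users need to be aware of the discovery rules for both tools, as one may support different package formats or determine the latest version differently.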
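This has culminated in a situation where use of setup_requires is rare: projects tend either to copy and paste snippets between setup.py files, or to eschew it altogether in favor of documenting elsewhere what they expect the user to have manually installed prior to attempting to build or install their project.
All of this has led pip [4] to simply assume that setuptools is necessary when executing a setup.py file. The problem with this, though, is that it doesn't scale if another project begins to gain traction in the community as setuptools has. It also prevents other projects from gaining traction, due to the friction of using them with a project when pip can't infer that something other than setuptools is required.
This PEP attempts to rectify the situation by specifying a way to list the minimal dependencies of the build system of a project in a declarative fashion in a specific file. This allows a project to list what build dependencies it has to go from e.g. source checkout to wheel, while not falling into the catch-22 trap that a setup.py has, where tooling can't infer what a project needs to build itself. Implementing this PEP will allow projects to specify what build system they depend on upfront so that tools like pip can make sure that they are installed in order to run the build system to build the project.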
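To provide more context and motivation for this PEP, think of the (rough) steps required to produce a built artifact for a project:
1. The source checkout of the project.
2. Installation of the build system.
3. Execution of the build system.
This PEP covers step 2. PEP 517 covers step 3, including how to have the build system dynamically specify more dependencies that the build system requires to perform its job. The purpose of this PEP, though, is to specify the minimal set of requirements for the build system to simply begin execution.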
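As a rough illustration of step 2, the sketch below turns a list of requirement strings into a pip invocation. The helper name build_install_command is hypothetical and is not part of this PEP or of pip. The command is only constructed, not run, and a real tool would likely install into an isolated environment rather than the current one.

```python
import sys


def build_install_command(requires, python=sys.executable):
    """Build (but do not run) a pip command that installs the build requirements.

    `requires` is a list of PEP 508 requirement strings. The function name and
    signature are illustrative assumptions, not an API defined by any tool.
    """
    return [python, "-m", "pip", "install", *requires]


# Each requirement string becomes one argument, so no shell quoting is needed.
command = build_install_command(["setuptools", "wheel>=0.27"], python="python")
```

Passing the command as a list to something like subprocess.run avoids shell interpretation of characters such as ">" in version specifiers.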
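Specification
File Format
The build system dependencies will be stored in a file named pyproject.toml that is written in the TOML format [6].
This format was chosen because it is human-usable (unlike JSON [7]), it is flexible enough (unlike configparser [9]), stems from a standard (also unlike configparser [9]), and it is not overly complex (unlike YAML [8]). The TOML format is already in use by the Rust community as part of their Cargo package manager [14], and in private email they stated they have been quite happy with their choice of TOML. A more thorough discussion of why various alternatives were not chosen can be read in the Other file formats section. The authors do realize, though, that the choice of configuration file format is ultimately subjective; a choice had to be made, and the authors prefer TOML for this situation.
Below we list the tables that tools are expected to recognize/respect. Tables not specified in this PEP are reserved for future use by other PEPs.
build-system table
The [build-system] table is used to store build-related data. Initially only one key of the table will be valid, and it is mandatory for the table: requires. This key must have a value of a list of strings representing PEP 508 dependencies required to execute the build system (currently that means the dependencies required to execute a setup.py file).
For the vast majority of Python projects that rely upon setuptools, the pyproject.toml file will be: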
[build-system]
# Minimum requirements for the build system to execute.
requires = ["setuptools", "wheel"] # PEP 508 specifications.
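Because the use of setuptools and wheel is so widespread in the community at the moment, build tools are expected to use the example configuration file above as their default semantics when a pyproject.toml file is not present.
Tools should not require the existence of the [build-system] table. A pyproject.toml file may be used to store configuration details other than build-related data and thus may legitimately lack a [build-system] table. If the file exists but is lacking the [build-system] table, then the default values as specified above should be used. If the table is specified but is missing required fields, then the tool should consider it an error.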
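tool table
The [tool] table is where any tool related to your Python project, not just build tools, can have users specify configuration data, as long as they use a sub-table within [tool]. For example, the flit tool would store its configuration in [tool.flit].
We need some mechanism to allocate names within the tool.* namespace, to make sure that different projects don't attempt to use the same sub-table and collide. Our rule is that a project can use the sub-table tool.$NAME if, and only if, they own the entry for $NAME in the Cheeseshop/PyPI.
JSON Schema
To provide a type-specific representation of the resulting data from the TOML file, for illustrative purposes only, the following JSON Schema [15] would match the data format: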
{
    "$schema": "http://json-schema.org/schema#",
    "type": "object",
    "additionalProperties": false,
    "properties": {
        "build-system": {
            "type": "object",
            "additionalProperties": false,
            "properties": {
                "requires": {
                    "type": "array",
                    "items": {
                        "type": "string"
                    }
                }
            },
            "required": ["requires"]
        },
        "tool": {
            "type": "object"
        }
    }
}
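For readers who prefer code to schema notation, the same constraints can be written as a hand-rolled check. This is a sketch only: validate_pyproject is a hypothetical name, it mirrors the schema exactly as given in this PEP (so any other top-level table is reported), and it does not use a JSON Schema library.

```python
def validate_pyproject(data):
    """Return a list of error messages; an empty list means the data matches."""
    if not isinstance(data, dict):
        return ["top level must be a table (object)"]
    errors = []
    # Top-level "additionalProperties": false
    for key in data:
        if key not in ("build-system", "tool"):
            errors.append(f"unexpected top-level table: {key!r}")
    if "build-system" in data:
        build = data["build-system"]
        if not isinstance(build, dict):
            errors.append("'build-system' must be a table")
        else:
            # "additionalProperties": false inside build-system
            for key in build:
                if key != "requires":
                    errors.append(f"unexpected key in 'build-system': {key!r}")
            # "required": ["requires"]
            if "requires" not in build:
                errors.append("'build-system' is missing required key 'requires'")
            else:
                requires = build["requires"]
                is_str_list = isinstance(requires, list) and all(
                    isinstance(item, str) for item in requires
                )
                if not is_str_list:
                    errors.append("'requires' must be an array of strings")
    # "tool" only has to be an object; its contents are unconstrained.
    if "tool" in data and not isinstance(data["tool"], dict):
        errors.append("'tool' must be a table")
    return errors
```

For example, validate_pyproject({"build-system": {}}) reports the missing requires key, while an empty document passes because neither table is required.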
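Rejected Ideas
A semantic version key
For future-proofing the structure of the configuration file, a semantics-version key was initially proposed. Defaulting to 1, the idea was that if any semantic changes to previously defined keys or tables occurred which were not backwards-compatible, then the semantics-version would be incremented to a new number.
In the end, though, it was decided that this was a premature optimization. The expectation is that changes to what is pre-defined semantically in the configuration file will be rather conservative. And in the instances where a backwards-incompatible change would have occurred, different names can be used for the new semantics to avoid breaking older tools.
A more nested namespace
An earlier draft of this PEP had a top-level [package] table. The idea was to impose some scoping for a semantics versioning scheme (see A semantic version key for why that idea was rejected). With the need for scoping removed, the point of having a top-level table became superfluous.
Other table names
Another name proposed for the [build-system] table was [build]. The alternative name is shorter, but doesn't convey as much of the intention of what information is stored in the table. After a vote on the distutils-sig mailing list, the current name won out.
Other file formats
Several other file formats were put forward for consideration, all rejected for various reasons. Key requirements were that the format be editable by human beings and have an implementation that can be vendored easily by projects. This outright excluded certain formats like XML, which are not friendly towards human beings and were never seriously discussed.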
Overview of file formats considered
The key reasons for rejecting the other alternatives considered are summarised in the following sections, while the full review (including positive arguments in favour of TOML) can be found at [16].
TOML was ultimately selected as it provided all the features we were interested in, while avoiding the downsides introduced by the alternatives.
Feature | TOML | YAML | JSON | CFG/INI |
---|---|---|---|---|
Well-defined | yes | yes | yes | |
Real data types | yes | yes | yes | |
Reliable Unicode | yes | yes | yes | |
Reliable comments | yes | yes | | |
Easy for humans to edit | yes | ?? | | ?? |
Easy for tools to edit | yes | ?? | yes | ?? |
In standard library | | | yes | yes |
Easy for pip to vendor | yes | | n/a | n/a |
(“??” in the table indicates items where most folks would be inclined to answer “yes”, but there turn out to be a lot of quirks and edge cases that arise in practice due to either the lack of a clear specification, or else the underlying file format specification being surprisingly complicated)
The pytoml TOML parser is ~300 lines of pure Python code, so being outside the standard library didn’t count heavily against it.
Python literals were also discussed as a potential format, but weren’t considered in the file format review (since they’re not a common pre-existing file format).
JSON
The JSON format [7] was initially considered but quickly rejected. While great as a human-readable, string-based data exchange format, the syntax does not lend itself to easy editing by a human being (e.g. the syntax is more verbose than necessary while not allowing for comments).
An example JSON file for the proposed data would be:
{
    "build": {
        "requires": [
            "setuptools",
            "wheel>=0.27"
        ]
    }
}
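The point about comments can be checked with the standard library's json module. The sketch below parses the example data, then tries the same data with a "//" comment added; the comment line is an addition made purely for this illustration.

```python
import json

PLAIN = '{"build": {"requires": ["setuptools", "wheel>=0.27"]}}'

# Same data with a comment line added for illustration.
COMMENTED = """{
    "build": {
        // minimum build requirements
        "requires": ["setuptools", "wheel>=0.27"]
    }
}"""

parsed = json.loads(PLAIN)

try:
    json.loads(COMMENTED)
    comment_accepted = True
except json.JSONDecodeError:
    # The standard parser rejects the comment as a syntax error.
    comment_accepted = False
```

A maintainer therefore has no in-file way to explain why a particular requirement or version bound is present.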
YAML
The YAML format [8] was designed to be a superset of JSON [7] while being easier to work with by hand. There are three main issues with YAML.
One is that the specification is large: 86 pages if printed on letter-sized paper. That leaves the possibility that someone may use a feature of YAML that works with one parser but not another. It has been suggested to standardize on a subset, but that basically means creating a new standard specific to this file which is not tractable long-term.
Two is that YAML itself is not safe by default. The specification allows for the arbitrary execution of code, which is best avoided when dealing with configuration data. It is of course possible to avoid this behavior (for example, PyYAML provides a safe_load operation), but if any tool carelessly uses load instead then they open themselves up to arbitrary code execution. While this PEP is focused on the building of projects, which inherently involves code execution, other configuration data such as project name and version number may end up in the same file someday, where arbitrary code execution is not desired.
And finally, the most popular Python implementation of YAML is PyYAML [10] which is a large project of a few thousand lines of code and an optional C extension module. While in and of itself this isn’t necessarily an issue, this becomes more of a problem for projects like pip where they would most likely need to vendor PyYAML as a dependency so as to be fully self-contained (otherwise you end up with your install tool needing an install tool to work). A proof-of-concept re-working of PyYAML has been done to see how easy it would be to potentially vendor a simpler version of the library which shows it is a possibility.
An example YAML file is:
build:
    requires:
        - setuptools
        - wheel>=0.27
configparser
An INI-style configuration file based on what configparser [9] accepts was considered. Unfortunately there is no specification of what configparser accepts, leading to support skew between versions. For instance, what ConfigParser in Python 2.7 accepts is not the same as what configparser in Python 3 accepts. While one could standardize on what Python 3 accepts and simply vendor the backport of the configparser module, that does mean this PEP would have to codify that the backport of configparser must be used by all projects wishing to consume the metadata specified by this PEP. This is overly restrictive and could lead to confusion if someone is not aware that a specific version of configparser is expected.
An example INI file is:
[build]
requires =
    setuptools
    wheel>=0.27
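To illustrate the lack of real data types in this format, the sketch below reads the example with Python 3's configparser. The value comes back as a single string, so turning it into a list is a convention the consumer has to invent; the newline-splitting used here is one such assumed convention, not something the format defines.

```python
import configparser

INI_TEXT = """\
[build]
requires =
    setuptools
    wheel>=0.27
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# Every value is a str; the indented lines are folded into one multi-line string.
raw = parser["build"]["requires"]

# Assumed convention: one requirement per non-empty line.
requires = [line.strip() for line in raw.splitlines() if line.strip()]
```

The indentation of the continuation lines matters: without it, configparser would no longer treat them as part of the requires value.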
Python literals
Someone proposed using Python literals as the configuration format. The file would contain one dict at the top level, with the data all inside that dict, with sections defined by the keys. All Python programmers would be used to the format, there would implicitly be no third-party dependency to read the configuration data, and it can be safe if parsed by ast.literal_eval() [13].
Python literals can be identical to JSON, with the added benefit of
supporting trailing commas and comments. In addition, Python’s richer
data model may be useful for some future configuration needs (e.g. non-string
dict keys, floating point vs. integer values).
On the other hand, Python literals are a Python-specific format, and it is anticipated that this data may need to be read by packaging tools, etc. that are not written in Python.
An example Python literal file for the proposed data would be:
# The build configuration
{"build": {"requires": ["setuptools",
                        "wheel>=0.27", # note the trailing comma
                        # "numpy>=1.10" # a commented out data line
                        ]
# and here is an arbitrary comment.
           }
 }
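A file like this could be read with ast.literal_eval(), as sketched below. The file contents are inlined as a string to keep the example self-contained, and the second call shows that an expression which is not a literal is rejected rather than executed.

```python
import ast

LITERAL_TEXT = '''\
# The build configuration
{"build": {"requires": ["setuptools",
                        "wheel>=0.27",  # note the trailing comma
                        # "numpy>=1.10"  # a commented out data line
                        ]
           # and here is an arbitrary comment.
           }
 }
'''

# Comments and the trailing comma are accepted; the result is a plain dict.
config = ast.literal_eval(LITERAL_TEXT)

try:
    ast.literal_eval("__import__('os').getcwd()")
    executed = True
except ValueError:
    # Function calls are not literals, so nothing is run.
    executed = False
```

Using eval() instead would execute the second string, which is why the safety of this format depends on every consumer choosing ast.literal_eval().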
Sticking with setup.cfg
There are two issues with setup.cfg used by setuptools as a general format. One is that they are .ini files which have issues as mentioned in the configparser discussion above. The other is that the schema for that file has never been rigorously defined and thus it’s unknown which format would be safe to use going forward without potentially confusing setuptools installations.
Other file names
Several other file names were considered and rejected (although this is very much a bikeshedding topic, and so the decision comes down to mostly taste).
- pysettings.toml: Most reasonable alternative.
- pypa.toml: While it makes sense to reference the PyPA [11], it is a somewhat niche term. It’s better to have the file name make sense without having domain-specific knowledge.
- pybuild.toml: From the restrictive perspective of this PEP this filename makes sense, but if any non-build metadata ever gets added to the file then the name ceases to make sense.
- pip.toml: Too tool-specific.
- meta.toml: Too generic; a project may want to have its own metadata file.
- setup.toml: While in keeping with tradition thanks to setup.py, it does not necessarily match what the file may contain in the future (e.g. is knowing the name of a project inherently part of its setup?).
- pymeta.toml: Not obvious to newcomers to programming and/or Python.
- pypackage.toml & pypackaging.toml: Name conflation of what a “package” is (project versus namespace).
- pydevelop.toml: The file may contain details not specific to development.
- pysource.toml: Not directly related to source code.
- pytools.toml: Misleading as the file is (currently) aimed at project management.
- dstufft.toml: Too person-specific. ;)
References
Copyright
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/pep-0518.txt
Last modified: 2022-09-14 15:48:22 GMT