PEP 518 – Specifying Minimum Build System Requirements for Python Projects

Author:
Brett Cannon <brett at python.org>, Nathaniel Smith <njs at pobox.com>, Donald Stufft <donald at stufft.io>
BDFL-Delegate:
Nick Coghlan
Discussions-To:
Distutils-SIG list
Status:
Final
Type:
Standards Track
Topic:
Packaging
Created:
10-May-2016
Post-History:
10-May-2016, 11-May-2016, 13-May-2016
Resolution:
Distutils-SIG message

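Abstract

This PEP specifies how Python software packages should specify what build dependencies they have in order to execute their chosen build system. As part of this specification, a new configuration file is introduced for software packages to use to specify their build dependencies (with the expectation that the same configuration file will be used for future configuration details).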

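Rationale

When Python first developed its tooling for building distributions of software for projects, distutils [1] was the chosen solution. As time went on, setuptools [2] gained popularity by adding some features on top of distutils. Both used the concept of a setup.py file that project maintainers executed to build distributions of their software (and that users executed to install said distributions).

Using an executable file to specify build requirements under distutils isn't an issue, as distutils is part of Python's standard library. Having the build tool as part of Python means that a setup.py has no external dependency that a project maintainer needs to worry about to build a distribution of their project. There was no need to specify any dependency information as the only dependency is Python.

But when a project chooses to use setuptools, the use of an executable file like setup.py becomes an issue. You can't execute a setup.py file without knowing its dependencies, but currently there is no standard way to know what those dependencies are in an automated fashion without executing the setup.py file where that information is stored. It's a catch-22: the file can't be run without knowing its own contents, and its contents can't be known programmatically unless you run the file.

Setuptools tried to solve this with a setup_requires argument to its setup() function [3]. This solution has a number of issues, such as: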

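  • No tooling (besides setuptools itself) can access this information without executing the setup.py, but setup.py can't be executed without having these items installed.
  • While setuptools itself will install anything listed in this, the items won't be installed until during the execution of the setup() function, which means that the only way to actually use anything added is through increasingly complex machinations that delay the import and usage of these modules until later on in the execution of the setup() function.
  • This cannot include setuptools itself, nor can it include a replacement for setuptools, which means that projects such as numpy.distutils are largely incapable of utilizing it, and projects cannot take advantage of newer setuptools features until their users naturally upgrade setuptools to a newer version.
  • The items listed in setup_requires get implicitly installed whenever you execute the setup.py, but one of the common ways that the setup.py is executed is via another tool, such as pip, which is already managing dependencies. This means that a command like pip install spam might end up having both pip and setuptools downloading and installing packages, and end users needing to configure both tools (and, for setuptools, without being in control of the invocation) to change settings like which repository it installs from. It also means that users need to be aware of the discovery rules for both tools, as one may support different package formats or determine the latest version differently.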

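This has culminated in a situation where use of setup_requires is rare: projects tend either to copy and paste snippets between setup.py files, or to eschew it altogether in favor of documenting elsewhere what they expect the user to have manually installed prior to attempting to build or install their project.

All of this has led pip [4] to simply assume that setuptools is necessary when executing a setup.py file. The problem with this, though, is that it doesn't scale if another project began to gain traction in the community as setuptools has. It also prevents other projects from gaining traction, due to the friction required to use them with a project when pip can't infer the fact that something other than setuptools is required.

This PEP attempts to rectify the situation by specifying a way to list the minimal dependencies of the build system of a project in a declarative fashion in a specific file. This allows a project to list what build dependencies it has to go from e.g. source checkout to wheel, while not falling into the catch-22 trap that a setup.py has, where tooling can't infer what a project needs to build itself. Implementing this PEP will allow projects to specify what build system they depend on upfront so that tools like pip can make sure that they are installed in order to run the build system to build the project.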

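To provide more context and motivation for this PEP, think of the (rough) steps required to produce a built artifact for a project:

  1. The source checkout of the project.
  2. Installation of the build system.
  3. Execution of the build system.

This PEP covers step 2. PEP 517 covers step 3, including how to have the build system dynamically specify more dependencies that the build system requires to perform its job. The purpose of this PEP, though, is to specify the minimal set of requirements for the build system to simply begin execution.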

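Specification

File Format

The build system dependencies will be stored in a file named pyproject.toml that is written in the TOML format [6].

This format was chosen because it is human-usable (unlike JSON [7]), it is flexible enough (unlike configparser [9]), it stems from a standard (also unlike configparser [9]), and it is not overly complex (unlike YAML [8]). The TOML format is already in use by the Rust community as part of their Cargo package manager [14], and in private email they stated that they have been quite happy with their choice of TOML. A more thorough discussion as to why various alternatives were not chosen can be read in the Other file formats section. The authors do realize, though, that the choice of configuration file format is ultimately subjective; a choice had to be made, and the authors prefer TOML for this situation.

Below we list the tables that tools are expected to recognize/respect. Tables not specified in this PEP are reserved for future use by other PEPs.

build-system table

The [build-system] table is used to store build-related data. Initially only one key of the table will be valid, and it is mandatory for the table: requires. This key must have a value of a list of strings representing PEP 508 dependencies required to execute the build system (currently that means what dependencies are required to execute a setup.py file).

For the vast majority of Python projects that rely upon setuptools, the pyproject.toml file will be: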

[build-system]
# Minimum requirements for the build system to execute.
requires = ["setuptools", "wheel"]  # PEP 508 specifications.

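Because the use of setuptools and wheel is so widespread in the community at the moment, build tools are expected to use the example configuration file above as their default semantics when a pyproject.toml file is not present.

Tools should not require the existence of the [build-system] table. A pyproject.toml file may be used to store configuration details other than build-related data and thus lack a [build-system] table legitimately. If the file exists but is lacking the [build-system] table, then the default values as specified above should be used. If the table is specified but is missing required fields, then the tool should consider it an error.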

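tool table

The [tool] table is where any tool related to your Python project, not just build tools, can have users specify configuration data as long as they use a sub-table within [tool], e.g. the flit tool would store its configuration in [tool.flit].

We need some mechanism to allocate names within the tool.* namespace, to make sure that different projects don't attempt to use the same sub-table and collide. Our rule is that a project can use the subtable tool.$NAME if, and only if, they own the entry for $NAME in the Cheeseshop/PyPI.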

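JSON Schema

To provide a type-specific representation of the resulting data from the TOML file, for illustrative purposes only, the following JSON Schema [15] would match the data format: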

{
    "$schema": "http://json-schema.org/schema#",

    "type": "object",
    "additionalProperties": false,

    "properties": {
        "build-system": {
            "type": "object",
            "additionalProperties": false,

            "properties": {
                "requires": {
                    "type": "array",
                    "items": {
                        "type": "string"
                    }
                }
            },
            "required": ["requires"]
        },

        "tool": {
            "type": "object"
        }
    }
}
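
The same constraints can be checked by hand on the parsed data. The function below is a hypothetical, standard-library-only sketch that mirrors the schema (it does not use a JSON Schema library) and returns a list of human-readable problems.

```python
def validate_pyproject(data):
    """Return a list of problems; an empty list means the data matches.

    Hypothetical helper mirroring the illustrative schema above.
    """
    if not isinstance(data, dict):
        return ["top level must be a table"]

    problems = []
    # "additionalProperties": false at the top level.
    for key in data:
        if key not in ("build-system", "tool"):
            problems.append(f"unexpected top-level key: {key}")

    if "build-system" in data:
        table = data["build-system"]
        if not isinstance(table, dict):
            problems.append("build-system must be a table")
        else:
            # "additionalProperties": false inside build-system.
            for key in table:
                if key != "requires":
                    problems.append(f"unexpected build-system key: {key}")
            if "requires" not in table:
                problems.append("build-system.requires is required")
            else:
                requires = table["requires"]
                is_string_list = isinstance(requires, list) and all(
                    isinstance(item, str) for item in requires
                )
                if not is_string_list:
                    problems.append(
                        "build-system.requires must be a list of strings"
                    )

    if "tool" in data and not isinstance(data["tool"], dict):
        problems.append("tool must be a table")
    return problems
```

Note that the schema, like this sketch, does not check that each string is a valid PEP 508 specification; that would need a dedicated parser.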

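Rejected Ideas

A semantic version key

For future-proofing the structure of the configuration file, a semantics-version key was initially proposed. Defaulting to 1, the idea was that if any semantic changes to previously defined keys or tables occurred which were not backwards-compatible, then the semantics-version would be incremented to a new number.

In the end, though, it was decided that this was a premature optimization. The expectation is that changes to what is pre-defined semantically in the configuration file will be rather conservative. And in the instances where a backwards-incompatible change would have occurred, different names can be used for the new semantics to avoid breaking older tools.

A more nested namespace

An earlier draft of this PEP had a top-level [package] table. The idea was to impose some scoping for a semantics versioning scheme (see A semantic version key for why that idea was rejected). With the need for scoping removed, the point of having a top-level table became superfluous.

Other table names

Another name proposed for the [build-system] table was [build]. The alternative name is shorter, but doesn't convey as much of the intention of what information is stored in the table. After a vote on the distutils-sig mailing list, the current name won out.

Other file formats

Several other file formats were put forward for consideration, all rejected for various reasons. Key requirements were that the format be editable by human beings and have an implementation that can be vendored easily by projects. This outright excluded certain formats like XML, which are not friendly towards human beings and were never seriously discussed.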

Overview of file formats considered

The key reasons for rejecting the other alternatives considered are summarised in the following sections, while the full review (including positive arguments in favour of TOML) can be found at [16].

TOML was ultimately selected as it provided all the features we were interested in, while avoiding the downsides introduced by the alternatives.

Feature                  TOML  YAML  JSON  CFG/INI
Well-defined             yes   yes   yes
Real data types          yes   yes   yes
Reliable Unicode         yes   yes   yes
Reliable comments        yes   yes
Easy for humans to edit  yes   ??          ??
Easy for tools to edit   yes   ??    yes   ??
In standard library                  yes   yes
Easy for pip to vendor   yes         n/a   n/a

(“??” in the table indicates items where most folks would be inclined to answer “yes”, but there turn out to be a lot of quirks and edge cases that arise in practice due to either the lack of a clear specification, or else the underlying file format specification being surprisingly complicated)

The pytoml TOML parser is ~300 lines of pure Python code, so being outside the standard library didn’t count heavily against it.

Python literals were also discussed as a potential format, but weren’t considered in the file format review (since they’re not a common pre-existing file format).

JSON

The JSON format [7] was initially considered but quickly rejected. While great as a human-readable, string-based data exchange format, the syntax does not lend itself to easy editing by a human being (e.g. the syntax is more verbose than necessary while not allowing for comments).

An example JSON file for the proposed data would be:

{
    "build": {
        "requires": [
            "setuptools",
            "wheel>=0.27"
        ]
    }
}
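
The lack of comments is easy to demonstrate with the standard library json module. In this sketch the helper name parses_as_json and the commented variant of the file are illustrative only.

```python
import json

# The example file above parses without trouble.
EXAMPLE = """
{
    "build": {
        "requires": [
            "setuptools",
            "wheel>=0.27"
        ]
    }
}
"""
requires = json.loads(EXAMPLE)["build"]["requires"]

# Adding a comment, as a human editor might want to, makes the file invalid.
WITH_COMMENT = """
{
    // Minimum requirements for the build system.
    "build": {"requires": ["setuptools"]}
}
"""


def parses_as_json(text):
    """Report whether text is accepted by the standard json parser."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

Here parses_as_json(EXAMPLE) is true while parses_as_json(WITH_COMMENT) is false, since JSON has no comment syntax.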

YAML

The YAML format [8] was designed to be a superset of JSON [7] while being easier to work with by hand. There are three main issues with YAML.

One is that the specification is large: 86 pages if printed on letter-sized paper. That leaves the possibility that someone may use a feature of YAML that works with one parser but not another. It has been suggested to standardize on a subset, but that basically means creating a new standard specific to this file which is not tractable long-term.

Two is that YAML itself is not safe by default. The specification allows for the arbitrary execution of code which is best avoided when dealing with configuration data. It is of course possible to avoid this behavior – for example, PyYAML provides a safe_load operation – but if any tool carelessly uses load instead then they open themselves up to arbitrary code execution. While this PEP is focused on the building of projects which inherently involves code execution, other configuration data such as project name and version number may end up in the same file someday where arbitrary code execution is not desired.

And finally, the most popular Python implementation of YAML is PyYAML [10] which is a large project of a few thousand lines of code and an optional C extension module. While in and of itself this isn’t necessarily an issue, this becomes more of a problem for projects like pip where they would most likely need to vendor PyYAML as a dependency so as to be fully self-contained (otherwise you end up with your install tool needing an install tool to work). A proof-of-concept re-working of PyYAML has been done to see how easy it would be to potentially vendor a simpler version of the library which shows it is a possibility.

An example YAML file is:

build:
    requires:
        - setuptools
        - wheel>=0.27

configparser

An INI-style configuration file based on what configparser [9] accepts was considered. Unfortunately there is no specification of what configparser accepts, leading to support skew between versions. For instance, what ConfigParser in Python 2.7 accepts is not the same as what configparser in Python 3 accepts. While one could standardize on what Python 3 accepts and simply vendor the backport of the configparser module, that does mean this PEP would have to codify that the backport of configparser must be used by all projects wishing to consume the metadata specified by this PEP. This is overly restrictive and could lead to confusion if someone is not aware that a specific version of configparser is expected.

An example INI file is:

[build]
requires =
    setuptools
    wheel>=0.27
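
A small sketch with the Python 3 configparser module illustrates a related limitation: every value comes back as a plain string, so a list such as requires has to be split by a convention that each consumer must reimplement. The line-splitting rule used here is an assumption of this sketch, not something the INI format defines.

```python
import configparser

INI_TEXT = """\
[build]
requires =
    setuptools
    wheel>=0.27
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# configparser has no list type: the multi-line value is a single string.
raw = parser["build"]["requires"]

# Assumed convention: one requirement per non-empty line.
requires = [line.strip() for line in raw.splitlines() if line.strip()]
```

The result is a usable list, but only because the reader and the writer of the file agree on the splitting rule outside of the format itself.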

Python literals

Someone proposed using Python literals as the configuration format. The file would contain one dict at the top level, with the data all inside that dict, with sections defined by the keys. All Python programmers would be used to the format, there would implicitly be no third-party dependency to read the configuration data, and it can be safe if parsed by ast.literal_eval() [13]. Python literals can be identical to JSON, with the added benefit of supporting trailing commas and comments. In addition, Python’s richer data model may be useful for some future configuration needs (e.g. non-string dict keys, floating point vs. integer values).

On the other hand, Python literals are a Python-specific format, and it is anticipated that these data may need to be read by packaging tools, etc. that are not written in Python.

An example Python literal file for the proposed data would be:

# The build configuration
{"build": {"requires": ["setuptools",
                        "wheel>=0.27", # note the trailing comma
                        # "numpy>=1.10" # a commented out data line
                        ]
# and here is an arbitrary comment.
           }
 }
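
Reading such a file safely could look like the following sketch, which uses ast.literal_eval() from the standard library. The helper name is_safe_literal is hypothetical, and the file content is embedded as a string to keep the example self-contained.

```python
import ast

LITERAL_TEXT = """{"build": {"requires": ["setuptools",
                        "wheel>=0.27",  # a trailing comma is allowed
                        # "numpy>=1.10"  # a commented-out entry
                        ]}}"""

# literal_eval only accepts literals (dicts, lists, strings, numbers, ...).
config = ast.literal_eval(LITERAL_TEXT)


def is_safe_literal(text):
    """Report whether text is a pure Python literal."""
    try:
        ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return False
    return True
```

An expression such as "__import__('os').getcwd()" is rejected rather than executed, which is what makes this approach safe compared with eval().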

Sticking with setup.cfg

There are two issues with setup.cfg used by setuptools as a general format. One is that they are .ini files which have issues as mentioned in the configparser discussion above. The other is that the schema for that file has never been rigorously defined and thus it’s unknown which format would be safe to use going forward without potentially confusing setuptools installations.

Other file names

Several other file names were considered and rejected (although this is very much a bikeshedding topic, and so the decision comes down to mostly taste).

pysettings.toml
Most reasonable alternative.
pypa.toml
While it makes sense to reference the PyPA [11], it is a somewhat niche term. It’s better to have the file name make sense without having domain-specific knowledge.
pybuild.toml
From the restrictive perspective of this PEP this filename makes sense, but if any non-build metadata ever gets added to the file then the name ceases to make sense.
pip.toml
Too tool-specific.
meta.toml
Too generic; project may want to have its own metadata file.
setup.toml
While in keeping with tradition thanks to setup.py, it does not necessarily match what the file may contain in the future (e.g. is knowing the name of a project inherently part of its setup?).
pymeta.toml
Not obvious to newcomers to programming and/or Python.
pypackage.toml & pypackaging.toml
Name conflation of what a “package” is (project versus namespace).
pydevelop.toml
The file may contain details not specific to development.
pysource.toml
Not directly related to source code.
pytools.toml
Misleading as the file is (currently) aimed at project management.
dstufft.toml
Too person-specific. ;)

References


Source: https://github.com/python/peps/blob/main/pep-0518.txt

Last modified: 2022-09-14 15:48:22 GMT