

PEP 420 – Implicit Namespace Packages

Author:
Eric V. Smith <eric at trueblade.com>
Status:
Final
Type:
Standards Track
Created:
19-Apr-2012
Python-Version:
3.3
Post-History:

Resolution:
Python-Dev message


Abstract

Namespace packages are a mechanism for splitting a single Python package across multiple directories on disk. In current Python versions, an algorithm to compute the package's __path__ must be formulated. With the enhancement proposed here, the import machinery itself will construct the list of directories that make up the package. This PEP builds upon previous work, documented in PEP 382 and PEP 402. Those PEPs have since been rejected in favor of this one. An implementation of this PEP is at [1].

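Terminology

Within this PEP:

  • “package” refers to Python packages as defined by Python’s import statement.
  • “distribution” refers to separately installable sets of Python modules as stored in the Python package index, and installed by distutils or setuptools.
  • “vendor package” refers to groups of files installed by an operating system’s packaging mechanism (e.g. Debian or Redhat packages installed on Linux systems).
  • “regular package” refers to packages as they are implemented in Python 3.2 and earlier.
  • “portion” refers to a set of files in a single directory (possibly stored in a zip file) that contribute to a namespace package.
  • “legacy portion” refers to a portion that uses __path__ manipulation in order to implement namespace packages.

This PEP defines a new type of package, the “namespace package”.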

Namespace packages today

Python currently provides pkgutil.extend_path to denote a package as a namespace package. The recommended way of using it is to put:

from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)

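in the package’s __init__.py. Every distribution needs to provide the same contents in its __init__.py, so that extend_path is invoked regardless of which portion of the package gets imported first. As a consequence, the package’s __init__.py cannot practically define any names, because it depends on the order of the package fragments on sys.path to determine which portion is imported first. As a special feature, extend_path reads files named <packagename>.pkg, which allow declaration of additional portions.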

setuptools provides a similar function named pkg_resources.declare_namespace that is used in the form:

import pkg_resources
pkg_resources.declare_namespace(__name__)

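in the portion’s __init__.py, where no assignment to __path__ is necessary, as declare_namespace modifies the package __path__ through sys.modules. As a special feature, declare_namespace also supports zip files, and registers the package name internally so that future additions to sys.path by setuptools can properly add additional portions to each package.

setuptools allows declaring namespace packages in a distribution’s setup.py, so that distribution developers don’t need to put the magic __path__ modification into __init__.py themselves.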

See PEP 402’s “The Problem” section for additional motivations for namespace packages. Note that PEP 402 has been rejected, but the motivating use cases are still valid.

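Rationale

The current imperative approach to namespace packages has led to multiple slightly incompatible mechanisms for providing namespace packages. For example, pkgutil supports *.pkg files; setuptools doesn’t. Likewise, setuptools supports inspecting zip files, and supports adding portions to its _namespace_packages variable, whereas pkgutil does not.

Namespace packages are designed to support being split across multiple directories (and hence found via multiple sys.path entries). In this configuration, it doesn’t matter if multiple portions all provide an __init__.py file, so long as each portion correctly initializes the namespace package. However, Linux distribution vendors (amongst others) prefer to combine the separate portions and install them all into the same file system directory. This creates a potential for conflict, as the portions are now attempting to provide the same file on the target system, something that is not allowed by many package managers. Allowing implicit namespace packages means that the requirement to provide an __init__.py file can be dropped completely, and affected portions can be installed into a common directory or split across multiple directories as distributions see fit.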

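A namespace package will not be constrained by a fixed __path__ computed from the parent path at namespace package creation time. Consider the standard library encodings package:

  1. Suppose that encodings becomes a namespace package.
  2. It sometimes gets imported during interpreter startup to initialize the standard io streams.
  3. An application modifies sys.path after startup and wants to contribute additional encodings from new path entries.
  4. An attempt is made to import an encoding from an encodings portion that is found on a path entry added in step 3.

If the import system were restricted to finding portions only along the value of sys.path that existed when the encodings namespace package was created, the additional paths added in step 3 would never be searched for the additional portions imported in step 4. Conversely, if step 2 were sometimes skipped (due to some runtime flag or other condition), then the path items added in step 3 would be used the first time a portion was imported, so the outcome would depend on whether encodings happened to be imported at startup. Thus this PEP requires that the list of path entries be dynamically computed when each portion is loaded. It is expected that the import machinery will do this efficiently by caching __path__ values and only refreshing them when it detects that the parent path has changed. In the case of a top-level package like encodings, this parent path would be sys.path.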

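Specification

Regular packages will continue to have an __init__.py and will reside in a single directory.

Namespace packages cannot contain an __init__.py. As a consequence, pkgutil.extend_path and pkg_resources.declare_namespace become obsolete for purposes of namespace package creation. There will be no marker file or directory for specifying a namespace package.

During import processing, the import machinery will continue to iterate over each directory in the parent path as it does in Python 3.2. While looking for a module or package named “foo”, for each directory in the parent path: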

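  • If <directory>/foo/__init__.py is found, a regular package is imported and returned.
  • If not, but <directory>/foo.{py,pyc,so,pyd} is found, a module is imported and returned. The exact list of extensions varies by platform and by whether the -O flag is specified. The list here is representative.
  • If not, but <directory>/foo is found and is a directory, it is recorded and the scan continues with the next directory in the parent path.
  • Otherwise the scan continues with the next directory in the parent path.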
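The per-directory scan above can be sketched in plain Python. This is only an illustration under stated assumptions, not the importlib implementation. The function name scan_parent_path is hypothetical, and it only classifies what it finds instead of importing anything. The suffix tuple is the representative list from the text.

```python
import os
from typing import List, Optional, Tuple, Union

# Representative suffixes only; the real list varies by platform and the -O flag.
MODULE_SUFFIXES = (".py", ".pyc", ".so", ".pyd")

ScanResult = Tuple[str, Union[str, List[str]]]


def scan_parent_path(name: str, parent_path: List[str]) -> Optional[ScanResult]:
    """Hypothetical sketch of the per-directory scan for a module named `name`.

    Returns ("package", dir), ("module", file), ("namespace", [dirs]) or None.
    It only classifies what it finds; nothing is imported.
    """
    recorded: List[str] = []  # possible namespace portions, in parent-path order
    for directory in parent_path:
        candidate = os.path.join(directory, name)
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            # A regular package is returned at once, even if portions were recorded.
            return ("package", candidate)
        for suffix in MODULE_SUFFIXES:
            if os.path.isfile(candidate + suffix):
                return ("module", candidate + suffix)
        if os.path.isdir(candidate):
            # Record the directory and keep scanning the parent path.
            recorded.append(candidate)
    if recorded:
        return ("namespace", recorded)
    return None
```

Given two directories that each contain a bare foo/ directory, the sketch returns ("namespace", [...]) with both paths. A foo.py or foo/__init__.py found later on the path still wins, because the rules above return a module or regular package as soon as one is found.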

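If the scan completes without returning a module or package, and at least one directory was recorded, then a namespace package is created. The new namespace package:

  • Has a __path__ attribute set to an iterable of the path strings that were found and recorded during the scan.
  • Does not have a __file__ attribute.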
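These attributes can be observed directly. The following sketch assumes Python 3.3 or later and a writable temporary directory. The package name pep420_attr_demo and the two projectN directories are hypothetical. Depending on the Python version, a namespace package’s __file__ may be absent or set to None, so the sketch reads it with getattr(..., None).

```python
import importlib
import os
import sys
import tempfile

NAME = "pep420_attr_demo"  # hypothetical name, chosen to avoid real packages

with tempfile.TemporaryDirectory() as root:
    portions = []
    for project in ("project1", "project2"):
        portion = os.path.join(root, project, NAME)
        os.makedirs(portion)  # a bare directory: no __init__.py
        portions.append(portion)
    sys.path.extend(os.path.dirname(p) for p in portions)
    importlib.invalidate_caches()
    try:
        pkg = importlib.import_module(NAME)
        path_entries = list(pkg.__path__)           # every recorded directory
        file_attr = getattr(pkg, "__file__", None)  # no file backs the package
    finally:
        for p in portions:
            sys.path.remove(os.path.dirname(p))
        sys.modules.pop(NAME, None)

print(path_entries, file_attr)
```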

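Note that if “import foo” is executed and “foo” is found as a namespace package (using the above rules), then “foo” is immediately created as a package. The creation of the namespace package is not deferred until a sub-level import occurs.

A namespace package is not fundamentally different from a regular package. It is just a different way of creating packages. Once a namespace package is created, there is no functional difference between it and a regular package.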

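Dynamic path computation

The import machinery will behave as if a namespace package’s __path__ is recomputed before each portion is loaded.

For performance reasons, it is expected that this will be achieved by detecting that the parent path has changed. If no change has taken place, then no __path__ recomputation is required. The implementation must ensure that changes to the contents of the parent path are detected, as well as detecting the replacement of the parent path with a new path entry list object.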
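The requirement that replacing the parent path be detected can be checked with a short script. This sketch assumes Python 3.3 or later. The package name pep420_dynamic_demo and its module extra are hypothetical. The script rebinds sys.path to a new list instead of mutating it, then restores the original list object.

```python
import importlib
import os
import sys
import tempfile

NAME = "pep420_dynamic_demo"  # hypothetical package name

with tempfile.TemporaryDirectory() as root:
    first = os.path.join(root, "first")
    second = os.path.join(root, "second")
    os.makedirs(os.path.join(first, NAME))
    os.makedirs(os.path.join(second, NAME))
    # Only the portion under "second" provides the submodule "extra".
    with open(os.path.join(second, NAME, "extra.py"), "w") as f:
        f.write("VALUE = 42\n")
    importlib.invalidate_caches()

    saved_path = sys.path
    sys.path = saved_path + [first]
    try:
        pkg = importlib.import_module(NAME)
        before = len(list(pkg.__path__))  # only the "first" portion exists so far
        # Replace sys.path with a new list object instead of mutating it.
        sys.path = sys.path + [second]
        after = len(list(pkg.__path__))   # recomputed: both portions
        value = importlib.import_module(NAME + ".extra").VALUE
    finally:
        sys.path = saved_path
        sys.modules.pop(NAME + ".extra", None)
        sys.modules.pop(NAME, None)

print(before, after, value)
```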

Impact on import finders and loaders

PEP 302 defines “finders” that are called to search path elements. These finders’ find_module methods return either a “loader” object or None.

For a finder to contribute to namespace packages, it must implement a new find_loader(fullname) method. fullname has the same meaning as for find_module. find_loader always returns a 2-tuple of (loader, <iterable-of-path-entries>). loader may be None, in which case <iterable-of-path-entries> (which may be empty) is added to the list of recorded path entries and path searching continues. If loader is not None, it is immediately used to load a module or regular package.

Even if loader is returned and is not None, <iterable-of-path-entries> must still contain the path entries for the package. This allows code such as pkgutil.extend_path() to compute path entries for packages that it does not load.

Note that multiple path entries per finder are allowed. This is to support the case where a finder discovers multiple namespace portions for a given fullname. Many finders will support only a single namespace package portion per find_loader call, in which case this iterable will contain only a single string.

The import machinery will call find_loader if it exists, else fall back to find_module. Legacy finders which implement find_module but not find_loader will be unable to contribute portions to a namespace package.
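The find_loader contract can be illustrated with a pure-Python model. The sketch uses two hypothetical finders (DictFinder and LegacyFinder) and a hypothetical search() driver that applies the rules above. None of these classes are registered with the real import system (for example through sys.path_hooks); they only model the protocol.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class DictFinder:
    """Hypothetical finder backed by dicts, used only to model find_loader."""

    def __init__(self, modules: Optional[Dict[str, object]] = None,
                 portions: Optional[Dict[str, List[str]]] = None):
        self.modules = modules or {}    # fullname -> loader object
        self.portions = portions or {}  # fullname -> namespace portion paths

    def find_loader(self, fullname: str) -> Tuple[Optional[object], List[str]]:
        # Path entries are returned even when a loader is found.
        return self.modules.get(fullname), list(self.portions.get(fullname, []))


class LegacyFinder:
    """Hypothetical finder that only implements the older find_module."""

    def __init__(self, modules: Optional[Dict[str, object]] = None):
        self.modules = modules or {}

    def find_module(self, fullname: str) -> Optional[object]:
        return self.modules.get(fullname)


def search(finders: Iterable[object], fullname: str):
    """Return ("loader", loader), ("namespace", recorded_paths) or None."""
    recorded: List[str] = []
    for finder in finders:
        if hasattr(finder, "find_loader"):
            loader, paths = finder.find_loader(fullname)
            if loader is not None:
                return ("loader", loader)  # used immediately
            recorded.extend(paths)         # may be empty; keep searching
        else:
            # A legacy finder can supply a loader but never a portion.
            loader = finder.find_module(fullname)
            if loader is not None:
                return ("loader", loader)
    if recorded:
        return ("namespace", recorded)
    return None
```

A finder that returns several strings from one find_loader call contributes all of them, which covers the multiple-portions case described above.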

The specification expands PEP 302 loaders to include an optional method called module_repr() which, if present, is used to generate module object reprs. See the section below for further details.

Differences between namespace packages and regular packages

Namespace packages and regular packages are very similar. The differences are:

  • Portions of namespace packages need not all come from the same directory structure, or even from the same loader. Regular packages are self-contained: all parts live in the same directory hierarchy.
  • Namespace packages have no __file__ attribute.
  • Namespace packages’ __path__ attribute is a read-only iterable of strings, which is automatically updated when the parent path is modified.
  • Namespace packages have no __init__.py module.
  • Namespace packages have a different type of object for their __loader__ attribute.
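Several of these differences can be checked side by side. This sketch assumes Python 3.3 or later. The names pep420_regular_demo and pep420_namespace_demo are hypothetical. It compares __file__, whether __path__ is a plain list, and the type of __loader__.

```python
import importlib
import os
import sys
import tempfile

REGULAR = "pep420_regular_demo"      # hypothetical names
NAMESPACE = "pep420_namespace_demo"

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, REGULAR))
    open(os.path.join(root, REGULAR, "__init__.py"), "w").close()
    os.makedirs(os.path.join(root, NAMESPACE))  # no __init__.py
    sys.path.append(root)
    importlib.invalidate_caches()
    try:
        regular = importlib.import_module(REGULAR)
        namespace = importlib.import_module(NAMESPACE)
        regular_file = os.path.basename(regular.__file__)
        namespace_file = getattr(namespace, "__file__", None)
        regular_path_is_list = isinstance(regular.__path__, list)
        namespace_path_is_list = isinstance(namespace.__path__, list)
        same_loader_type = type(regular.__loader__) is type(namespace.__loader__)
    finally:
        sys.path.remove(root)
        sys.modules.pop(REGULAR, None)
        sys.modules.pop(NAMESPACE, None)
```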

Namespace packages in the standard library

It is possible, and this PEP explicitly allows, that parts of the standard library be implemented as namespace packages. When and if any standard library packages become namespace packages is outside the scope of this PEP.

Migrating from legacy namespace packages

As described above, prior to this PEP pkgutil.extend_path() was used by legacy portions to create namespace packages. Because it is likely not practical for all existing portions of a namespace package to be migrated to this PEP at once, extend_path() will be modified to also recognize PEP 420 namespace packages. This will allow some portions of a namespace to be legacy portions while others are migrated to PEP 420. These hybrid namespace packages will not have the dynamic path computation that normal namespace packages have, since extend_path() never provided this functionality in the past.
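A hybrid namespace can be exercised directly. This sketch assumes a Python 3 interpreter whose pkgutil.extend_path() recognizes PEP 420 portions, as described above. The directory names and the package name pep420_hybrid_demo are hypothetical. Because the legacy __init__.py is found, the result is a regular package whose __path__ is a plain list computed once by extend_path(). It therefore lacks the dynamic recomputation of a pure namespace package.

```python
import importlib
import os
import sys
import tempfile

NAME = "pep420_hybrid_demo"  # hypothetical namespace name

LEGACY_INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)

with tempfile.TemporaryDirectory() as root:
    legacy_dir = os.path.join(root, "legacy")
    pep420_dir = os.path.join(root, "pep420")
    # Legacy portion: its __init__.py calls extend_path().
    os.makedirs(os.path.join(legacy_dir, NAME))
    with open(os.path.join(legacy_dir, NAME, "__init__.py"), "w") as f:
        f.write(LEGACY_INIT)
    # Migrated PEP 420 portion: a plain directory holding one module.
    os.makedirs(os.path.join(pep420_dir, NAME))
    with open(os.path.join(pep420_dir, NAME, "migrated.py"), "w") as f:
        f.write("ORIGIN = 'pep420'\n")
    sys.path.extend([legacy_dir, pep420_dir])
    importlib.invalidate_caches()
    try:
        pkg = importlib.import_module(NAME)
        path_is_list = isinstance(pkg.__path__, list)  # static list
        portion_count = len(pkg.__path__)              # legacy + PEP 420 portion
        origin = importlib.import_module(NAME + ".migrated").ORIGIN
    finally:
        for entry in (legacy_dir, pep420_dir):
            sys.path.remove(entry)
        for mod in (NAME + ".migrated", NAME):
            sys.modules.pop(mod, None)
```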

Packaging Implications

Multiple portions of a namespace package can be installed into the same directory, or into separate directories. For this section, suppose there are two portions which define “foo.bar” and “foo.baz”. “foo” itself is a namespace package.

If these are installed in the same location, a single directory “foo” would be in a directory that is on sys.path. Inside “foo” would be two directories, “bar” and “baz”. If “foo.bar” is removed (perhaps by an OS package manager), care must be taken not to remove the “foo/baz” or “foo” directories. Note that in this case “foo” will be a namespace package (because it lacks an __init__.py), even though all of its portions are in the same directory.

Note that “foo.bar” and “foo.baz” can be installed into the same “foo” directory because they will not have any files in common.

If the portions are installed in different locations, two different “foo” directories would be in directories that are on sys.path. “foo/bar” would be in one of these sys.path entries, and “foo/baz” would be in the other. Upon removal of “foo.bar”, the “foo/bar” and corresponding “foo” directories can be completely removed. But “foo/baz” and its corresponding “foo” directory cannot be removed.
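The removal rule for both layouts can be sketched as a small helper. remove_portion is a hypothetical uninstall function, not part of any packaging tool. It deletes one portion’s subpackage directory and removes the shared namespace directory only when nothing else is left in it.

```python
import os
import shutil


def remove_portion(install_dir: str, namespace: str, subpackage: str) -> None:
    """Hypothetical helper: uninstall <install_dir>/<namespace>/<subpackage>.

    The <install_dir>/<namespace> directory is removed only if no other
    portion (e.g. a sibling "baz") still lives in it.
    """
    ns_dir = os.path.join(install_dir, namespace)
    shutil.rmtree(os.path.join(ns_dir, subpackage))
    if not os.listdir(ns_dir):
        os.rmdir(ns_dir)
```

In the shared-directory layout, removing “foo.bar” leaves “foo/baz” and “foo” in place. In the separate-directory layout, it removes that location’s whole “foo” directory.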

It is also possible to have the “foo.bar” portion installed in a directory on sys.path, and have the “foo.baz” portion provided in a zip file, also on sys.path.

Examples

Nested namespace package

This example uses the following directory structure:

Lib/test/namespace_pkgs
    project1
        parent
            child
                one.py
    project2
        parent
            child
                two.py

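Here, both parent and child are namespace packages: portions of them exist in different directories, and they do not have __init__.py files.

Here we add the parent directories to sys.path, and show that the portions are correctly found: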

>>> import sys
>>> sys.path += ['Lib/test/namespace_pkgs/project1', 'Lib/test/namespace_pkgs/project2']
>>> import parent.child.one
>>> parent.__path__
_NamespacePath(['Lib/test/namespace_pkgs/project1/parent', 'Lib/test/namespace_pkgs/project2/parent'])
>>> parent.child.__path__
_NamespacePath(['Lib/test/namespace_pkgs/project1/parent/child', 'Lib/test/namespace_pkgs/project2/parent/child'])
>>> import parent.child.two
>>>

Dynamic path computation

This example uses a similar directory structure, but adds a third portion:

Lib/test/namespace_pkgs
    project1
        parent
            child
                one.py
    project2
        parent
            child
                two.py
    project3
        parent
            child
                three.py

We add project1 and project2 to sys.path, then import parent.child.one and parent.child.two. Then we add project3 to sys.path, and when parent.child.three is imported, project3/parent is automatically added to parent.__path__:

# add the first two parent paths to sys.path
>>> import sys
>>> sys.path += ['Lib/test/namespace_pkgs/project1', 'Lib/test/namespace_pkgs/project2']

# parent.child.one can be imported, because project1 was added to sys.path:
>>> import parent.child.one
>>> parent.__path__
_NamespacePath(['Lib/test/namespace_pkgs/project1/parent', 'Lib/test/namespace_pkgs/project2/parent'])

# parent.child.__path__ contains project1/parent/child and project2/parent/child, but not project3/parent/child:
>>> parent.child.__path__
_NamespacePath(['Lib/test/namespace_pkgs/project1/parent/child', 'Lib/test/namespace_pkgs/project2/parent/child'])

# parent.child.two can be imported, because project2 was added to sys.path:
>>> import parent.child.two

# we cannot import parent.child.three, because project3 is not in the path:
>>> import parent.child.three
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 1286, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1250, in _find_and_load_unlocked
ImportError: No module named 'parent.child.three'

# now add project3 to sys.path:
>>> sys.path.append('Lib/test/namespace_pkgs/project3')

# and now parent.child.three can be imported:
>>> import parent.child.three

# project3/parent has been added to parent.__path__:
>>> parent.__path__
_NamespacePath(['Lib/test/namespace_pkgs/project1/parent', 'Lib/test/namespace_pkgs/project2/parent', 'Lib/test/namespace_pkgs/project3/parent'])

# and project3/parent/child has been added to parent.child.__path__
>>> parent.child.__path__
_NamespacePath(['Lib/test/namespace_pkgs/project1/parent/child', 'Lib/test/namespace_pkgs/project2/parent/child', 'Lib/test/namespace_pkgs/project3/parent/child'])
>>>

Discussion

At PyCon 2012, we had a discussion about namespace packages at which PEP 382 and PEP 402 were rejected, to be replaced by this PEP [3].

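There is no intention to remove support of regular packages. If developers know that their package will never be a portion of a namespace package, then there is a performance advantage to it being a regular package (with an __init__.py). Creation and loading of a regular package can take place immediately when it is located along the path. With namespace packages, all entries in the path must be scanned before the package is created.

Note that an ImportWarning will no longer be raised for a directory lacking an __init__.py file. Such a directory will now be imported as a namespace package, whereas in prior Python versions an ImportWarning would be raised.

Nick Coghlan presented a list of his objections to this proposal [4]. They are: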

  1. Implicit package directories go against the Zen of Python.
  2. Implicit package directories pose awkward backwards compatibility challenges.
  3. Implicit package directories introduce ambiguity into file system layouts.
  4. Implicit package directories will permanently entrench current newbie-hostile behavior in __main__.

Nick later gave a detailed response to his own objections [5], which is summarized here:

  1. The practicality of this PEP wins over other proposals and the status quo.
  2. Minor backward compatibility issues are okay, as long as they are properly documented.
  3. This will be addressed in PEP 395.
  4. This will also be addressed in PEP 395.

The inclusion of namespace packages in the standard library was motivated by Martin v. Löwis, who wanted the encodings package to become a namespace package [6]. While this PEP allows for standard library packages to become namespaces, it defers a decision on encodings.

find_module versus find_loader

An early draft of this PEP specified a change to the find_module method in order to support namespace packages. It would be modified to return a string in the case where a namespace package portion was discovered.

However, this caused a problem with existing code outside of the standard library which calls find_module. Because this code would not be upgraded in concert with changes required by this PEP, it would fail when it would receive unexpected return values from find_module. Because of this incompatibility, this PEP now specifies that finders that want to provide namespace portions must implement the find_loader method, described above.

The use case for supporting multiple portions per find_loader call is given in [7].

Dynamic path computation

Guido raised a concern that automatic dynamic path computation was an unnecessary feature [8]. Later in that thread, PJ Eby and Nick Coghlan presented arguments as to why dynamic computation would minimize surprise to Python users. The conclusion of that discussion has been included in this PEP’s Rationale section.

An earlier version of this PEP required that dynamic path computation could only take effect if the parent path object were modified in-place. That is, this would work:

sys.path.append('new-dir')

But this would not:

sys.path = sys.path + ['new-dir']

In the same thread [8], it was pointed out that this restriction is not required. If the parent path is looked up by name instead of by holding a reference to it, then there is no restriction on how the parent path is modified or replaced. For a top-level namespace package, the lookup would be the module named "sys" then its attribute "path". For a namespace package nested inside a package foo, the lookup would be for the module named "foo" then its attribute "__path__".
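Looking the parent path up by name can be sketched as a small iterable class. Everything here is hypothetical (DynamicPath and its compute callback), and CPython’s own implementation may differ in detail. On each access the class fetches the parent module from sys.modules and reads the named attribute. It caches the computed entries keyed by a snapshot of the parent path’s contents. Because of this, both sys.path.append(...) and sys.path = sys.path + [...] trigger a recomputation.

```python
import sys
from typing import Callable, Iterator, List, Optional, Tuple


class DynamicPath:
    """Hypothetical sketch of a namespace package __path__ object."""

    def __init__(self, parent_module: str, parent_attr: str,
                 compute: Callable[[Tuple[str, ...]], List[str]]):
        self._parent_module = parent_module  # e.g. "sys" or "foo"
        self._parent_attr = parent_attr      # e.g. "path" or "__path__"
        self._compute = compute              # parent path -> this package's portions
        self._last_parent: Optional[Tuple[str, ...]] = None
        self._path: List[str] = []

    def _parent_path(self) -> Tuple[str, ...]:
        # Look the parent path up by name instead of holding a reference to it.
        module = sys.modules[self._parent_module]
        return tuple(getattr(module, self._parent_attr))

    def _recalculate(self) -> List[str]:
        parent = self._parent_path()  # snapshot copy, compared by value
        if parent != self._last_parent:
            self._path = self._compute(parent)
            self._last_parent = parent
        return self._path

    def __iter__(self) -> Iterator[str]:
        return iter(self._recalculate())
```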

Module reprs

Previously, module reprs were hard coded based on assumptions about a module’s __file__ attribute. If this attribute existed and was a string, it was assumed to be a file system path, and the module object’s repr would include this in its value. The only exception was that PEP 302 reserved missing __file__ attributes to built-in modules, and in CPython, this assumption was baked into the module object’s implementation. Because of this restriction, some modules contained contrived __file__ values that did not reflect file system paths, and which could cause unexpected problems later (e.g. os.path.join() on a non-path __file__ would return gibberish).

This PEP relaxes this constraint, and leaves the setting of __file__ to the purview of the loader producing the module. Loaders may opt to leave __file__ unset if no file system path is appropriate. Loaders may also set additional reserved attributes on the module if useful. This means that the definitive way to determine the origin of a module is to check its __loader__ attribute.

For example, namespace packages as described in this PEP will have no __file__ attribute because no corresponding file exists. In order to provide flexibility and descriptiveness in the reprs of such modules, a new optional protocol is added to PEP 302 loaders. Loaders can implement a module_repr() method which takes a single argument, the module object. This method should return the string to be used verbatim as the repr of the module. The rules for producing a module repr are now standardized as:

  • If the module has an __loader__ and that loader has a module_repr() method, call it with a single argument, which is the module object. The value returned is used as the module’s repr.
  • If an exception occurs in module_repr(), the exception is caught and discarded, and the calculation of the module’s repr continues as if module_repr() did not exist.
  • If the module has an __file__ attribute, this is used as part of the module’s repr.
  • If the module has no __file__ but does have an __loader__, then the loader’s repr is used as part of the module’s repr.
  • Otherwise, just use the module’s __name__ in the repr.
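The ordered rules above can be written out as a standalone function. compute_module_repr is a hypothetical helper. It applies the rules itself rather than relying on the interpreter’s built-in module repr, whose behavior may vary across Python versions. The output formats follow the example reprs later in this section.

```python
import types


def compute_module_repr(module: types.ModuleType) -> str:
    """Hypothetical helper applying the module repr rules, in order."""
    loader = getattr(module, "__loader__", None)
    module_repr = getattr(loader, "module_repr", None)
    if module_repr is not None:
        try:
            return module_repr(module)
        except Exception:
            pass  # discard the error and continue with the remaining rules
    name = module.__name__
    if hasattr(module, "__file__"):
        return "<module {!r} from {!r}>".format(name, module.__file__)
    if loader is not None:
        return "<module {!r} ({!r})>".format(name, loader)
    return "<module {!r}>".format(name)
```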

Here is a snippet showing how namespace module reprs are calculated from their loader:

class NamespaceLoader:
    @classmethod
    def module_repr(cls, module):
        return "<module '{}' (namespace)>".format(module.__name__)

Built-in module reprs would no longer need to be hard-coded, but instead would come from their loader as well:

class BuiltinImporter:
    @classmethod
    def module_repr(cls, module):
        return "<module '{}' (built-in)>".format(module.__name__)

Here are some example reprs of different types of modules with different sets of the related attributes:

>>> import email
>>> email
<module 'email' from '/home/barry/projects/python/pep-420/Lib/email/__init__.py'>
>>> m = type(email)('foo')
>>> m
<module 'foo'>
>>> m.__file__ = 'zippy:/de/do/dah'
>>> m
<module 'foo' from 'zippy:/de/do/dah'>
>>> class Loader: pass
...
>>> m.__loader__ = Loader
>>> del m.__file__
>>> m
<module 'foo' (<class '__main__.Loader'>)>
>>> class NewLoader:
...   @classmethod
...   def module_repr(cls, module):
...      return '<mystery module!>'
...
>>> m.__loader__ = NewLoader
>>> m
<mystery module!>
>>>

References


Source: https://github.com/python/peps/blob/main/pep-0420.txt

Last modified: 2022-01-21 11:03:51 GMT