PEP 420 – Implicit Namespace Packages
- Author:
- Eric V. Smith <eric at trueblade.com>
- Status:
- Final
- Type:
- Standards Track
- Created:
- 19-Apr-2012
- Python-Version:
- 3.3
- Post-History:
- Resolution:
- Python-Dev message
Abstract
Namespace packages are a mechanism for splitting a single Python package across multiple directories on disk. In current Python versions, an algorithm to compute the package’s __path__ must be formulated. With the enhancement proposed here, the import machinery itself will construct the list of directories that make up the package. This PEP builds upon previous work, documented in PEP 382 and PEP 402. Those PEPs have since been rejected in favor of this one. An implementation of this PEP is at [1].
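Terminology
Within this PEP:
- “package” refers to Python packages as defined by Python’s import statement.
- “distribution” refers to separately installable sets of Python modules as stored in the Python package index, and installed by distutils or setuptools.
- “vendor package” refers to groups of files installed by an operating system’s packaging mechanism (e.g. Debian or Redhat packages installed on Linux systems).
- “regular package” refers to packages as they are implemented in Python 3.2 and earlier.
- “portion” refers to a set of files in a single directory (possibly stored in a zip file) that contribute to a namespace package.
- “legacy portion” refers to a portion that uses __path__ manipulation in order to implement namespace packages.
This PEP defines a new type of package, the “namespace package”.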
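Namespace packages today
Python currently provides pkgutil.extend_path to denote a package as a namespace package. The recommended way of using it is to put:
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)
in the package’s __init__.py. Every distribution needs to provide the same contents in its __init__.py, so that extend_path is invoked independent of which portion of the package gets imported first. As a consequence, the package’s __init__.py cannot practically define any names, as it depends on the order of the package fragments on sys.path to determine which portion is imported first. As a special feature, extend_path reads files named <packagename>.pkg, which allows declaration of additional portions.
setuptools provides a similar function named pkg_resources.declare_namespace that is used in the form:
import pkg_resources
pkg_resources.declare_namespace(__name__)
In the portion’s __init__.py, no assignment to __path__ is necessary, as declare_namespace modifies the package __path__ through sys.modules. As a special feature, declare_namespace also supports zip files, and registers the package name internally so that future additions to sys.path by setuptools can properly add additional portions to each package.
setuptools allows declaring namespace packages in a distribution’s setup.py, so that distribution developers don’t need to put the magic __path__ modification into __init__.py themselves.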
See PEP 402’s “The Problem” section for additional motivations for namespace packages. Note that PEP 402 has been rejected, but the motivating use cases are still valid.
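The pkgutil.extend_path approach described in this section can be tried out with a small script. This is a minimal sketch, not part of the PEP: the package name legacy_ns_demo, the module names alpha and beta, and the helper make_legacy_portion are hypothetical, and the two "distributions" are just temporary directories.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Boilerplate that every legacy portion repeats in its __init__.py.
INIT_SOURCE = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_legacy_portion(root, package, module):
    """Create <root>/<package>/__init__.py plus one submodule (hypothetical helper)."""
    pkg_dir = Path(root) / package
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text(INIT_SOURCE)
    (pkg_dir / f"{module}.py").write_text(f"NAME = {module!r}\n")


# Two separately "installed" distributions contributing to the same package.
dist_a = tempfile.mkdtemp()
dist_b = tempfile.mkdtemp()
make_legacy_portion(dist_a, "legacy_ns_demo", "alpha")
make_legacy_portion(dist_b, "legacy_ns_demo", "beta")

sys.path += [dist_a, dist_b]
importlib.invalidate_caches()

import legacy_ns_demo.alpha
import legacy_ns_demo.beta

# Only the first __init__.py found on sys.path is executed; its extend_path()
# call adds the other portion's directory to the package __path__.
portion_count = len(legacy_ns_demo.__path__)
```

Because only one of the two identical __init__.py files ever runs, neither portion can rely on defining names there, which is the limitation noted above.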
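Rationale
The current imperative approach to namespace packages has led to multiple slightly incompatible mechanisms for providing namespace packages. For example, pkgutil supports *.pkg files; setuptools doesn’t. Likewise, setuptools supports inspecting zip files, and supports adding portions to its _namespace_packages variable, whereas pkgutil doesn’t.
Namespace packages are designed to support being split across multiple directories (and hence found via multiple sys.path entries). In this configuration, it doesn’t matter if multiple portions all provide an __init__.py file, so long as each portion correctly initializes the namespace package. However, Linux distribution vendors (amongst others) prefer to combine the separate portions and install them all into the same file system directory. This creates a potential for conflict, as the portions are now attempting to provide the same file on the target system, something that is not allowed by many package managers. Allowing implicit namespace packages means that the requirement to provide an __init__.py file can be dropped completely, and affected portions can be installed into a common directory or split across multiple directories as distributions see fit.
A namespace package will not be constrained by a fixed __path__, computed from the parent path at namespace package creation time. Consider the standard library encodings package:
1. Suppose that encodings becomes a namespace package.
2. It sometimes gets imported during interpreter startup to initialize the standard io streams.
3. An application modifies sys.path after startup and wants to contribute additional encodings from new path entries.
4. An attempt is made to import an encoding from an encodings portion that is found on a path entry added in step 3.
If the import system were restricted to only finding portions along the value of sys.path that existed at the time the encodings namespace package was created, the additional paths added in step 3 would never be searched for the additional portions imported in step 4. In addition, if step 2 were sometimes skipped (due to some runtime flag or other condition), then the path items added in step 3 would indeed be used the first time a portion was imported. Thus this PEP requires that the list of path entries be dynamically computed when each portion is loaded. It is expected that the import machinery will do this efficiently by caching __path__ values and only refreshing them when it detects that the parent path has changed. In the case of a top-level package like encodings, this parent path would be sys.path.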
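Specification
Regular packages will continue to have an __init__.py and will reside in a single directory.
Namespace packages cannot contain an __init__.py. As a consequence, pkgutil.extend_path and pkg_resources.declare_namespace become obsolete for purposes of namespace package creation. There will be no marker file or directory for specifying a namespace package.
During import processing, the import machinery will continue to iterate over each directory in the parent path as it does in Python 3.2. While looking for a module or package named “foo”, for each directory in the parent path:
- If <directory>/foo/__init__.py is found, a regular package is imported and returned.
- If not, but <directory>/foo.{py,pyc,so,pyd} is found, a module is imported and returned. The exact list of extensions varies by platform and whether the -O flag is specified. The list here is representative.
- If not, but <directory>/foo is found and is a directory, it is recorded and the scan continues with the next directory in the parent path.
- Otherwise the scan continues with the next directory in the parent path.
If the scan completes without returning a module or package, and at least one directory was recorded, then a namespace package is created. The new namespace package:
- Has a __path__ attribute set to an iterable of the path strings that were found and recorded during the scan.
- Does not have a __file__ attribute.
Note that if “import foo” is executed and “foo” is found as a namespace package (using the above rules), then “foo” is immediately created as a package. The creation of the namespace package is not deferred until a sub-level import occurs.
A namespace package is not fundamentally different from a regular package. It is just a different way of creating packages. Once a namespace package is created, there is no functional difference between it and a regular package.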
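The scan over the parent path can be illustrated with a short function. This is only a sketch under simplifying assumptions: scan_parent_path and MODULE_SUFFIXES are hypothetical names, the suffix list is the representative one quoted above, and the function merely classifies what would be found rather than importing anything.

```python
import os
import tempfile

# Representative only: the real suffix list varies by platform and flags.
MODULE_SUFFIXES = (".py", ".pyc", ".so", ".pyd")


def scan_parent_path(name, parent_path):
    """Classify `name` by walking `parent_path` in order (hypothetical helper).

    Returns ("regular", dir), ("module", file), ("namespace", [dirs]) or None.
    """
    recorded = []
    for directory in parent_path:
        candidate = os.path.join(directory, name)
        # A directory containing __init__.py is a regular package: stop here.
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            return ("regular", candidate)
        # Otherwise a module file ends the scan.
        for suffix in MODULE_SUFFIXES:
            if os.path.isfile(candidate + suffix):
                return ("module", candidate + suffix)
        # Otherwise a bare directory is recorded as a possible portion.
        if os.path.isdir(candidate):
            recorded.append(candidate)
        # Otherwise nothing matched; continue with the next directory.
    if recorded:
        return ("namespace", recorded)
    return None


# Build three path entries: two bare "foo" directories and one regular package.
root = tempfile.mkdtemp()
first, second, third = (os.path.join(root, d) for d in ("a", "b", "c"))
for entry in (first, second, third):
    os.makedirs(os.path.join(entry, "foo"))
open(os.path.join(third, "foo", "__init__.py"), "w").close()

namespace_result = scan_parent_path("foo", [first, second])
regular_result = scan_parent_path("foo", [first, third])
missing_result = scan_parent_path("missing", [first, second, third])
```

Note that in the second call the regular package wins even though a bare directory appears earlier on the path: recorded directories are only used if the scan finishes without finding a module or regular package.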
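Dynamic path computation
The import machinery will behave as if a namespace package’s __path__ is recomputed before each portion is loaded.
For performance reasons, it is expected that this will be achieved by detecting that the parent path has changed. If no change has taken place, then no __path__ recomputation is required. The implementation must ensure that changes to the contents of the parent path are detected, as well as detecting the replacement of the parent path with a new path entry list object.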
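One way to meet both requirements (detecting changed contents and detecting a replaced list object) is to look the parent path up by name on every access and compare it with a cached snapshot. The class below is a hypothetical sketch of that idea, not the actual import machinery; CachedNamespacePath, the fake_parent module and the compute callback are all assumptions made for illustration.

```python
import sys
import types


class CachedNamespacePath:
    """Hypothetical __path__ stand-in that refreshes when its parent path changes."""

    def __init__(self, parent_module_name, parent_attr, compute):
        self._parent_module_name = parent_module_name  # e.g. "sys"
        self._parent_attr = parent_attr                # e.g. "path"
        self._compute = compute                        # parent entries -> portions
        self._snapshot = None
        self._entries = []

    def _parent_path(self):
        # Looked up by name each time, so a replaced list object is noticed
        # just like an in-place modification.
        module = sys.modules[self._parent_module_name]
        return getattr(module, self._parent_attr)

    def _refresh(self):
        current = tuple(self._parent_path())
        if current != self._snapshot:
            self._entries = self._compute(current)
            self._snapshot = current
        return self._entries

    def __iter__(self):
        return iter(self._refresh())

    def __len__(self):
        return len(self._refresh())


# A fake parent module is used so the demonstration does not touch sys.path.
parent = types.ModuleType("fake_parent")
parent.search = ["/a"]
sys.modules["fake_parent"] = parent

calls = []


def compute(entries):
    calls.append(entries)
    return [entry + "/foo" for entry in entries]


ns_path = CachedNamespacePath("fake_parent", "search", compute)
first = list(ns_path)                    # computed
second = list(ns_path)                   # served from the cache
parent.search.append("/b")               # in-place change
third = list(ns_path)
parent.search = parent.search + ["/c"]   # replacement with a new list
fourth = list(ns_path)
```

The recomputation runs three times here, once per distinct state of the parent path, rather than once per access.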
Impact on import finders and loaders
PEP 302 defines “finders” that are called to search path elements. These finders’ find_module methods return either a “loader” object or None.
For a finder to contribute to namespace packages, it must implement a new find_loader(fullname) method. fullname has the same meaning as for find_module. find_loader always returns a 2-tuple of (loader, <iterable-of-path-entries>). loader may be None, in which case <iterable-of-path-entries> (which may be empty) is added to the list of recorded path entries and path searching continues. If loader is not None, it is immediately used to load a module or regular package.
Even if loader is returned and is not None, <iterable-of-path-entries> must still contain the path entries for the package. This allows code such as pkgutil.extend_path() to compute path entries for packages that it does not load.
Note that multiple path entries per finder are allowed. This is to support the case where a finder discovers multiple namespace portions for a given fullname. Many finders will support only a single namespace package portion per find_loader call, in which case this iterable will contain only a single string.
The import machinery will call find_loader if it exists, else fall back to find_module. Legacy finders which implement find_module but not find_loader will be unable to contribute portions to a namespace package.
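The find_loader contract and the fallback to find_module can be sketched with stand-in classes. Everything here is hypothetical (StubLoader, DirectoryFinder, LegacyFinder and search are illustrative names), and the sketch follows the protocol exactly as this section words it; current Python versions use a different finder API, so this is not how today's importlib is driven.

```python
import os
import tempfile


class StubLoader:
    """Placeholder loader; a real one would also implement load_module()."""

    def __init__(self, path):
        self.path = path


class DirectoryFinder:
    """Hypothetical finder for a single path entry."""

    def __init__(self, directory):
        self.directory = directory

    def find_loader(self, fullname):
        tail = fullname.rpartition(".")[2]
        base = os.path.join(self.directory, tail)
        init = os.path.join(base, "__init__.py")
        if os.path.isfile(init):
            # Regular package: a loader, plus the package's path entry.
            return StubLoader(init), [base]
        if os.path.isfile(base + ".py"):
            # Plain module: a loader and no package path entries.
            return StubLoader(base + ".py"), []
        if os.path.isdir(base):
            # Namespace portion: no loader, one path entry to record.
            return None, [base]
        return None, []


class LegacyFinder:
    """Finder with only find_module(): it can never contribute a portion."""

    def find_module(self, fullname):
        return None


def search(fullname, finders):
    """Return (loader, []) if something loadable is found, else (None, portions)."""
    recorded = []
    for finder in finders:
        if hasattr(finder, "find_loader"):
            loader, portions = finder.find_loader(fullname)
        else:
            loader, portions = finder.find_module(fullname), []
        if loader is not None:
            return loader, []
        recorded.extend(portions)
    return None, recorded


root = tempfile.mkdtemp()
for entry in ("a", "b", "c"):
    os.makedirs(os.path.join(root, entry, "foo"))
open(os.path.join(root, "c", "foo", "__init__.py"), "w").close()
finders = {entry: DirectoryFinder(os.path.join(root, entry)) for entry in "abc"}

namespace_result = search("foo", [LegacyFinder(), finders["a"], finders["b"]])
regular_result = search("foo", [finders["a"], finders["c"], finders["b"]])
```

In the first search every finder returns a loader of None, so the recorded entries would become the namespace package's __path__; in the second, the loader from the regular package ends the search immediately.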
The specification expands PEP 302 loaders to include an optional method called module_repr() which, if present, is used to generate module object reprs. See the section below for further details.
Differences between namespace packages and regular packages
Namespace packages and regular packages are very similar. The differences are:
- Portions of namespace packages need not all come from the same directory structure, or even from the same loader. Regular packages are self-contained: all parts live in the same directory hierarchy.
- Namespace packages have no __file__ attribute.
- Namespace packages’ __path__ attribute is a read-only iterable of strings, which is automatically updated when the parent path is modified.
- Namespace packages have no __init__.py module.
- Namespace packages have a different type of object for their __loader__ attribute.
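These differences can be observed from a running interpreter. The script below is a small sketch using two hypothetical package names, diff_demo_namespace and diff_demo_regular, created in a temporary directory. It reads __file__ with getattr() and a default because, depending on the Python version, a namespace package's __file__ may be absent or set to None.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
# Hypothetical package names for the demonstration.
(root / "diff_demo_namespace").mkdir()                      # no __init__.py
(root / "diff_demo_regular").mkdir()
(root / "diff_demo_regular" / "__init__.py").write_text("")

sys.path.append(str(root))
importlib.invalidate_caches()

import diff_demo_namespace
import diff_demo_regular

# A regular package's __file__ points at its __init__.py; a namespace
# package has no usable __file__.
namespace_file = getattr(diff_demo_namespace, "__file__", None)
regular_file = diff_demo_regular.__file__

# __path__ is a plain list for the regular package and a dedicated
# iterable type for the namespace package.
namespace_path_type = type(diff_demo_namespace.__path__)
regular_path_type = type(diff_demo_regular.__path__)

# The two kinds of package carry different loader types.
same_loader_type = (
    type(diff_demo_namespace.__loader__) is type(diff_demo_regular.__loader__)
)
```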
Namespace packages in the standard library
It is possible, and this PEP explicitly allows, that parts of the standard library be implemented as namespace packages. When and if any standard library packages become namespace packages is outside the scope of this PEP.
Migrating from legacy namespace packages
As described above, prior to this PEP pkgutil.extend_path() was used by legacy portions to create namespace packages. Because it is likely not practical for all existing portions of a namespace package to be migrated to this PEP at once, extend_path() will be modified to also recognize PEP 420 namespace packages. This will allow some portions of a namespace to be legacy portions while others are migrated to PEP 420. These hybrid namespace packages will not have the dynamic path computation that normal namespace packages have, since extend_path() never provided this functionality in the past.
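A hybrid namespace of this kind can be reproduced with one legacy portion and one migrated portion. This is a sketch with hypothetical names (hybrid_ns_demo, old, new), assuming a Python version whose pkgutil.extend_path() recognizes PEP 420 portions as described above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# Legacy portion: still ships an __init__.py that calls extend_path().
legacy = root / "legacy" / "hybrid_ns_demo"       # hypothetical package name
legacy.mkdir(parents=True)
(legacy / "__init__.py").write_text(
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)
(legacy / "old.py").write_text("NAME = 'old'\n")

# Migrated portion: a bare directory with no __init__.py.
migrated = root / "migrated" / "hybrid_ns_demo"
migrated.mkdir(parents=True)
(migrated / "new.py").write_text("NAME = 'new'\n")

sys.path += [str(root / "legacy"), str(root / "migrated")]
importlib.invalidate_caches()

import hybrid_ns_demo.old
import hybrid_ns_demo.new

# The result is a regular package whose __path__ was extended once, at
# import time, so it is a plain list without dynamic recomputation.
hybrid_path = hybrid_ns_demo.__path__
```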
Packaging Implications
Multiple portions of a namespace package can be installed into the same directory, or into separate directories. For this section, suppose there are two portions which define “foo.bar” and “foo.baz”. “foo” itself is a namespace package.
If these are installed in the same location, a single directory “foo” would be in a directory that is on sys.path. Inside “foo” would be two directories, “bar” and “baz”. If “foo.bar” is removed (perhaps by an OS package manager), care must be taken not to remove the “foo/baz” or “foo” directories. Note that in this case “foo” will be a namespace package (because it lacks an __init__.py), even though all of its portions are in the same directory.
Note that “foo.bar” and “foo.baz” can be installed into the same “foo” directory because they will not have any files in common.
If the portions are installed in different locations, two different “foo” directories would be in directories that are on sys.path. “foo/bar” would be in one of these sys.path entries, and “foo/baz” would be in the other. Upon removal of “foo.bar”, the “foo/bar” and corresponding “foo” directories can be completely removed. But “foo/baz” and its corresponding “foo” directory cannot be removed.
It is also possible to have the “foo.bar” portion installed in a directory on sys.path, and have the “foo.baz” portion provided in a zip file, also on sys.path.
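Examples
Nested namespace packages
This example uses the following directory structure:
Lib/test/namespace_pkgs
    project1
        parent
            child
                one.py
    project2
        parent
            child
                two.py
Here, both parent and child are namespace packages: portions of them exist in different directories, and they do not have __init__.py files.
Here we add the parent directories to sys.path, and show that the portions are correctly found: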
>>> import sys
>>> sys.path += ['Lib/test/namespace_pkgs/project1', 'Lib/test/namespace_pkgs/project2']
>>> import parent.child.one
>>> parent.__path__
_NamespacePath(['Lib/test/namespace_pkgs/project1/parent', 'Lib/test/namespace_pkgs/project2/parent'])
>>> parent.child.__path__
_NamespacePath(['Lib/test/namespace_pkgs/project1/parent/child', 'Lib/test/namespace_pkgs/project2/parent/child'])
>>> import parent.child.two
>>>
Dynamic path computation
This example uses a similar directory structure, but adds a third portion:
Lib/test/namespace_pkgs
    project1
        parent
            child
                one.py
    project2
        parent
            child
                two.py
    project3
        parent
            child
                three.py
We add project1 and project2 to sys.path, then import parent.child.one and parent.child.two. Then we add project3 to sys.path, and when parent.child.three is imported, project3/parent is automatically added to parent.__path__:
# add the first two parent paths to sys.path
>>> import sys
>>> sys.path += ['Lib/test/namespace_pkgs/project1', 'Lib/test/namespace_pkgs/project2']
# parent.child.one can be imported, because project1 was added to sys.path:
>>> import parent.child.one
>>> parent.__path__
_NamespacePath(['Lib/test/namespace_pkgs/project1/parent', 'Lib/test/namespace_pkgs/project2/parent'])
# parent.child.__path__ contains project1/parent/child and project2/parent/child, but not project3/parent/child:
>>> parent.child.__path__
_NamespacePath(['Lib/test/namespace_pkgs/project1/parent/child', 'Lib/test/namespace_pkgs/project2/parent/child'])
# parent.child.two can be imported, because project2 was added to sys.path:
>>> import parent.child.two
# we cannot import parent.child.three, because project3 is not in the path:
>>> import parent.child.three
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 1286, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1250, in _find_and_load_unlocked
ImportError: No module named 'parent.child.three'
# now add project3 to sys.path:
>>> sys.path.append('Lib/test/namespace_pkgs/project3')
# and now parent.child.three can be imported:
>>> import parent.child.three
# project3/parent has been added to parent.__path__:
>>> parent.__path__
_NamespacePath(['Lib/test/namespace_pkgs/project1/parent', 'Lib/test/namespace_pkgs/project2/parent', 'Lib/test/namespace_pkgs/project3/parent'])
# and project3/parent/child has been added to parent.child.__path__
>>> parent.child.__path__
_NamespacePath(['Lib/test/namespace_pkgs/project1/parent/child', 'Lib/test/namespace_pkgs/project2/parent/child', 'Lib/test/namespace_pkgs/project3/parent/child'])
>>>
Discussion
At PyCon 2012, we had a discussion about namespace packages at which PEP 382 and PEP 402 were rejected, to be replaced by this PEP [3].
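There is no intention to remove support for regular packages. If a developer knows that their package will never be a portion of a namespace package, then there is a performance advantage to it being a regular package (with an __init__.py). Creation and loading of a regular package can take place immediately when it is located along the path. With namespace packages, all entries in the path must be scanned before the package is created.
Note that an ImportWarning will no longer be raised for a directory lacking an __init__.py file. Such a directory will now be imported as a namespace package, whereas in prior Python versions an ImportWarning would be raised.
Nick Coghlan presented a list of his objections to this proposal [4]. They are: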
- Implicit package directories go against the Zen of Python.
- Implicit package directories pose awkward backwards compatibility challenges.
- Implicit package directories introduce ambiguity into file system layouts.
- Implicit package directories will permanently entrench current newbie-hostile behavior in __main__.
Nick later gave a detailed response to his own objections [5], which is summarized here:
- The practicality of this PEP wins over other proposals and the status quo.
- Minor backward compatibility issues are okay, as long as they are properly documented.
- This will be addressed in PEP 395.
- This will also be addressed in PEP 395.
The inclusion of namespace packages in the standard library was motivated by Martin v. Löwis, who wanted the encodings package to become a namespace package [6]. While this PEP allows for standard library packages to become namespaces, it defers a decision on encodings.
find_module versus find_loader
An early draft of this PEP specified a change to the find_module method in order to support namespace packages. It would be modified to return a string in the case where a namespace package portion was discovered.
However, this caused a problem with existing code outside of the standard library which calls find_module. Because this code would not be upgraded in concert with changes required by this PEP, it would fail when it would receive unexpected return values from find_module. Because of this incompatibility, this PEP now specifies that finders that want to provide namespace portions must implement the find_loader method, described above.
The use case for supporting multiple portions per find_loader call is given in [7].
Dynamic path computation
Guido raised a concern that automatic dynamic path computation was an unnecessary feature [8]. Later in that thread, PJ Eby and Nick Coghlan presented arguments as to why dynamic computation would minimize surprise to Python users. The conclusion of that discussion has been included in this PEP’s Rationale section.
An earlier version of this PEP required that dynamic path computation could only take effect if the parent path object were modified in-place. That is, this would work:
sys.path.append('new-dir')
But this would not:
sys.path = sys.path + ['new-dir']
In the same thread [8], it was pointed out that this restriction is not required. If the parent path is looked up by name instead of by holding a reference to it, then there is no restriction on how the parent path is modified or replaced. For a top-level namespace package, the lookup would be the module named "sys" then its attribute "path". For a namespace package nested inside a package foo, the lookup would be for the module named "foo" then its attribute "__path__".
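Both ways of changing the parent path can be checked against a running interpreter. The script below is a sketch with a hypothetical package name, pep420_path_demo, whose three portions live in temporary directories; the second portion is exposed by appending to sys.path and the third by rebinding sys.path to a new list.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
for entry, module in [("first", "alpha"), ("second", "beta"), ("third", "gamma")]:
    portion = root / entry / "pep420_path_demo"   # hypothetical package name
    portion.mkdir(parents=True)                   # no __init__.py
    (portion / f"{module}.py").write_text(f"NAME = {module!r}\n")

sys.path.append(str(root / "first"))
importlib.invalidate_caches()
import pep420_path_demo.alpha

# In-place modification of the parent path.
sys.path.append(str(root / "second"))
import pep420_path_demo.beta

# Replacement of the parent path with a new list object.
sys.path = sys.path + [str(root / "third")]
import pep420_path_demo.gamma

portions = list(pep420_path_demo.__path__)
portion_sources = [Path(p).parent.name for p in portions]
```

If the namespace package held a reference to the original list instead of looking up sys.path by name, the third import would be expected to fail.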
Module reprs
Previously, module reprs were hard coded based on assumptions about a module’s __file__ attribute. If this attribute existed and was a string, it was assumed to be a file system path, and the module object’s repr would include this in its value. The only exception was that PEP 302 reserved missing __file__ attributes to built-in modules, and in CPython, this assumption was baked into the module object’s implementation. Because of this restriction, some modules contained contrived __file__ values that did not reflect file system paths, and which could cause unexpected problems later (e.g. os.path.join() on a non-path __file__ would return gibberish).
This PEP relaxes this constraint, and leaves the setting of __file__ to the purview of the loader producing the module. Loaders may opt to leave __file__ unset if no file system path is appropriate. Loaders may also set additional reserved attributes on the module if useful. This means that the definitive way to determine the origin of a module is to check its __loader__ attribute.
For example, namespace packages as described in this PEP will have no __file__ attribute because no corresponding file exists. In order to provide flexibility and descriptiveness in the reprs of such modules, a new optional protocol is added to PEP 302 loaders. Loaders can implement a module_repr() method which takes a single argument, the module object. This method should return the string to be used verbatim as the repr of the module. The rules for producing a module repr are now standardized as:
- If the module has an __loader__ and that loader has a module_repr() method, call it with a single argument, which is the module object. The value returned is used as the module’s repr.
- If an exception occurs in module_repr(), the exception is caught and discarded, and the calculation of the module’s repr continues as if module_repr() did not exist.
- If the module has an __file__ attribute, this is used as part of the module’s repr.
- If the module has no __file__ but does have an __loader__, then the loader’s repr is used as part of the module’s repr.
- Otherwise, just use the module’s __name__ in the repr.
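The rules above can be written out as one function. This is a sketch only: sketch_module_repr is a hypothetical helper that applies the rules itself instead of relying on the interpreter's built-in module repr, which may not follow this protocol in current Python versions. The Loader, NewLoader and BrokenLoader classes are illustrative stand-ins.

```python
import types


def sketch_module_repr(module):
    """Apply the module repr rules listed above (hypothetical helper)."""
    loader = getattr(module, "__loader__", None)
    if loader is not None and hasattr(loader, "module_repr"):
        try:
            return loader.module_repr(module)
        except Exception:
            pass  # continue as if module_repr() did not exist
    name = getattr(module, "__name__", "?")
    filename = getattr(module, "__file__", None)
    if filename is not None:
        return "<module {!r} from {!r}>".format(name, filename)
    if loader is not None:
        return "<module {!r} ({!r})>".format(name, loader)
    return "<module {!r}>".format(name)


class Loader:
    pass


class NewLoader:
    @classmethod
    def module_repr(cls, module):
        return "<mystery module!>"


class BrokenLoader:
    @classmethod
    def module_repr(cls, module):
        raise RuntimeError("boom")


m = types.ModuleType("foo")
plain = sketch_module_repr(m)          # only __name__ is available

m.__file__ = "zippy:/de/do/dah"
with_file = sketch_module_repr(m)      # __file__ is used

del m.__file__
m.__loader__ = Loader
with_loader = sketch_module_repr(m)    # the loader's repr is used

m.__loader__ = NewLoader
custom = sketch_module_repr(m)         # module_repr() result used verbatim

m.__loader__ = BrokenLoader
fallback = sketch_module_repr(m)       # the exception is discarded
```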
Here is a snippet showing how namespace module reprs are calculated from their loader:
class NamespaceLoader:
    @classmethod
    def module_repr(cls, module):
        return "<module '{}' (namespace)>".format(module.__name__)
Built-in module reprs would no longer need to be hard-coded, but instead would come from their loader as well:
class BuiltinImporter:
    @classmethod
    def module_repr(cls, module):
        return "<module '{}' (built-in)>".format(module.__name__)
Here are some example reprs of different types of modules with different sets of the related attributes:
>>> import email
>>> email
<module 'email' from '/home/barry/projects/python/pep-420/Lib/email/__init__.py'>
>>> m = type(email)('foo')
>>> m
<module 'foo'>
>>> m.__file__ = 'zippy:/de/do/dah'
>>> m
<module 'foo' from 'zippy:/de/do/dah'>
>>> class Loader: pass
...
>>> m.__loader__ = Loader
>>> del m.__file__
>>> m
<module 'foo' (<class '__main__.Loader'>)>
>>> class NewLoader:
...     @classmethod
...     def module_repr(cls, module):
...         return '<mystery module!>'
...
>>> m.__loader__ = NewLoader
>>> m
<mystery module!>
>>>
References
Copyright
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/pep-0420.txt
Last modified: 2022-01-21 11:03:51 GMT