PEP 519 – Adding a file system path protocol
- Author:
- Brett Cannon <brett at python.org>, Koos Zevenhoven <k7hoven at gmail.com>
- Status:
- Final
- Type:
- Standards Track
- Created:
- 11-May-2016
- Python-Version:
- 3.6
- Post-History:
- 11-May-2016, 12-May-2016, 13-May-2016
- Resolution:
- Python-Dev message
Abstract
This PEP proposes a protocol for classes which represent a file system
path to be able to provide a str
or bytes
representation.
Changes to Python’s standard library are also proposed to utilize this
protocol where appropriate to facilitate the use of path objects where
historically only str
and/or bytes
file system paths are
accepted. The goal is to facilitate the migration of users towards
rich path objects while providing an easy way to work with code
expecting str
or bytes
.
Rationale
Historically in Python, file system paths have been represented as
strings or bytes. This choice of representation has stemmed from C’s
own decision to represent file system paths as
const char *
[3]. While that is a totally serviceable
format to use for file system paths, it’s not necessarily optimal. At
issue is the fact that while all file system paths can be represented
as strings or bytes, not all strings or bytes represent a file system
path. This can lead to issues where any e.g. string duck-types to a
file system path whether it actually represents a path or not.
To help elevate the representation of file system paths from their
representation as strings and bytes to a richer object representation,
the pathlib module [4] was provisionally introduced in
Python 3.4 through PEP 428. While considered by some as an improvement
over strings and bytes for file system paths, it has suffered from a
lack of adoption. Typically the key issue listed for the low adoption
rate has been the lack of support in the standard library. This lack
of support required users of pathlib to manually convert path objects
to strings by calling str(path)
which many found error-prone.
One issue in converting path objects to strings comes from
the fact that the only generic way to get a string representation of
the path was to pass the object to str()
. This can pose a
problem when done blindly as nearly all Python objects have some
string representation whether they are a path or not, e.g.
str(None)
will give a result that
builtins.open()
[5] will happily use to create a new
file.
Exacerbating this whole situation is the
DirEntry
object [8]. While path objects have a
representation that can be extracted using str()
, DirEntry
objects expose a path
attribute instead. Having no common
interface between path objects, DirEntry
, and any other
third-party path library has become an issue. A solution that allows
any path-representing object to declare that it is a path and a way
to extract a low-level representation that all path objects could
support is desired.
This PEP then proposes to introduce a new protocol to be followed by objects which represent file system paths. Providing a protocol allows for explicit signaling of what objects represent file system paths as well as a way to extract a lower-level representation that can be used with older APIs which only support strings or bytes.
Discussions regarding path objects that led to this PEP can be found in multiple threads on the python-ideas mailing list archive [1] for the months of March and April 2016 and on the python-dev mailing list archives [2] during April 2016.
Proposal
This proposal is split into two parts. One part is the proposal of a protocol for objects to declare and provide support for exposing a file system path representation. The other part deals with changes to Python’s standard library to support the new protocol. These changes will also lead to the pathlib module dropping its provisional status.
Protocol
The following abstract base class defines the protocol for an object to be considered a path object:
import abc
import typing as t
class PathLike(abc.ABC):
"""Abstract base class for implementing the file system path protocol."""
@abc.abstractmethod
def __fspath__(self) -> t.Union[str, bytes]:
"""Return the file system path representation of the object."""
raise NotImplementedError
Objects representing file system paths will implement the
__fspath__()
method which will return the str
or bytes
representation of the path. The str
representation is the
preferred low-level path representation as it is human-readable and
what people historically represent paths as.
Standard library changes
It is expected that most APIs in Python’s standard library that currently accept a file system path will be updated appropriately to accept path objects (whether that requires code or simply an update to documentation will vary). The modules mentioned below, though, deserve specific details as they have either fundamental changes that empower the ability to use path objects, or entail additions/removal of APIs.
builtins
open()
[5] will be updated to accept path objects as
well as continue to accept str
and bytes
.
os
The fspath()
function will be added with the following semantics:
import typing as t
def fspath(path: t.Union[PathLike, str, bytes]) -> t.Union[str, bytes]:
"""Return the string representation of the path.
If str or bytes is passed in, it is returned unchanged. If __fspath__()
returns something other than str or bytes then TypeError is raised. If
this function is given something that is not str, bytes, or os.PathLike
then TypeError is raised.
"""
if isinstance(path, (str, bytes)):
return path
# Work from the object's type to match method resolution of other magic
# methods.
path_type = type(path)
try:
path = path_type.__fspath__(path)
except AttributeError:
if hasattr(path_type, '__fspath__'):
raise
else:
if isinstance(path, (str, bytes)):
return path
else:
raise TypeError("expected __fspath__() to return str or bytes, "
"not " + type(path).__name__)
raise TypeError("expected str, bytes or os.PathLike object, not "
+ path_type.__name__)
The os.fsencode()
[6] and
os.fsdecode()
[7] functions will be updated to accept
path objects. As both functions coerce their arguments to
bytes
and str
, respectively, they will be updated to call
__fspath__()
if present to convert the path object to a str
or
bytes
representation, and then perform their appropriate
coercion operations as if the return value from __fspath__()
had
been the original argument to the coercion function in question.
The addition of os.fspath()
, the updates to
os.fsencode()
/os.fsdecode()
, and the current semantics of
pathlib.PurePath
provide the semantics necessary to
get the path representation one prefers. For a path object,
pathlib.PurePath
/Path
can be used. To obtain the str
or
bytes
representation without any coercion, then os.fspath()
can be used. If a str
is desired and the encoding of bytes
should be assumed to be the default file system encoding, then
os.fsdecode()
should be used. If a bytes
representation is
desired and any strings should be encoded using the default file
system encoding, then os.fsencode()
is used. This PEP recommends
using path objects when possible and falling back to string paths as
necessary and using bytes
as a last resort.
Another way to view this is as a hierarchy of file system path
representations (highest- to lowest-level): path → str → bytes. The
functions and classes under discussion can all accept objects on the
same level of the hierarchy, but they vary in whether they promote or
demote objects to another level. The pathlib.PurePath
class can
promote a str
to a path object. The os.fspath()
function can
demote a path object to a str
or bytes
instance, depending
on what __fspath__()
returns.
The os.fsdecode()
function will demote a path object to
a string or promote a bytes
object to a str
. The
os.fsencode()
function will demote a path or string object to
bytes
. There is no function that provides a way to demote a path
object directly to bytes
while bypassing string demotion.
The DirEntry
object [8] will gain an __fspath__()
method. It will return the same value as currently found on the
path
attribute of DirEntry
instances.
The Protocol ABC will be added to the os
module under the name
os.PathLike
.
os.path
The various path-manipulation functions of os.path
[9]
will be updated to accept path objects. For polymorphic functions that
accept both bytes and strings, they will be updated to simply use
os.fspath()
.
During the discussions leading up to this PEP it was suggested that
os.path
not be updated using an “explicit is better than implicit”
argument. The thinking was that since __fspath__()
is polymorphic
itself it may be better to have code working with os.path
extract
the path representation from path objects explicitly. There is also
the consideration that adding support this deep into the low-level OS
APIs will lead to code magically supporting path objects without
requiring any documentation updated, leading to potential complaints
when it doesn’t work, unbeknownst to the project author.
But it is the view of this PEP that “practicality beats purity” in this instance. To help facilitate the transition to supporting path objects, it is better to make the transition as easy as possible than to worry about unexpected/undocumented duck typing support for path objects by projects.
There has also been the suggestion that os.path
functions could be
used in a tight loop and the overhead of checking or calling
__fspath__()
would be too costly. In this scenario only
path-consuming APIs would be directly updated and path-manipulating
APIs like the ones in os.path
would go unmodified. This would
require library authors to update their code to support path objects
if they performed any path manipulations, but if the library code
passed the path straight through then the library wouldn’t need to be
updated. It is the view of this PEP and Guido, though, that this is an
unnecessary worry and that performance will still be acceptable.
pathlib
The constructor for pathlib.PurePath
and pathlib.Path
will be
updated to accept PathLike
objects. Both PurePath
and Path
will continue to not accept bytes
path representations, and so if
__fspath__()
returns bytes
it will raise an exception.
The path
attribute will be removed as this PEP makes it
redundant (it has not been included in any released version of Python
and so is not a backwards-compatibility concern).
C API
The C API will gain an equivalent function to os.fspath()
:
/*
Return the file system path representation of the object.
If the object is str or bytes, then allow it to pass through with
an incremented refcount. If the object defines __fspath__(), then
return the result of that method. All other types raise a TypeError.
*/
PyObject *
PyOS_FSPath(PyObject *path)
{
_Py_IDENTIFIER(__fspath__);
PyObject *func = NULL;
PyObject *path_repr = NULL;
if (PyUnicode_Check(path) || PyBytes_Check(path)) {
Py_INCREF(path);
return path;
}
func = _PyObject_LookupSpecial(path, &PyId___fspath__);
if (NULL == func) {
return PyErr_Format(PyExc_TypeError,
"expected str, bytes or os.PathLike object, "
"not %S",
path->ob_type);
}
path_repr = PyObject_CallFunctionObjArgs(func, NULL);
Py_DECREF(func);
if (!PyUnicode_Check(path_repr) && !PyBytes_Check(path_repr)) {
Py_DECREF(path_repr);
return PyErr_Format(PyExc_TypeError,
"expected __fspath__() to return str or bytes, "
"not %S",
path_repr->ob_type);
}
return path_repr;
}
Backwards compatibility
There are no explicit backwards-compatibility concerns. Unless an
object incidentally already defines a __fspath__()
method there is
no reason to expect the pre-existing code to break or expect to have
its semantics implicitly changed.
Libraries wishing to support path objects and a version of Python
prior to Python 3.6 and the existence of os.fspath()
can use the
idiom of
path.__fspath__() if hasattr(path, "__fspath__") else path
.
Implementation
This is the task list for what this PEP proposes to be changed in Python 3.6:
- Remove the
path
attribute from pathlib (done) - Remove the provisional status of pathlib (done)
- Add
os.PathLike
(code and docs done) - Add
PyOS_FSPath()
(code and docs done) - Add
os.fspath()
(done <done) - Update
os.fsencode()
(done) - Update
os.fsdecode()
(done) - Update
pathlib.PurePath
andpathlib.Path
(done)- Add
__fspath__()
- Add
os.PathLike
support to the constructors
- Add
- Add
__fspath__()
toDirEntry
(done) - Update
builtins.open()
(done) - Update
os.path
(done) - Add a glossary entry for “path-like” (done)
- Update “What’s New” (done)
Rejected Ideas
Other names for the protocol’s method
Various names were proposed during discussions leading to this PEP,
including __path__
, __pathname__
, and __fspathname__
. In
the end people seemed to gravitate towards __fspath__
for being
unambiguous without being unnecessarily long.
Separate str/bytes methods
At one point it was suggested that __fspath__()
only return
strings and another method named __fspathb__()
be introduced to
return bytes. The thinking is that by making __fspath__()
not be
polymorphic it could make dealing with the potential string or bytes
representations easier. But the general consensus was that returning
bytes will more than likely be rare and that the various functions in
the os module are the better abstraction to promote over direct
calls to __fspath__()
.
Providing a path
attribute
To help deal with the issue of pathlib.PurePath
not inheriting
from str
, originally it was proposed to introduce a path
attribute to mirror what os.DirEntry
provides. In the end,
though, it was determined that a protocol would provide the same
result while not directly exposing an API that most people will never
need to interact with directly.
Have __fspath__()
only return strings
Much of the discussion that led to this PEP revolved around whether
__fspath__()
should be polymorphic and return bytes
as well as
str
or only return str
. The general sentiment for this view
was that bytes
are difficult to work with due to their
inherent lack of information about their encoding and PEP 383 makes
it possible to represent all file system paths using str
with the
surrogateescape
handler. Thus, it would be better to forcibly
promote the use of str
as the low-level path representation for
high-level path objects.
In the end, it was decided that using bytes
to represent paths is
simply not going to go away and thus they should be supported to some
degree. The hope is that people will gravitate towards path objects
like pathlib and that will move people away from operating directly
with bytes
.
A generic string encoding mechanism
At one point there was a discussion of developing a generic mechanism
to extract a string representation of an object that had semantic
meaning (__str__()
does not necessarily return anything of
semantic significance beyond what may be helpful for debugging). In
the end, it was deemed to lack a motivating need beyond the one this
PEP is trying to solve in a specific fashion.
Have __fspath__ be an attribute
It was briefly considered to have __fspath__
be an attribute
instead of a method. This was rejected for two reasons. One,
historically protocols have been implemented as “magic methods” and
not “magic methods and attributes”. Two, there is no guarantee that
the lower-level representation of a path object will be pre-computed,
potentially misleading users that there was no expensive computation
behind the scenes in case the attribute was implemented as a property.
This also indirectly ties into the idea of introducing a path
attribute to accomplish the same thing. This idea has an added issue,
though, of accidentally having any object with a path
attribute
meet the protocol’s duck typing. Introducing a new magic method for
the protocol helpfully avoids any accidental opting into the protocol.
Provide specific type hinting support
There was some consideration to providing a generic typing.PathLike
class which would allow for e.g. typing.PathLike[str]
to specify
a type hint for a path object which returned a string representation.
While potentially beneficial, the usefulness was deemed too small to
bother adding the type hint class.
This also removed any desire to have a class in the typing
module
which represented the union of all acceptable path-representing types
as that can be represented with
typing.Union[str, bytes, os.PathLike]
easily enough and the hope
is users will slowly gravitate to path objects only.
Provide os.fspathb()
It was suggested that to mirror the structure of e.g.
os.getcwd()
/os.getcwdb()
, that os.fspath()
only return
str
and that another function named os.fspathb()
be
introduced that only returned bytes
. This was rejected as the
purposes of the *b()
functions are tied to querying the file
system where there is a need to get the raw bytes back. As this PEP
does not work directly with data on a file system (but which may
be), the view was taken this distinction is unnecessary. It’s also
believed that the need for only bytes will not be common enough to
need to support in such a specific manner as os.fsencode()
will
provide similar functionality.
Call __fspath__()
off of the instance
An earlier draft of this PEP had os.fspath()
calling
path.__fspath__()
instead of type(path).__fspath__(path)
. The
changed to be consistent with how other magic methods in Python are
resolved.
Acknowledgements
Thanks to everyone who participated in the various discussions related to this PEP that spanned both python-ideas and python-dev. Special thanks to Stephen Turnbull for direct feedback on early drafts of this PEP. More special thanks to Koos Zevenhoven and Ethan Furman for not only feedback on early drafts of this PEP but also helping to drive the overall discussion on this topic across the two mailing lists.
References
Copyright
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/pep-0519.txt
Last modified: 2022-01-21 11:03:51 GMT