Following system colour scheme Selected dark colour scheme Selected light colour scheme

Python Enhancement Proposals

PEP 661 – Sentinel Values

Author:
Tal Einat <tal at python.org>
Discussions-To:
Discourse thread
Status:
Draft
Type:
Standards Track
Created:
06-Jun-2021
Post-History:
06-Jun-2021

Table of Contents

TL;DR: See the Specification and Reference Implementation.

Abstract

Unique placeholder values, commonly known as “sentinel values”, are common in programming. They have many uses, such as for:

  • Default values for function arguments, for when a value was not given:
    def foo(value=None):
        ...
    
  • Return values from functions when something is not found or unavailable:
    >>> "abc".find("d")
    -1
    
  • Missing data, such as NULL in relational databases or “N/A” (“not available”) in spreadsheets

Python has the special value None, which is intended to be used as such a sentinel value in most cases. However, sometimes an alternative sentinel value is needed, usually when it needs to be distinct from None. These cases are common enough that several idioms for implementing such sentinels have arisen over the years, but uncommon enough that there hasn’t been a clear need for standardization. However, the common implementations, including some in the stdlib, suffer from several significant drawbacks.

This PEP proposes adding a utility for defining sentinel values, to be used in the stdlib and made publicly available as part of the stdlib.

Note: Changing all existing sentinels in the stdlib to be implemented this way is not deemed necessary, and whether to do so is left to the discretion of the maintainers.

Motivation

In May 2021, a question was brought up on the python-dev mailing list [1] about how to better implement a sentinel value for traceback.print_exception. The existing implementation used the following common idiom:

_sentinel = object()

However, this object has an uninformative and overly verbose repr, causing the function’s signature to be overly long and hard to read:

>>> help(traceback.print_exception)
Help on function print_exception in module traceback:

print_exception(exc, /, value=<object object at
0x000002825DF09650>, tb=<object object at 0x000002825DF09650>,
limit=None, file=None, chain=True)

Additionally, two other drawbacks of many existing sentinels were brought up in the discussion:

  1. Not having a distinct type, hence it being impossible to define clear type signatures for functions with sentinels as default values
  2. Incorrect behavior after being copied or unpickled, due to a separate instance being created and thus comparisons using is failing

In the ensuing discussion, Victor Stinner supplied a list of currently used sentinel values in the Python standard library [2]. This showed that the need for sentinels is fairly common, that there are various implementation methods used even within the stdlib, and that many of these suffer from at least one of the three aforementioned drawbacks.

The discussion did not lead to any clear consensus on whether a standard implementation method is needed or desirable, whether the drawbacks mentioned are significant, nor which kind of implementation would be good. The author of this PEP created an issue on bugs.python.org [3] suggesting options for improvement, but that focused on only a single problematic aspect of a few cases, and failed to gather any support.

A poll [4] was created on discuss.python.org to get a clearer sense of the community’s opinions. The poll’s results were not conclusive, with 40% voting for “The status-quo is fine / there’s no need for consistency in this”, but most voters voting for one or more standardized solutions. Specifically, 37% of the voters chose “Consistent use of a new, dedicated sentinel factory / class / meta-class, also made publicly available in the stdlib”.

With such mixed opinions, this PEP was created to facilitate making a decision on the subject.

While working on this PEP, iterating on various options and implementations and continuing discussions, the author has come to the opinion that a simple, good implementation available in the standard library would be worth having, both for use in the standard library itself and elsewhere.

Rationale

The criteria guiding the chosen implementation were:

  1. The sentinel objects should behave as expected by a sentinel object: When compared using the is operator, it should always be considered identical to itself but never to any other object.
  2. Creating a sentinel object should be a simple, straightforward one-liner.
  3. It should be simple to define as many distinct sentinel values as needed.
  4. The sentinel objects should have a clear and short repr.
  5. It should be possible to use clear type signatures for sentinels.
  6. The sentinel objects should behave correctly after copying and/or unpickling.
  7. Such sentinels should work when using CPython 3.x and PyPy3, and ideally also with other implementations of Python.
  8. As simple and straightforward as possible, in implementation and especially in use. Avoid this becoming one more special thing to learn when learning Python. It should be easy to find and use when needed, and obvious enough when reading code that one would normally not feel a need to look up its documentation.

With so many uses in the Python standard library [2], it would be useful to have an implementation in the standard library, since the stdlib cannot use implementations of sentinel objects available elsewhere (such as the sentinels [5] or sentinel [6] PyPI packages).

After researching existing idioms and implementations, and going through many different possible implementations, an implementation was written which meets all of these criteria (see Reference Implementation).

Specification

A new Sentinel class will be added to a new sentinels module. Its initializer will accept a single required argument, the name of the sentinel object, and two optional arguments: the repr of the object, and the name of its module:

>>> from sentinel import Sentinel
>>> NotGiven = Sentinel('NotGiven')
>>> NotGiven
<NotGiven>
>>> MISSING = Sentinel('MISSING', repr='mymodule.MISSING')
>>> MISSING
mymodule.MISSING
>>> MEGA = Sentinel('MEGA', repr='<MEGA>', module_name='mymodule')
<MEGA>

Checking if a value is such a sentinel should be done using the is operator, as is recommended for None. Equality checks using == will also work as expected, returning True only when the object is compared with itself. Identity checks such as if value is MISSING: should usually be used rather than boolean checks such as if value: or if not value:. Sentinel instances are truthy by default.

The names of sentinels are unique within each module. When calling Sentinel() in a module where a sentinel with that name was already defined, the existing sentinel with that name will be returned. Sentinels with the same name in different modules will be distinct from each other.

Creating a copy of a sentinel object, such as by using copy.copy() or by pickling and unpickling, will return the same object.

Type annotations for sentinel values should use Sentinel. For example:

def foo(value: int | Sentinel = MISSING) -> int:
    ...

The module_name optional argument should normally not need to be supplied, as Sentinel() will usually be able to recognize the module in which it was called. module_name should be supplied only in unusual cases when this automatic recognition does not work as intended, such as perhaps when using Jython or IronPython. This parallels the designs of Enum and namedtuple. For more details, see PEP 435.

The Sentinel class may be sub-classed. Instances of each sub-class will be unique, even if using the same name and module. This allows for customizing the behavior of sentinels, such as controlling their truthiness.

Reference Implementation

The reference implementation is found in a dedicated GitHub repo [7]. A simplified version follows:

_registry = {}

class Sentinel:
    """Unique sentinel values."""

    def __new__(cls, name, repr=None, module_name=None):
        name = str(name)
        repr = str(repr) if repr else f'<{name.split(".")[-1]}>'
        if module_name is None:
            try:
                module_name = \
                    sys._getframe(1).f_globals.get('__name__', '__main__')
            except (AttributeError, ValueError):
                module_name = __name__

        registry_key = f'{module_name}-{name}'

        sentinel = _registry.get(registry_key, None)
        if sentinel is not None:
            return sentinel

        sentinel = super().__new__(cls)
        sentinel._name = name
        sentinel._repr = repr
        sentinel._module_name = module_name

        return _registry.setdefault(registry_key, sentinel)

    def __repr__(self):
        return self._repr

    def __reduce__(self):
        return (
            self.__class__,
            (
                self._name,
                self._repr,
                self._module_name,
            ),
        )

Rejected Ideas

Use NotGiven = object()

This suffers from all of the drawbacks mentioned in the Rationale section.

Add a single new sentinel value, such as MISSING or Sentinel

Since such a value could be used for various things in various places, one could not always be confident that it would never be a valid value in some use cases. On the other hand, a dedicated and distinct sentinel value can be used with confidence without needing to consider potential edge-cases.

Additionally, it is useful to be able to provide a meaningful name and repr for a sentinel value, specific to the context where it is used.

Finally, this was a very unpopular option in the poll [4], with only 12% of the votes voting for it.

Use the existing Ellipsis sentinel value

This is not the original intended use of Ellipsis, though it has become increasingly common to use it to define empty class or function blocks instead of using pass.

Also, similar to a potential new single sentinel value, Ellipsis can’t be as confidently used in all cases, unlike a dedicated, distinct value.

Use a single-valued enum

The suggested idiom is:

class NotGivenType(Enum):
    NotGiven = 'NotGiven'
NotGiven = NotGivenType.NotGiven

Besides the excessive repetition, the repr is overly long: <NotGivenType.NotGiven: 'NotGiven'>. A shorter repr can be defined, at the expense of a bit more code and yet more repetition.

Finally, this option was the least popular among the nine options in the poll [4], being the only option to receive no votes.

A sentinel class decorator

The suggested idiom is:

@sentinel(repr='<NotGiven>')
class NotGivenType: pass
NotGiven = NotGivenType()

While this allows for a very simple and clear implementation of the decorator, the idiom is too verbose, repetitive, and difficult to remember.

Using class objects

Since classes are inherently singletons, using a class as a sentinel value makes sense and allows for a simple implementation.

The simplest version of this is:

class NotGiven: pass

To have a clear repr, one would need to use a meta-class:

class NotGiven(metaclass=SentinelMeta): pass

… or a class decorator:

@Sentinel
class NotGiven: pass

Using classes this way is unusual and could be confusing. The intention of code would be hard to understand without comments. It would also cause such sentinels to have some unexpected and undesirable behavior, such as being callable.

Specific type signatures for each sentinel value

For a long time, the author of this PEP strove to have type signatures for such sentinels that were specific to each value. A leading proposal (supported by Guido and others) was to expand the use of Literal, e.g. Literal[MISSING]. After much thought and discussion, especially on the typing-sig mailing list [8], it seems that all such solutions would require special-casing and/or added complexity in the implementations of static type checkers, while also constraining the implementation of sentinels.

Therefore, this PEP no longer proposes such signatures. Instead, this PEP suggests using Sentinel as the type signature for sentinel values.

It is somewhat unfortunate that static type checkers will sometimes not be able to deduce more specific types due to this, such as inside a conditional block like if value is not MISSING: .... However, this is a minor issue in practice, as type checkers can be easily made to understand these cases using typing.cast().

Additional Notes

  • This PEP and the initial implementation are drafted in a dedicated GitHub repo [7].
  • For sentinels defined in a class scope, to avoid potential name clashes, one should use the fully-qualified name of the variable in the module. Only the part of the name after the last period will be used for the default repr. For example:
    >>> class MyClass:
    ...    NotGiven = sentinel('MyClass.NotGiven')
    >>> MyClass.NotGiven
    <NotGiven>
    
  • One should be careful when creating sentinels in a function or method, since sentinels with the same name created by code in the same module will be identical. If distinct sentinel objects are needed, make sure to use distinct names.

References


Source: https://github.com/python/peps/blob/main/pep-0661.rst

Last modified: 2022-09-08 18:02:39 GMT