PEP 267 – Optimized Access to Module Namespaces
- Author:
- Jeremy Hylton <jeremy at alum.mit.edu>
- Status:
- Deferred
- Type:
- Standards Track
- Created:
- 23-May-2001
- Python-Version:
- 2.2
- Post-History:
Deferral
While this PEP is a nice idea, no-one has yet emerged to do the work of hashing out the differences between this PEP, PEP 266 and PEP 280. Hence, it is being deferred.
Abstract
This PEP proposes a new implementation of global module namespaces and the builtin namespace that speeds name resolution. The implementation would use an array of object pointers for most operations in these namespaces. The compiler would assign indices for global variables and module attributes at compile time.
The current implementation represents these namespaces as dictionaries. A global name incurs a dictionary lookup each time it is used; a builtin name incurs two dictionary lookups, a failed lookup in the global namespace and a second lookup in the builtin namespace.
This implementation should speed Python code that uses module-level functions and variables. It should also eliminate awkward coding styles that have evolved to speed access to these names.
The implementation is complicated because the global and builtin namespaces can be modified dynamically in ways that are impossible for the compiler to detect. (Example: A module’s namespace is modified by a script after the module is imported.) As a result, the implementation must maintain several auxiliary data structures to preserve these dynamic features.
Introduction
This PEP proposes a new implementation of attribute access for module objects that optimizes access to module variables known at compile time. The module will store these variables in an array and provide an interface to lookup attributes using array offsets. For globals, builtins, and attributes of imported modules, the compiler will generate code that uses the array offsets for fast access.
[describe the key parts of the design: dlict, compiler support, stupid name trick workarounds, optimization of other module’s globals]
The implementation will preserve existing semantics for module namespaces, including the ability to modify module namespaces at runtime in ways that affect the visibility of builtin names.
DLict design
The namespaces are implemented using a data structure that has
sometimes gone under the name dlict
. It is a dictionary that has
numbered slots for some dictionary entries. The type must be
implemented in C to achieve acceptable performance. The new
type-class unification work should make this fairly easy. The
DLict
will presumably be a subclass of dictionary with an
alternate storage module for some keys.
A Python implementation is included here to illustrate the basic design:
"""A dictionary-list hybrid"""
import types
class DLict:
def __init__(self, names):
assert isinstance(names, types.DictType)
self.names = {}
self.list = [None] * size
self.empty = [1] * size
self.dict = {}
self.size = 0
def __getitem__(self, name):
i = self.names.get(name)
if i is None:
return self.dict[name]
if self.empty[i] is not None:
raise KeyError, name
return self.list[i]
def __setitem__(self, name, val):
i = self.names.get(name)
if i is None:
self.dict[name] = val
else:
self.empty[i] = None
self.list[i] = val
self.size += 1
def __delitem__(self, name):
i = self.names.get(name)
if i is None:
del self.dict[name]
else:
if self.empty[i] is not None:
raise KeyError, name
self.empty[i] = 1
self.list[i] = None
self.size -= 1
def keys(self):
if self.dict:
return self.names.keys() + self.dict.keys()
else:
return self.names.keys()
def values(self):
if self.dict:
return self.names.values() + self.dict.values()
else:
return self.names.values()
def items(self):
if self.dict:
return self.names.items()
else:
return self.names.items() + self.dict.items()
def __len__(self):
return self.size + len(self.dict)
def __cmp__(self, dlict):
c = cmp(self.names, dlict.names)
if c != 0:
return c
c = cmp(self.size, dlict.size)
if c != 0:
return c
for i in range(len(self.names)):
c = cmp(self.empty[i], dlict.empty[i])
if c != 0:
return c
if self.empty[i] is None:
c = cmp(self.list[i], dlict.empty[i])
if c != 0:
return c
return cmp(self.dict, dlict.dict)
def clear(self):
self.dict.clear()
for i in range(len(self.names)):
if self.empty[i] is None:
self.empty[i] = 1
self.list[i] = None
def update(self):
pass
def load(self, index):
"""dlict-special method to support indexed access"""
if self.empty[index] is None:
return self.list[index]
else:
raise KeyError, index # XXX might want reverse mapping
def store(self, index, val):
"""dlict-special method to support indexed access"""
self.empty[index] = None
self.list[index] = val
def delete(self, index):
"""dlict-special method to support indexed access"""
self.empty[index] = 1
self.list[index] = None
Compiler issues
The compiler currently collects the names of all global variables in a module. These are names bound at the module level or bound in a class or function body that declares them to be global.
The compiler would assign indices for each global name and add the names and indices of the globals to the module’s code object. Each code object would then be bound irrevocably to the module it was defined in. (Not sure if there are some subtle problems with this.)
For attributes of imported modules, the module will store an indirection record. Internally, the module will store a pointer to the defining module and the offset of the attribute in the defining module’s global variable array. The offset would be initialized the first time the name is looked up.
Runtime model
The PythonVM will be extended with new opcodes to access globals and module attributes via a module-level array.
A function object would need to point to the module that defined it in order to provide access to the module-level global array.
For module attributes stored in the dlict
(call them static
attributes), the get/delattr implementation would need to track
access to these attributes using the old by-name interface. If a
static attribute is updated dynamically, e.g.:
mod.__dict__["foo"] = 2
The implementation would need to update the array slot instead of the backup dict.
Backwards compatibility
The dlict
will need to maintain meta-information about whether a
slot is currently used or not. It will also need to maintain a
pointer to the builtin namespace. When a name is not currently
used in the global namespace, the lookup will have to fail over to
the builtin namespace.
In the reverse case, each module may need a special accessor
function for the builtin namespace that checks to see if a global
shadowing the builtin has been added dynamically. This check
would only occur if there was a dynamic change to the module’s
dlict
, i.e. when a name is bound that wasn’t discovered at
compile-time.
These mechanisms would have little if any cost for the common case whether a module’s global namespace is not modified in strange ways at runtime. They would add overhead for modules that did unusual things with global names, but this is an uncommon practice and probably one worth discouraging.
It may be desirable to disable dynamic additions to the global namespace in some future version of Python. If so, the new implementation could provide warnings.
Copyright
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/pep-0267.txt
Last modified: 2022-10-05 16:48:43 GMT