PEP 669 – Low Impact Monitoring for CPython
- Author:
- Mark Shannon <mark at hotpy.org>
- Status:
- Draft
- Type:
- Standards Track
- Created:
- 18-Aug-2021
- Post-History:
- 07-Dec-2021
Abstract
Using a profiler or debugger in CPython can have a severe impact on performance. Slowdowns by an order of magnitude are common.
This PEP proposes an API for monitoring Python programs running on CPython that will enable monitoring at low cost.
Although this PEP does not specify an implementation, it is expected that it will be implemented using the quickening step of PEP 659.
A sys.monitoring
namespace will be added, which will contain
the relevant functions and enum.
Motivation
Developers should not have to pay an unreasonable cost to use debuggers, profilers and other similar tools.
C++ and Java developers expect to be able to run a program at full speed (or very close to it) under a debugger. Python developers should expect that too.
Rationale
The quickening mechanism provided by PEP 659 provides a way to dynamically modify executing Python bytecode. These modifications have little cost beyond the parts of the code that are modified and a relatively low cost to those parts that are modified. We can leverage this to provide an efficient mechanism for monitoring that was not possible in 3.10 or earlier.
By using quickening, we expect that code run under a debugger on 3.12 should outperform code run without a debugger on 3.11. Profiling will still slow down execution, but by much less than in 3.11.
Specification
Monitoring of Python programs is done by registering callback functions for events and by activating a set of events.
Activating events and registering callback functions are independent of each other.
Both registering callbacks and activating events are done on a per-tool basis. It is possible to have multiple tools that respond to different sets of events.
Note that, unlike sys.settrace()
, events and callbacks are per interpreter, not per thread.
Events
As a code object executes various events occur that might be of interest to tools. By activating events and by registering callback functions tools can respond to these events in any way that suits them. Events can be set globally, or for individual code objects.
For 3.12, CPython will support the following events:
- PY_START: Start of a Python function (occurs immediately after the call, the callee’s frame will be on the stack)
- PY_RESUME: Resumption of a Python function (for generator and coroutine functions), except for throw() calls.
- PY_THROW: A Python function is resumed by a throw() call.
- PY_RETURN: Return from a Python function (occurs immediately before the return, the callee’s frame will be on the stack).
- PY_YIELD: Yield from a Python function (occurs immediately before the yield, the callee’s frame will be on the stack).
- PY_UNWIND: Exit from a Python function during exception unwinding.
- C_CALL: Call to any callable, except Python functions (before the call in this case).
- C_RETURN: Return from any callable, except Python functions (after the return in this case).
- RAISE: An exception is raised.
- EXCEPTION_HANDLED: An exception is handled.
- LINE: An instruction is about to be executed that has a different line number from the preceding instruction.
- INSTRUCTION – A VM instruction is about to be executed.
- JUMP – An unconditional jump in the control flow graph is reached.
- BRANCH – A conditional branch is about to be taken (or not).
- MARKER – A marker is hit
More events may be added in the future.
All events will be attributes of the Event
enum in sys.monitoring
:
class Event(enum.IntFlag):
PY_CALL = ...
Note that Event
is an IntFlag
which means that the events can be or-ed
together to form a set of events.
Tool identifiers
The VM can support up to 6 tools at once. Before registering or activating events, a tool should choose an identifier. Identifiers are integers in the range 0 to 5.
sys.monitoring.use_tool_id(id, name:str) -> None
sys.monitoring.free_tool_id(id) -> None
sys.monitoring.get_tool(id) -> str | None
sys.monitoring.use_tool_id
raises a ValueError
if id
is in use.
sys.monitoring.get_tool
returns the name of the tool if id
is in use,
otherwise it returns None
.
All IDs are treated the same by the VM with regard to events, but the following IDs are pre-defined to make co-operation of tools easier:
sys.monitoring.DEBUGGER_ID = 0
sys.monitoring.COVERAGE_ID = 1
sys.monitoring.PROFILER_ID = 2
sys.monitoring.OPTIMIZER_ID = 3
There is no obligation to set an ID, nor is there anything preventing a tool from using an ID even it is already in use. However, tool are encouraged to use a unique ID and respect other tools.
For example, if a debugger were attached and DEBUGGER_ID
were in use, it should
report an error, rather than carrying on regardless.
The OPTIMIZER_ID
is provided for tools like Cinder or PyTorch
that want to optimize Python code, but need to decide what to
optimize in a way that depends on some wider context.
Setting events globally
Events can be controlled globally by modifying the set of events being monitored:
sys.monitoring.get_events(tool_id:int)->Event
Returns theEvent
set for all the active events.sys.monitoring.set_events(tool_id:int, event_set: Event)
Activates all events which are set inevent_set
.
No events are active by default.
Per code object events
Events can also be controlled on a per code object basis:
sys.monitoring.get_local_events(tool_id:int, code: CodeType)->Event
Returns theEvent
set for all the local events forcode
sys.monitoring.set_local_events(tool_id:int, code: CodeType, event_set: Event)
Activates all the local events forcode
which are set inevent_set
.
Local events add to global events, but do not mask them. In other words, all global events will trigger for a code object, regardless of the local events.
Register callback functions
To register a callable for events call:
sys.monitoring.register_callback(tool_id:int, event: Event, func: Callable | None) -> Callable | None
If another callback was registered for the given tool_id
and event
,
it is unregistered and returned.
Otherwise register_callback
returns None
.
Functions can be unregistered by calling
sys.monitoring.register_callback(tool_id, event, None)
.
Callback functions can be registered and unregistered at any time.
Registering or unregistering a callback function will generate a sys.audit
event.
Callback function arguments
When an active event occurs, the registered callback function is called. Different events will provide the callback function with different arguments, as follows:
- All events starting with
PY_
:func(code: CodeType, instruction_offset: int) -> DISABLE | Any
C_CALL
andC_RETURN
:func(code: CodeType, instruction_offset: int, callable: object) -> DISABLE | Any
RAISE
andEXCEPTION_HANDLED
:func(code: CodeType, instruction_offset: int, exception: BaseException) -> DISABLE | Any
LINE
:func(code: CodeType, line_number: int) -> DISABLE | Any
BRANCH
:func(code: CodeType, instruction_offset: int, destination_offset: int) -> DISABLE | Any
Note that the
destination_offset
is where the code will next execute. For an untaken branch this will be the offset of the instruction following the branch.INSTRUCTION
:func(code: CodeType, instruction_offset: int) -> DISABLE | Any
MARKER
:func(code: CodeType, instruction_offset: int) -> DISABLE | Any
If a callback function returns DISABLE
, then that function will no longer
be called for that (code, instruction_offset)
until
sys.monitoring.restart_events()
is called.
This feature is provided for coverage and other tools that are only interested
seeing an event once.
Note that sys.monitoring.restart_events()
is not specific to one tool,
so tools must be prepared to receive events that they have chosen to DISABLE.
Events in callback functions
Events are suspended in callback functions and their callees for the tool that registered that callback.
That means that other tools will see events in the callback functions for other tools. This could be useful for debugging a profiling tool, but would produce misleading profiles, as the debugger tool would show up in the profile.
Inserting and removing markers
Two new functions are added to the sys
module to support markers.
sys.monitoring.insert_marker(tool_id: int, code: CodeType, offset: int)
sys.monitoring.remove_marker(tool_id: int, code: CodeType, offset: int)
A single code object may not have more than 255 markers at once.
sys.monitoring.insert_marker
raises a ValueError
if this limit
is exceeded.
Order of events
If an instructions triggers several events they occur in the following order:
- MARKER
- INSTRUCTION
- LINE
- All other events (only one of these events can occur per instruction)
Each event is delivered to tools in ascending order of ID.
Attributes of the sys.monitoring
namespace
class Event(enum.IntFlag)
def use_tool_id(id)->None
def free_tool_id(id)->None
def get_events(tool_id: int)->Event
def set_events(tool_id: int, event_set: Event)->None
def get_local_events(tool_id: int, code: CodeType)->Event
def set_local_events(tool_id: int, code: CodeType, event_set: Event)->None
def register_callback(tool_id: int, event: Event, func: Callable)->Optional[Callable]
def insert_marker(tool_id: int, code: CodeType, offset: Event)->None
def remove_marker(tool_id: int, code: CodeType, offset: Event)->None
def restart_events()->None
DISABLE: object
Access to “debug only” features
Some features of the standard library are not accessible to normal code, but are accessible to debuggers. For example, setting local variables, or the line number.
These features will be available to callback functions.
Backwards Compatibility
This PEP is mostly backwards compatible.
There are some compatibility issues with PEP 523, as the behavior
of PEP 523 plugins is outside of the VM’s control.
It is up to PEP 523 plugins to ensure that they respect the semantics
of this PEP. Simple plugins that do not change the state of the VM, and
defer execution to _PyEval_EvalFrameDefault()
should continue to work.
sys.settrace()
and sys.setprofile()
will act as if they were tools
6 and 7 respectively, so can be used along side this PEP.
This means that sys.settrace()
and sys.setprofile()
may not work
correctly with all PEP 523 plugins. Although, simple PEP 523
plugins, as described above, should be fine.
Performance
If no events are active, this PEP should have a small positive impact on
performance. Experiments show between 1 and 2% speedup from not supporting
sys.settrace()
directly.
The performance of sys.settrace()
will be worse.
The performance of sys.setprofile()
should be better.
However, tools relying on sys.settrace()
and
sys.setprofile()
can be made a lot faster by using the
API provided by this PEP.
If a small set of events are active, e.g. for a debugger, then the overhead
of callbacks will be orders of magnitudes less than for sys.settrace()
and much cheaper than using PEP 523.
Coverage tools can be implemented at very low cost,
by returning DISABLE
in all callbacks.
For heavily instrumented code, e.g. using LINE
, performance should be
better than sys.settrace
, but not by that much as performance will be
dominated by the time spent in callbacks.
For optimizing virtual machines, such as future versions of CPython
(and PyPy
should they choose to support this API), changes to the set
active events in the midst of a long running program could be quite
expensive, possibly taking hundreds of milliseconds as it triggers
de-optimizations. Once such de-optimization has occurred, performance should
recover as the VM can re-optimize the instrumented code.
In general these operations can be considered to be fast:
def get_events(tool_id: int)->Event
def get_local_events(tool_id: int, code: CodeType)->Event
def register_callback(tool_id: int, event: Event, func: Callable)->Optional[Callable]
def get_tool(tool_id) -> str | None
These operations are slower, but not especially so:
def set_local_events(tool_id: int, code: CodeType, event_set: Event)->None
def insert_marker(tool_id: int, code: CodeType, offset: Event)->None
def remove_marker(tool_id: int, code: CodeType, offset: Event)->None
And these operations should be regarded as slow:
def use_tool_id(id, name:str)->None
def free_tool_id(id)->None
def set_events(tool_id: int, event_set: Event)->None
def restart_events()->None
How slow the slow operations are depends on when then happen. If done early in the program, before modules are loaded, they should be fairly inexpensive.
Memory Consumption
When not in use, this PEP will have a neglible change on memory consumption.
How memory is used is very much an implementation detail. However, we expect that for 3.12 the additional memory consumption per code object will be roughly as follows:
Events | |||
---|---|---|---|
Tools | Others | LINE | INSTRUCTION |
One | None | ≈40% | ≈80% |
Two or more | ≈40% | ≈120% | ≈200% |
Security Implications
Allowing modification of running code has some security implications, but no more than the ability to generate and call new code.
All the new functions listed above will trigger audit hooks.
Implementation
This outlines the proposed implementation for CPython 3.12. The actual implementation for later versions of CPython and other Python implementations may differ considerably.
The proposed implementation of this PEP will be built on top of the quickening step of CPython 3.11, as described in PEP 659. Instrumentation works in much the same way as quickening, bytecodes are replaced with instrumented ones as needed.
For example, if the C_CALL
event is turned on,
then all call instructions will be
replaced with a INSTRUMENTED_CALL
instruction.
Note that this will interfere with specialization, which will result in some performance degradation in addition to the overhead of calling the registered callable.
When the set of active events changes, the VM will immediately update all code objects present on the call stack of any thread. It will also set in place traps to ensure that all code objects are correctly instrumented when called. Consequently changing the set of active events should be done as infrequently as possible, as it could be quite an expensive operation.
Other events, such as RAISE
can be turned on or off cheaply,
as they do not rely on code instrumentation, but runtime checks when the
underlying event occurs.
The exact set of events that require instrumentation is an implementation detail, but for the current design, the following events will require instrumentation:
- PY_START
- PY_RESUME
- PY_RETURN
- PY_YIELD
- C_CALL
- C_RETURN
- LINE
- INSTRUCTION
- JUMP
- BRANCH
Each instrumented bytecode will require an additional 8 bits of information to
note which tool the instrumentation applies to.
LINE
and INSTRUCTION
events require additional information, as they
need to store the original instruction, or even the instrumented instruction
if they overlap other instrumentation.
Implementing tools
It is the philosophy of this PEP that it should be possible for third-party monitoring tools to achieve high-performance, not that it should be easy for them to do so.
Converting events into data that is meaningful to the users is the responsibility of the tool.
All events have a cost, and tools should attempt to the use set of events that trigger the least often and still provide the necessary information.
Debuggers
Inserting breakpoints
Breakpoints can be inserted by using markers. For example:
sys.monitoring.insert_marker(code, offset)
Which will insert a marker at offset
in code
,
which can be used as a breakpoint.
To insert a breakpoint at a given line, the matching instruction offsets
should be found from code.co_lines()
.
Breakpoints can be removed by removing the marker:
sys.monitoring.remove_marker(code, offset)
Stepping
Debuggers usually offer the ability to step execution by a single instruction or line.
This can be implemented by inserting a new marker at the required offset(s) of the code to be stepped to, and by removing the current marker.
It is the job of the debugger to compute the relevant offset(s).
Attaching
Debuggers can use the PY_CALL
, etc. events to be informed when
a code object is first encountered, so that any necessary breakpoints
can be inserted.
Coverage Tools
Coverage tools need to track which parts of the control graph have been
executed. To do this, they need to register for the PY_
events,
plus JUMP
and BRANCH
.
This information can be then be converted back into a line based report after execution has completed.
Profilers
Simple profilers need to gather information about calls. To do this profilers should register for the following events:
- PY_CALL
- PY_RESUME
- PY_THROW
- PY_RETURN
- PY_YIELD
- PY_UNWIND
- C_CALL
- C_RETURN
Line based profilers
Line based profilers can use the LINE
and JUMP
events.
Implementers of profilers should be aware that instrumenting LINE
events will have a large impact on performance.
备注
Instrumenting profilers have significant overhead and will distort the results of profiling. Unless you need exact call counts, consider using a statistical profiler.
Rejected ideas
A draft version of this PEP proposed making the user responsible for inserting the monitoring instructions, rather than have VM do it. However, that puts too much of a burden on the tools, and would make attaching a debugger nearly impossible.
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/pep-0669.rst
Last modified: 2022-10-07 23:42:35 GMT