Python support for the Linux perf profiler
Author: Pablo Galindo
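The Linux perf profiler is a very powerful tool that allows you to profile and obtain information about the performance of your application. perf also has a very vibrant ecosystem of tools that aid with the analysis of the data that it produces.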
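The main problem with using the perf profiler with Python applications is that perf only gets information about native symbols, that is, the names of functions and procedures written in C. This means that the names and file names of the Python functions in your code will not appear in the output of perf.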
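Since Python 3.12, the interpreter can run in a special mode that allows Python functions to appear in the output of the perf profiler. When this mode is enabled, the interpreter inserts a small piece of code, compiled on the fly, before the execution of every Python function, and it uses perf map files to teach perf the relationship between this piece of code and the associated Python function.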
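To make the perf map mechanism more concrete, here is a minimal sketch of how a tool could resolve a sampled address using such a file. It assumes perf's usual map-file convention: a file named /tmp/perf-<pid>.map with one `START SIZE symbolname` entry per line, START and SIZE in hexadecimal. The helper names and sample addresses are hypothetical; the symbol names follow the `py::function:filename` form that appears in the perf output further down.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class MapEntry(NamedTuple):
    start: int   # first address of the code region
    size: int    # length of the region in bytes
    symbol: str  # name perf should report, e.g. "py::foo:/src/script.py"


def parse_perf_map(text: str) -> List[MapEntry]:
    """Parse 'START SIZE symbolname' lines (START and SIZE in hex)."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at most twice so a symbol containing spaces stays intact.
        start, size, symbol = line.split(maxsplit=2)
        entries.append(MapEntry(int(start, 16), int(size, 16), symbol))
    entries.sort(key=lambda e: e.start)
    return entries


def lookup(entries: List[MapEntry], address: int) -> Optional[str]:
    """Return the symbol whose [start, start + size) range contains address."""
    starts = [e.start for e in entries]
    i = bisect_right(starts, address) - 1  # last entry starting at or before address
    if i >= 0 and address < entries[i].start + entries[i].size:
        return entries[i].symbol
    return None


if __name__ == "__main__":
    # Hypothetical map contents; real addresses differ on every run.
    sample = (
        "7f0000001000 b py::foo:/src/script.py\n"
        "7f0000002000 b py::bar:/src/script.py\n"
    )
    print(lookup(parse_perf_map(sample), 0x7F0000001005))
```

Each trampoline gets its own address range, so a sample landing inside it can be attributed to a specific Python function even though the bytecode itself is evaluated by shared C code.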
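Warning
Support for the perf profiler is currently only available for Linux on select architectures. Check the output of the configure build step, or check the output of python -m sysconfig | grep HAVE_PERF_TRAMPOLINE to see if your system is supported.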
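The same check can be done from inside Python. This minimal sketch mirrors the grep above by scanning `sysconfig.get_config_vars()` for any variable whose name contains HAVE_PERF_TRAMPOLINE. Treating a nonzero value as "supported" is an assumption, and the function names are hypothetical.

```python
import sysconfig


def perf_trampoline_vars() -> dict:
    """Build config variables whose name contains HAVE_PERF_TRAMPOLINE.

    Roughly equivalent to: python -m sysconfig | grep HAVE_PERF_TRAMPOLINE
    """
    return {
        name: value
        for name, value in sysconfig.get_config_vars().items()
        if "HAVE_PERF_TRAMPOLINE" in name
    }


def perf_support_likely() -> bool:
    """Assumption: any matching variable set to a nonzero value means support."""
    for value in perf_trampoline_vars().values():
        # Values may be ints (from pyconfig.h) or strings (from the Makefile).
        if value is not None and str(value).strip() not in ("", "0"):
            return True
    return False


if __name__ == "__main__":
    print(perf_trampoline_vars())
    print("perf support likely:", perf_support_likely())
```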
For example, consider the following script:
def foo(n):
    result = 0
    for _ in range(n):
        result += 1
    return result

def bar(n):
    foo(n)

def baz(n):
    bar(n)

if __name__ == "__main__":
    baz(1000000)
We can run perf to sample CPU stack traces at 9999 hertz:
$ perf record -F 9999 -g -o perf.data python my_script.py
Then we can use perf report to analyze the data:
$ perf report --stdio -n -g
# Children Self Samples Command Shared Object Symbol
# ........ ........ ............ .......... .................. ..........................................
#
91.08% 0.00% 0 python.exe python.exe [.] _start
|
---_start
|
--90.71%--__libc_start_main
Py_BytesMain
|
|--56.88%--pymain_run_python.constprop.0
| |
| |--56.13%--_PyRun_AnyFileObject
| | _PyRun_SimpleFileObject
| | |
| | |--55.02%--run_mod
| | | |
| | | --54.65%--PyEval_EvalCode
| | | _PyEval_EvalFrameDefault
| | | PyObject_Vectorcall
| | | _PyEval_Vector
| | | _PyEval_EvalFrameDefault
| | | PyObject_Vectorcall
| | | _PyEval_Vector
| | | _PyEval_EvalFrameDefault
| | | PyObject_Vectorcall
| | | _PyEval_Vector
| | | |
| | | |--51.67%--_PyEval_EvalFrameDefault
| | | | |
| | | | |--11.52%--_PyLong_Add
| | | | | |
| | | | | |--2.97%--_PyObject_Malloc
...
As you can see here, the Python functions are not shown in the output; only _PyEval_EvalFrameDefault (the function that evaluates the Python bytecode) shows up. Unfortunately, that's not very useful, because all Python functions use the same C function to evaluate bytecode, so we cannot know which Python function corresponds to which bytecode-evaluating function.
Instead, if we run the same experiment with perf support activated, we get:
$ perf report --stdio -n -g
# Children Self Samples Command Shared Object Symbol
# ........ ........ ............ .......... .................. .....................................................................
#
90.58% 0.36% 1 python.exe python.exe [.] _start
|
---_start
|
--89.86%--__libc_start_main
Py_BytesMain
|
|--55.43%--pymain_run_python.constprop.0
| |
| |--54.71%--_PyRun_AnyFileObject
| | _PyRun_SimpleFileObject
| | |
| | |--53.62%--run_mod
| | | |
| | | --53.26%--PyEval_EvalCode
| | | py::<module>:/src/script.py
| | | _PyEval_EvalFrameDefault
| | | PyObject_Vectorcall
| | | _PyEval_Vector
| | | py::baz:/src/script.py
| | | _PyEval_EvalFrameDefault
| | | PyObject_Vectorcall
| | | _PyEval_Vector
| | | py::bar:/src/script.py
| | | _PyEval_EvalFrameDefault
| | | PyObject_Vectorcall
| | | _PyEval_Vector
| | | py::foo:/src/script.py
| | | |
| | | |--51.81%--_PyEval_EvalFrameDefault
| | | | |
| | | | |--13.77%--_PyLong_Add
| | | | | |
| | | | | |--3.26%--_PyObject_Malloc
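Because the Python frames now follow a recognizable `py::function:filename` pattern, they are easy to pick out of the report text. This is a minimal sketch, assuming the plain-text `--stdio` output shown above. The regular expression and helper names are illustrative and are not part of perf or CPython.

```python
import re
from collections import Counter
from typing import List, Tuple

# Matches symbols such as "py::baz:/src/script.py": the "py::" prefix,
# the function name, a colon, then the file name.
PY_FRAME = re.compile(r"py::(?P<func>[^:\s]+):(?P<file>\S+)")


def python_frames(report_text: str) -> List[Tuple[str, str]]:
    """Return (function, filename) pairs for every Python frame, in order."""
    return [(m["func"], m["file"]) for m in PY_FRAME.finditer(report_text)]


def count_functions(report_text: str) -> Counter:
    """Count how often each Python function name appears in the report."""
    return Counter(func for func, _ in python_frames(report_text))


if __name__ == "__main__":
    # A few lines copied from the report above.
    excerpt = """\
| | | py::<module>:/src/script.py
| | | _PyEval_EvalFrameDefault
| | | py::baz:/src/script.py
| | | py::bar:/src/script.py
| | | py::foo:/src/script.py
"""
    for func, filename in python_frames(excerpt):
        print(func, filename)
```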
Enabling perf profiling mode
There are two main ways to activate the perf profiling mode. If you want it to be active from the start of the Python interpreter, you can use the -Xperf option:
$ python -Xperf my_script.py
You can also set the PYTHONPERFSUPPORT environment variable to a nonzero value to activate perf profiling mode globally.
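To combine the perf record invocation shown earlier with either activation method, a small launcher could assemble the command line. This is a hypothetical helper: it only builds the argument list and environment, and it assumes the `perf` executable is on PATH when you actually run the result.

```python
import os
import sys
from typing import Dict, List, Tuple


def perf_record_command(
    script: str,
    frequency: int = 9999,
    output: str = "perf.data",
    use_env: bool = False,
) -> Tuple[List[str], Dict[str, str]]:
    """Build argv and env for `perf record` with perf profiling mode enabled.

    use_env=False passes -Xperf to the interpreter;
    use_env=True sets PYTHONPERFSUPPORT=1 in the child environment instead.
    """
    env = dict(os.environ)
    python_args = [sys.executable]
    if use_env:
        env["PYTHONPERFSUPPORT"] = "1"
    else:
        python_args.append("-Xperf")
    argv = ["perf", "record", "-F", str(frequency), "-g", "-o", output,
            *python_args, script]
    return argv, env


if __name__ == "__main__":
    import subprocess

    argv, env = perf_record_command("my_script.py")
    print(" ".join(argv))
    # Uncomment to actually record (requires perf and a supported system):
    # subprocess.run(argv, env=env, check=True)
```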
There is also support for dynamically activating and deactivating the perf profiling mode by using the APIs in the sys module:
import sys
sys.activate_stack_trampoline("perf")
# Run some code with Perf profiling active
sys.deactivate_stack_trampoline()
# Perf profiling is not active anymore
These APIs can be handy if you want to activate/deactivate profiling mode in response to a signal or other communication mechanism with your process.
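Here is a sketch of the signal-driven case just mentioned. The handler toggles perf profiling mode each time the process receives SIGUSR1. It assumes Python 3.12+ on a Unix platform, uses `sys.is_stack_trampoline_active()` to query the current state, and relies on `sys.activate_stack_trampoline()` raising ValueError when perf support is unavailable. The helper names are hypothetical.

```python
import signal
import sys


def toggle_perf_profiling(signum, frame):
    """Signal handler: switch perf profiling mode on if off, off if on."""
    if sys.is_stack_trampoline_active():
        sys.deactivate_stack_trampoline()
    else:
        try:
            sys.activate_stack_trampoline("perf")
        except ValueError:
            # Raised when this build or platform has no perf trampoline.
            print("perf profiling mode is not available", file=sys.stderr)


def install_toggle(sig=None):
    """Hypothetical helper: toggle perf profiling whenever `sig` arrives.

    Must be called from the main thread; defaults to SIGUSR1 (Unix only).
    """
    if sig is None:
        sig = signal.SIGUSR1
    signal.signal(sig, toggle_perf_profiling)


if __name__ == "__main__":
    import os
    import time

    install_toggle()
    print(f"send `kill -USR1 {os.getpid()}` to toggle perf profiling")
    while True:
        time.sleep(1)
```

From another shell, `kill -USR1 <pid>` turns profiling on, and a second signal turns it off again, so perf only sees Python frames for the window you care about.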
After recording a profile with perf record while perf profiling mode is active, we can analyze the data with perf report:
$ perf report -g -i perf.data
How to obtain the best results
For the best results, Python should be compiled with
CFLAGS="-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer"
as this allows profilers to unwind using only the frame pointer rather than relying on DWARF debug information. This matters because the code that is interposed to allow perf support is generated dynamically, so it has no DWARF debugging information available.
You can check if your system has been compiled with this flag by running:
$ python -m sysconfig | grep 'no-omit-frame-pointer'
If you don't see any output, it means that your interpreter has not been compiled with frame pointers, and therefore it may not be able to show Python functions in the output of perf.
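The same check can be run from Python. This minimal sketch approximates the grep above by listing the build config variables whose string value mentions `no-omit-frame-pointer`. The function name is hypothetical, and an empty result carries the same caveat as empty grep output.

```python
import sysconfig
from typing import List


def vars_with_frame_pointers() -> List[str]:
    """Names of config variables whose value mentions no-omit-frame-pointer.

    Roughly equivalent to: python -m sysconfig | grep 'no-omit-frame-pointer'
    """
    return sorted(
        name
        for name, value in sysconfig.get_config_vars().items()
        if isinstance(value, str) and "no-omit-frame-pointer" in value
    )


if __name__ == "__main__":
    names = vars_with_frame_pointers()
    if names:
        print("built with frame pointers, found in:", ", ".join(names))
    else:
        print("no frame-pointer flags found; Python frames may be missing")
```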