Python support for the Linux perf profiler

author

Pablo Galindo

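The Linux perf profiler is a very powerful tool that allows you to profile your application and obtain information about its performance. perf also has a very vibrant ecosystem of tools that help with the analysis of the data it produces.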

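The main problem with using the perf profiler with Python applications is that perf only reports information about native symbols, that is, the names of functions and procedures written in C. This means that the names and file names of the Python functions in your code will not appear in the output of perf.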

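Since Python 3.12, the interpreter can run in a special mode that allows Python functions to appear in the output of the perf profiler. When this mode is enabled, the interpreter interposes a small piece of code, compiled on the fly, before the execution of every Python function, and it teaches perf the relationship between this piece of code and the associated Python function using perf map files.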
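To make the role of perf map files more concrete: perf looks for a file named /tmp/perf-<pid>.map, in which each line holds a start address and a size (both in hexadecimal) followed by a symbol name. The sketch below parses lines in that format. The helper name parse_perf_map and the sample addresses are invented for illustration; this is not code taken from CPython.

```python
from typing import NamedTuple


class MapEntry(NamedTuple):
    start: int   # start address of the generated code
    size: int    # size of the generated code in bytes
    symbol: str  # name perf displays for this address range


def parse_perf_map(text: str) -> list[MapEntry]:
    """Parse the 'START SIZE symbolname' lines of a perf map file."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # START and SIZE are hexadecimal; the symbol is the rest of the line.
        start, size, symbol = line.split(maxsplit=2)
        entries.append(MapEntry(int(start, 16), int(size, 16), symbol))
    return entries


# Hypothetical sample content (addresses invented for illustration).
SAMPLE = """\
7f3b4c001000 b py::foo:/src/script.py
7f3b4c00100b b py::bar:/src/script.py
"""

entries = parse_perf_map(SAMPLE)
for entry in entries:
    print(hex(entry.start), entry.size, entry.symbol)
```

When perf resolves a sampled address that falls inside one of these ranges, it reports the associated symbol name instead of an anonymous address.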

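Warning

Support for the perf profiler is currently only available for Linux on select architectures. Check the output of the configure build step, or check the output of python -m sysconfig | grep HAVE_PERF_TRAMPOLINE, to see if your system is supported.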
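The same check can be done from Python. The following is a minimal sketch that mirrors the grep command above by looking for any build variable whose name contains HAVE_PERF_TRAMPOLINE; the function name perf_trampoline_supported is made up for this example, and the sketch assumes that an unsupported build reports the variable as 0 or omits it.

```python
import sysconfig


def perf_trampoline_supported() -> bool:
    """Return True if a build variable mentioning HAVE_PERF_TRAMPOLINE is set."""
    for name, value in sysconfig.get_config_vars().items():
        if "HAVE_PERF_TRAMPOLINE" not in name:
            continue
        # Assumption: unsupported builds report 0 (or an empty value).
        if str(value) not in ("0", "", "None"):
            return True
    return False


print("perf trampoline supported:", perf_trampoline_supported())
```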

For example, consider the following script:

def foo(n):
    result = 0
    for _ in range(n):
        result += 1
    return result

def bar(n):
    foo(n)

def baz(n):
    bar(n)

if __name__ == "__main__":
    baz(1000000)

We can run perf to sample CPU stack traces at 9999 hertz:

$ perf record -F 9999 -g -o perf.data python my_script.py

Then we can use perf report to analyze the data:

$ perf report --stdio -n -g

# Children      Self       Samples  Command     Shared Object       Symbol
# ........  ........  ............  ..........  ..................  ..........................................
#
    91.08%     0.00%             0  python.exe  python.exe          [.] _start
            |
            ---_start
            |
                --90.71%--__libc_start_main
                        Py_BytesMain
                        |
                        |--56.88%--pymain_run_python.constprop.0
                        |          |
                        |          |--56.13%--_PyRun_AnyFileObject
                        |          |          _PyRun_SimpleFileObject
                        |          |          |
                        |          |          |--55.02%--run_mod
                        |          |          |          |
                        |          |          |           --54.65%--PyEval_EvalCode
                        |          |          |                     _PyEval_EvalFrameDefault
                        |          |          |                     PyObject_Vectorcall
                        |          |          |                     _PyEval_Vector
                        |          |          |                     _PyEval_EvalFrameDefault
                        |          |          |                     PyObject_Vectorcall
                        |          |          |                     _PyEval_Vector
                        |          |          |                     _PyEval_EvalFrameDefault
                        |          |          |                     PyObject_Vectorcall
                        |          |          |                     _PyEval_Vector
                        |          |          |                     |
                        |          |          |                     |--51.67%--_PyEval_EvalFrameDefault
                        |          |          |                     |          |
                        |          |          |                     |          |--11.52%--_PyLong_Add
                        |          |          |                     |          |          |
                        |          |          |                     |          |          |--2.97%--_PyObject_Malloc
...

As you can see, the Python functions are not shown in the output; only _PyEval_EvalFrameDefault (the function that evaluates the Python bytecode) shows up. Unfortunately that is not very useful: all Python functions use the same C function to evaluate bytecode, so we cannot tell which Python function each _PyEval_EvalFrameDefault frame corresponds to.

Instead, if we run the same experiment with perf support activated we get:

$ perf report --stdio -n -g

# Children      Self       Samples  Command     Shared Object       Symbol
# ........  ........  ............  ..........  ..................  .....................................................................
#
    90.58%     0.36%             1  python.exe  python.exe          [.] _start
            |
            ---_start
            |
                --89.86%--__libc_start_main
                        Py_BytesMain
                        |
                        |--55.43%--pymain_run_python.constprop.0
                        |          |
                        |          |--54.71%--_PyRun_AnyFileObject
                        |          |          _PyRun_SimpleFileObject
                        |          |          |
                        |          |          |--53.62%--run_mod
                        |          |          |          |
                        |          |          |           --53.26%--PyEval_EvalCode
                        |          |          |                     py::<module>:/src/script.py
                        |          |          |                     _PyEval_EvalFrameDefault
                        |          |          |                     PyObject_Vectorcall
                        |          |          |                     _PyEval_Vector
                        |          |          |                     py::baz:/src/script.py
                        |          |          |                     _PyEval_EvalFrameDefault
                        |          |          |                     PyObject_Vectorcall
                        |          |          |                     _PyEval_Vector
                        |          |          |                     py::bar:/src/script.py
                        |          |          |                     _PyEval_EvalFrameDefault
                        |          |          |                     PyObject_Vectorcall
                        |          |          |                     _PyEval_Vector
                        |          |          |                     py::foo:/src/script.py
                        |          |          |                     |
                        |          |          |                     |--51.81%--_PyEval_EvalFrameDefault
                        |          |          |                     |          |
                        |          |          |                     |          |--13.77%--_PyLong_Add
                        |          |          |                     |          |          |
                        |          |          |                     |          |          |--3.26%--_PyObject_Malloc
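The Python frames in the report above follow the pattern py::<function>:<file>. If you post-process perf output, a small helper can separate the two parts. This is only a sketch based on the symbol names shown in this report; the helper name split_py_symbol is made up, and the naming pattern is inferred from the output above rather than from a documented guarantee.

```python
def split_py_symbol(symbol: str):
    """Split 'py::<function>:<file>' into (function, file), else return None."""
    prefix = "py::"
    if not symbol.startswith(prefix):
        return None  # a native symbol such as _PyEval_Vector
    # The function name contains no ':', so the first ':' starts the file name.
    name, sep, filename = symbol[len(prefix):].partition(":")
    if not sep:
        return None
    return name, filename


for sym in ("py::baz:/src/script.py", "py::<module>:/src/script.py", "_PyEval_Vector"):
    print(sym, "->", split_py_symbol(sym))
```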

Enabling perf profiling mode

There are two main ways to activate the perf profiling mode. If you want it to be active from the start of the Python interpreter, you can use the -X perf option:

$ python -Xperf my_script.py

You can also set the PYTHONPERFSUPPORT environment variable to a nonzero value to activate perf profiling mode globally.
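One way to confirm that the environment variable takes effect is to start a child interpreter with it set and ask the child whether the trampoline is active. The sketch below does this with subprocess; the function name child_reports_perf_mode is invented for the example, and the child falls back to printing None on interpreters that lack sys.is_stack_trampoline_active() (older than 3.12).

```python
import os
import subprocess
import sys

# Code run by the child: report the trampoline state, or None if the
# interpreter has no sys.is_stack_trampoline_active().
CHILD_CODE = (
    "import sys; "
    "check = getattr(sys, 'is_stack_trampoline_active', None); "
    "print(check() if check else None)"
)


def child_reports_perf_mode():
    """Run a child interpreter with PYTHONPERFSUPPORT=1 and return its report."""
    env = dict(os.environ, PYTHONPERFSUPPORT="1")
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # the child failed to start or to run the check
    return result.stdout.strip()


print("child reports:", child_reports_perf_mode())
```

On a build without perf support the child is expected to report that the trampoline is not active, since the variable has nothing to enable there.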

There is also support for dynamically activating and deactivating the perf profiling mode by using the APIs in the sys module:

import sys
sys.activate_stack_trampoline("perf")

# Run some code with Perf profiling active

sys.deactivate_stack_trampoline()

# Perf profiling is not active anymore

These APIs can be handy if you want to activate/deactivate profiling mode in response to a signal or other communication mechanism with your process.
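As an illustration of the signal-driven approach, the sketch below toggles perf profiling mode each time the process receives a signal. The choice of SIGUSR1 and the helper names are assumptions made for this example; the guards make the toggle a no-op on interpreters or builds where the trampoline is unavailable.

```python
import signal
import sys


def toggle_perf_trampoline():
    """Flip perf profiling mode; return the new state, or None if unavailable."""
    if not hasattr(sys, "activate_stack_trampoline"):
        return None  # Python older than 3.12
    try:
        if sys.is_stack_trampoline_active():
            sys.deactivate_stack_trampoline()
        else:
            sys.activate_stack_trampoline("perf")
    except (ValueError, OSError):
        return None  # this build or system cannot use the perf trampoline
    return sys.is_stack_trampoline_active()


def _on_signal(signum, frame):
    # Signal handlers receive the signal number and the current frame.
    toggle_perf_trampoline()


if __name__ == "__main__" and hasattr(signal, "SIGUSR1"):
    # SIGUSR1 is an arbitrary choice for this example.
    signal.signal(signal.SIGUSR1, _on_signal)
```

With the handler installed, sending the signal from another terminal (for example with kill -USR1 followed by the process id) turns profiling mode on, and sending it again turns it off.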

After recording with perf record in either mode, we can analyze the data with perf report:

$ perf report -g -i perf.data

How to obtain the best results

For the best results, Python should be compiled with CFLAGS="-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer", as this allows profilers to unwind using only the frame pointer rather than DWARF debug information. This matters because the code interposed to provide perf support is dynamically generated and therefore has no DWARF debugging information available.

You can check whether your interpreter has been compiled with this flag by running:

$ python -m sysconfig | grep 'no-omit-frame-pointer'

If you don’t see any output it means that your interpreter has not been compiled with frame pointers and therefore it may not be able to show Python functions in the output of perf.
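The same check can be written in Python. The following sketch mirrors the grep command by searching every build variable reported by sysconfig for the frame pointer flag; the function name built_with_frame_pointers is made up for this example.

```python
import sysconfig


def built_with_frame_pointers() -> bool:
    """Return True if any build variable mentions 'no-omit-frame-pointer'."""
    return any(
        "no-omit-frame-pointer" in str(value)
        for value in sysconfig.get_config_vars().values()
    )


print("compiled with frame pointers:", built_with_frame_pointers())
```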