Echion
Near-zero-overhead, in-process CPython frame stack sampler with async support
Synopsis
Echion is an in-process CPython frame stack sampler. It can achieve
near-zero-overhead, similar to Austin, by sampling the frame stack of
each thread without holding the GIL. Native stacks can be sampled too, but the
overhead is higher.
Echion is also the first example of a high-performance sampling async profiler
for CPython.
Installation
Currently Echion is available to install from PyPI with
pip install echion
Alternativey, if a wheel is not available for your combination of platform and
architecture, it can be installed from sources with
pip install git+https://github.com/p403n1x87/echion
Compilation requires a C++ compiler and static versions of the libunwind
and
lzma
libraries.
Usage
The following is the output of the echion --help
command.
usage: echion [-h] [-i INTERVAL] [-c] [-n] [-o OUTPUT] [-s] [-w] [-v] [-V] ...
In-process CPython frame stack sampler
positional arguments:
command Command string to execute.
options:
-h, --help show this help message and exit
-i INTERVAL, --interval INTERVAL
sampling interval in microseconds
-c, --cpu sample on-CPU stacks only
-x EXPOSURE, --exposure EXPOSURE
exposure time, in seconds
-m, --memory Collect memory allocation events
-n, --native sample native stacks
-o OUTPUT, --output OUTPUT
output location (can use %(pid) to insert the process ID)
-p PID, --pid PID Attach to the process with the given PID
-s, --stealth stealth mode (sampler thread is not accounted for)
-w WHERE, --where WHERE
where mode: display thread stacks of the given process
-v, --verbose verbose logging
-V, --version show program's version number and exit
The output is written to a file specified with the --output
option. Curretly, this is in
the format of the normal Austin format, that is collapsed stacks with
metadata at the top. This makes it easy to re-use existing visualisation tools,
like the Austin VS Code extension.
Compatibility
Supported platforms: Linux (amd64, i686), Darwin (amd64, aarch64)
Supported interpreters: CPython 3.8-3.11
Notes
Attaching to a process (including in where mode) requires extra permissions. On
Unix, you can attach to a running process with sudo
. On Linux, one may also
set the ptrace scope to 0
with sudo sysctl kernel.yama.ptrace_scope=0
to
allow attaching to any process. However, this is not recommended for security
reasons.
Where mode
The where mode is similar to Austin's where mode, that is Echion will
dump the stacks of all running threads to standard error. This is useful for
debugging deadlocks and other issues that may occur in a running process.
When running or attaching to a process, you can also send a SIGQUIT
signal to
dump the stacks of all running threads. The result is similar to the where mode.
You can normally send a SIGQUIT
signal with the CTRL+\
key combination.
Memory mode
Besides wall time and CPU time, Echion can be used to profile memory
allocations. In this mode, Echion tracks the Python memory domain allocators and
accounts for each single event. Because of the tracing nature, this mode
introduces considerable overhead, but gives pretty accurate results that can be
used to investigate potential memory leaks. To fully understand that data that
is collected in this mode, one should be aware of how Echion tracks allocations
and deallocations. When an allocation is made, Echion records the frame stack
that was involved and maps it to the returned memory address. When a
deallocation for a tracked memory address is made, the freed memory is accounted
for the same stack. Therefore, objects that are allocated and deallocated during
the tracking period account for a total of 0 allocated bytes. This means that
all the non-negative values reported by Echion represent memory that was still
allocated by the time the tracking ended.
Since Echion 0.3.0.
Why Echion?
Sampling in-process comes with some benefits. One has easier access to more
information, like thread names, and potentially the task abstraction of async
frameworks, like asyncio
, gevent
, ... . Also available is more accurate
per-thread CPU timing information.
Currently, Echion supports sampling asyncio-based applications, but not in
native mode. This makes Echion the very first example of an async profiler for
CPython.
Echion relies on some assumptions to collect and sample all the running threads
without holding the GIL. This makes Echion very similar to tools like
Austin. However, some features, like multiprocess support, are more
complicated to handle and would require the use of e.g. IPC solutions.
Furthermore, Echion normally requires that you install it within your
environment, wheareas Austin can be installed indepdendently.
How it works
On a fundamental level, there is one key assumption that Echion relies upon:
The interpreter state object lives as long as the CPython process itself.
All unsafe memory reads are performed indirectly via copies of data structure
obtained with the use of system calls like process_vm_readv
. This is
essentially what allows Echion to run its sampling thread without the GIL.
As for attaching to a running process, we make use of the hypno library
to inject Python code that bootstraps Echion into the target process.