Security News
Research
Data Theft Repackaged: A Case Study in Malicious Wrapper Packages on npm
The Socket Research Team breaks down a malicious wrapper package that uses obfuscation to harvest credentials and exfiltrate sensitive data.
See CONTRIBUTING.md for details on the internals of cppimport
and how to get involved in development.
Install with pip install cppimport
.
Save the C++ code below as somecode.cpp
.
// cppimport
#include <pybind11/pybind11.h>
namespace py = pybind11;
int square(int x) {
return x * x;
}
PYBIND11_MODULE(somecode, m) {
m.def("square", &square);
}
/*
<%
setup_pybind11(cfg)
%>
*/
Then open a Python interpreter and import the C++ extension:
>>> import cppimport.import_hook
>>> import somecode #This will pause for a moment to compile the module
>>> somecode.square(9)
81
Hurray, you've called some C++ code from Python using a combination of cppimport
and pybind11
.
I'm a big fan of the workflow that this enables, where you can edit both C++ files and Python and recompilation happens transparently! It's also handy for quickly whipping together an optimized version of a slow Python function.
Okay, now that I've hopefully convinced you on how exciting this is, let's get into the details of how to do this yourself. First, the comment at top is essential to opt in to cppimport. Don't forget this! (See below for an explanation of why this is necessary.)
// cppimport
The bulk of the file is a generic, simple pybind11 extension. We include the pybind11
headers, then define a simple function that squares x
, then export that function as part of a Python extension called somecode
.
Finally at the end of the file, there's a section I'll call the "configuration block":
<%
setup_pybind11(cfg)
%>
This region surrounded by <%
and %>
is a Mako code block. The region is evaluated as Python code during the build process and provides configuration info like compiler and linker flags to the cppimport build system.
Note that because of the Mako pre-processing, the comments around the configuration block may be omitted. Putting the configuration block at the end of the file, while optional, ensures that line numbers remain correct in compilation error messages.
In production deployments you usually don't want to include a c/c++ compiler, all the sources and compile at runtime. Therefore, a simple cli utility for pre-compiling all source files is provided. This utility may, for example, be used in CI/CD pipelines.
Usage is as simple as
python -m cppimport build
This will build all *.c
and *.cpp
files in the current directory (and it's subdirectories) if they are eligible to be imported (i.e. contain the // cppimport
comment in the first line).
Alternatively, you may specifiy one or more root directories or source files to be built:
python -m cppimport build ./my/directory/ ./my/single/file.cpp
Note: When specifying a path to a file, the header check (// cppimport
) is skipped for that file.
To further improve startup performance for production builds, you can opt-in to skip the checksum and compiled binary existence checks during importing by either setting the environment variable CPPIMPORT_RELEASE_MODE
to true
or setting the configuration from within Python:
cppimport.settings['release_mode'] = True
Warning: Make sure to have all binaries pre-compiled when in release mode, as importing any missing ones will cause exceptions.
Sometimes Python just isn't fast enough. Or you have existing code in a C or C++ library. So, you write a Python extension module, a library of compiled code. I recommend pybind11 for C++ to Python bindings or cffi for C to Python bindings. I've done this a lot over the years. But, I discovered that my productivity is slower when my development process goes from Edit -> Test in just Python to Edit -> Compile -> Test in Python plus C++. So, cppimport
combines the process of compiling and importing an extension in Python so that you can just run import foobar
and not have to worry about multiple steps. Internally, cppimport
looks for a file foobar.cpp
. Assuming one is found, it's run through the Mako templating system to gather compiler options, then it's compiled and loaded as an extension module.
No! Compilation should only happen the first time the module is imported. The C++ source is compared with a checksum on each import to determine if any relevant file has changed. Additional dependencies (e.g. header files!) can be tracked by adding to the Mako header:
cfg['dependencies'] = ['file1.h', 'file2.h']
The checksum is computed by simply appending the contents of the extension C++ file together with the files in cfg['sources']
and cfg['dependencies']
.
Standard distutils configuration options are valid:
cfg['extra_link_args'] = ['...']
cfg['extra_compile_args'] = ['...']
cfg['libraries'] = ['...']
cfg['include_dirs'] = ['...']
For example, to use C++11, add:
cfg['extra_compile_args'] = ['-std=c++11']
In the configuration block:
cfg['sources'] = ['extra_source1.cpp', 'extra_source2.cpp']
cppimport
uses the standard Python logging tools. Please add logging handlers to either the root logger or the "cppimport"
logger. For example, to output all debug level log messages:
root_logger = logging.getLogger()
root_logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.DEBUG)
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
handler.setFormatter(formatter)
root_logger.addHandler(handler)
Set:
cppimport.settings['force_rebuild'] = True
And if this is a common occurence, I would love to hear your use case and why the combination of the checksum, cfg['dependencies']
and cfg['sources']
is insufficient!
Note that force_rebuild
does not work when importing the module concurrently.
It's safe to use cppimport
to import a module concurrently using multiple threads, processes or even machines!
Before building a module, cppimport
obtains a lockfile preventing other processors from building it at the same time - this prevents clashes that can lead to failure.
Other processes will wait maximum 10 mins until the first process has built the module and load it. If your module does not build within 10 mins then it will timeout.
You can increase the timeout time in the settings:
cppimport.settings['lock_timeout'] = 10*60 # 10 mins
You should not use force_rebuild
when importing concurrently.
The module name is available as the fullname
variable and the C++ module file is available as filepath
.
For example,
<%
module_dir = os.path.dirname(filepath)
%>
In single file extensions, this is a fundamental issue with C++. Heavily templated code is often quite slow to compile.
If your extension has multiple source files using the cfg['sources']
capability, then you might be hoping for some kind of incremental compilation. For the uninitiated, incremental compilation involves only recompiling those source files that have changed. Unfortunately this isn't possible because cppimport is built on top of the setuptools and distutils and these standard library components do not support incremental compilation.
I recommend following the suggestions on this SO answer. That is:
ccache
to reduce the cost of rebuildscfg['parallel'] = True
in the C++ file's configuration header.As a further thought, if your extension has many source files and you're hoping to do incremental compiles, that probably indicates that you've outgrown cppimport
and should consider using a more complete build system like CMake.
Modifying the Python import system is a global modification and thus affects all imports from any other package. As a result, when I first implemented cppimport
, other packages (e.g. scipy
) suddenly started breaking because import statements internal to those packages were importing C or C++ files instead of the modules they were intended to import. To avoid this failure mode, the import hook uses an "opt in" system where C and C++ files can specify they are meant to be used with cppimport by having a comment on the first line that includes the text "cppimport".
As an alternative to the import hook, you can use imp
or imp_from_filepath
. The cppimport.imp
and cppimport.imp_from_filepath
performs exactly the same operation as the import hook but in a slightly more explicit way:
foobar = cppimport.imp("foobar")
foobar = cppimport.imp_from_filepath("src/foobar.cpp")
By default, these explicit function do not require the "cppimport" keyword on the first line of the C++ source file.
The CI system does not run on Windows. A PR would be welcome adding further Windows support. I've used cppimport
with MinGW-w64 and Python 3.6 and had good success. I've also had reports that cppimport
works on Windows with Python 3.6 and Visual C++ 2015 Build Tools. The main challenge is making sure that distutils is aware of your available compilers. Try out the suggestion here.
FAQs
Import C++ files directly from Python!
We found that cppimport demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
Research
The Socket Research Team breaks down a malicious wrapper package that uses obfuscation to harvest credentials and exfiltrate sensitive data.
Research
Security News
Attackers used a malicious npm package typosquatting a popular ESLint plugin to steal sensitive data, execute commands, and exploit developer systems.
Security News
The Ultralytics' PyPI Package was compromised four times in one weekend through GitHub Actions cache poisoning and failure to rotate previously compromised API tokens.