
You would normally install the DuckDB released version using pip as follows:
pip install duckdb
For development, you may need a DuckDB Python package that is installed from source. Make sure you have the dependencies installed:
pip install -r requirements-dev.txt
The simplest way to install from source is to clone the Git repository and use pip:
cd tools/pythonpkg
python3 -m pip install .
To install in debug mode, set the environment variable $DUCKDEBUG=1 (or some other non-zero value).
Note that this will override any existing DuckDB installation you might have. You might also run into conflicts depending on your Python environment. In order to remedy that, it is possible to use virtualenv for installation, e.g. by running the following commands:
cd tools/pythonpkg
virtualenv .venv --python=python3.12
source .venv/bin/activate
python3 -m pip install .
To test, run:
cd ../..
python3 -c "import duckdb; duckdb.sql('SELECT version() AS version').show()"
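The same check can be scripted so that a failed import is reported instead of raising. This is only a minimal sketch: the helper name installed_duckdb_version is hypothetical and not part of DuckDB; it relies only on the package's __version__ attribute.

```python
import importlib.util


def installed_duckdb_version():
    """Return the version of the importable duckdb package, or None if it cannot be imported."""
    if importlib.util.find_spec("duckdb") is None:
        return None  # duckdb is not installed in this environment
    try:
        import duckdb
    except ImportError:
        return None  # installed, but the binary module failed to load
    return duckdb.__version__


if __name__ == "__main__":
    print(installed_duckdb_version())
```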
You can build using the make command with the BUILD_PYTHON flag set. For example:
BUILD_PYTHON=1 make debug
Alternatively, you may need the package files to reside under the same prefix where the library is installed; e.g., when installing to cloud storage from a notebook.
First, get the repository-based version number and build the source distribution.
python3 -m pip install build # required for PEP 517 compliant source dists
cd tools/pythonpkg
export SETUPTOOLS_SCM_PRETEND_VERSION=$(python3 -m setuptools_scm)
pyproject-build . --sdist
cd ../..
Next, extract the Python package files under the prefix and install the package.
mkdir -p $DUCKDB_PREFIX/src/duckdb-pythonpkg
tar --directory=$DUCKDB_PREFIX/src/duckdb-pythonpkg -xzpf tools/pythonpkg/dist/duckdb-${SETUPTOOLS_SCM_PRETEND_VERSION}.tar.gz
pip install --prefix $DUCKDB_PREFIX -e $DUCKDB_PREFIX/src/duckdb-pythonpkg/duckdb-${SETUPTOOLS_SCM_PRETEND_VERSION}
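To make the path layout in the commands above easier to follow, here is a small sketch that computes the same three paths from a prefix and a version. The helper name sdist_paths and the example prefix and version are hypothetical; nothing here touches the filesystem.

```python
from pathlib import PurePosixPath


def sdist_paths(prefix, version, repo_root="."):
    """Compute the paths used by the prefix install steps (no filesystem access)."""
    src_dir = PurePosixPath(prefix) / "src" / "duckdb-pythonpkg"
    dist_dir = PurePosixPath(repo_root) / "tools" / "pythonpkg" / "dist"
    return {
        # source distribution produced by pyproject-build
        "tarball": dist_dir / f"duckdb-{version}.tar.gz",
        # directory the tarball is extracted into
        "extract_dir": src_dir,
        # directory passed to pip install -e
        "package_dir": src_dir / f"duckdb-{version}",
    }


if __name__ == "__main__":
    # "/opt/duckdb" and "1.2.3" are made-up values for illustration
    for name, path in sdist_paths("/opt/duckdb", "1.2.3").items():
        print(name, path)
```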
The *.pyi stubs in duckdb-stubs are manually maintained. The connection-related stubs are generated using dedicated scripts in tools/pythonpkg/scripts/:
generate_connection_stubs.py
generate_connection_wrapper_stubs.py
These stubs are important for autocomplete in many IDEs, as static-analysis based language servers can't introspect duckdb's binary module.
To verify the stubs match the actual implementation:
python3 -m pytest tests/stubs
If you add new methods to the DuckDB Python API, you'll need to manually add corresponding type hints to the stub files.
If you encounter an error like the following on import duckdb:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/bin/python3/site-packages/duckdb/__init__.py", line 4, in <module>
import duckdb.functional as functional
File "/usr/bin/python3/site-packages/duckdb/functional/__init__.py", line 1, in <module>
from duckdb.duckdb.functional import (
ImportError: dlopen(/usr/bin/python3/site-packages/duckdb/duckdb.cpython-311-darwin.so, 0x0002): symbol not found in flat namespace '_MD5_Final'
When DuckDB is built, it prints which extensions are linked into it. The Python package does not handle linked-in extensions well, which can lead to import errors like the one above.
The output looks something like this:
-- Extensions linked into DuckDB: [json, fts, tpcds, tpch, parquet, icu, httpfs]
httpfs should not be in that list, among others.
Refer to extension/extension_config_local.cmake or the other *.cmake files and make sure you add DONT_LINK to the problematic extension.
tools/pythonpkg/duckdb_extension_config.cmake contains the default list of extensions built for the Python package.
Anything that is linked but not listed there should be considered problematic.
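As a rough sketch of that check, the following parses the build output line and reports any linked extension that is not in an expected set. The function names and the EXPECTED set are hypothetical and for illustration only; the real default list lives in tools/pythonpkg/duckdb_extension_config.cmake.

```python
import re


def linked_extensions(build_output):
    """Extract extension names from the '-- Extensions linked into DuckDB: [...]' line."""
    match = re.search(r"Extensions linked into DuckDB: \[([^\]]*)\]", build_output)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


def problematic_extensions(build_output, expected):
    """Return linked extensions that are not in the expected set, in build-output order."""
    expected = set(expected)
    return [name for name in linked_extensions(build_output) if name not in expected]


if __name__ == "__main__":
    # Assumed set, NOT the real contents of duckdb_extension_config.cmake
    EXPECTED = {"json", "fts", "tpcds", "tpch", "parquet", "icu"}
    line = "-- Extensions linked into DuckDB: [json, fts, tpcds, tpch, parquet, icu, httpfs]"
    print(problematic_extensions(line, EXPECTED))
```

With the assumed set above, only httpfs is reported for the sample line.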
The pythonpkg does not use the CMakeLists for compilation; for that it mostly uses pip and package_build.py.
We still keep CMakeLists in the pythonpkg, however, for tidy-check and IntelliSense purposes.
For this reason it might not be immediately apparent that the CMakeLists are set up incorrectly; the only symptom will be a very confusing CI failure of TidyCheck.
To prevent this, or to help you when you encounter that CI failure, here are a few things to note about the CMakeLists in this directory.
The pythonpkg depends on PythonLibs and pybind11. For some reason, PythonLibs cannot be found by clang-tidy when generating the compile_commands.json file.
So the reason for clang-tidy failing is likely that there is no entry for a file in compile_commands.json. Check the CMakeLists to see why cmake did not register it.
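One way to find such missing entries is to compare the source files you expect against the "file" fields in compile_commands.json. This is a sketch that assumes the standard JSON compilation database layout (a list of objects with "directory" and "file" keys); the function name is hypothetical.

```python
import json
import os


def files_missing_from_compile_commands(compile_commands_path, source_files):
    """Return the source files that have no entry in compile_commands.json."""
    with open(compile_commands_path, encoding="utf-8") as fh:
        entries = json.load(fh)
    registered = set()
    for entry in entries:
        # "file" may be relative to the entry's "directory"
        joined = os.path.join(entry.get("directory", ""), entry["file"])
        registered.add(os.path.abspath(joined))
    return [f for f in source_files if os.path.abspath(f) not in registered]
```

Any file returned by this function is one that cmake did not register, so that is where to start looking in the CMakeLists.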
Helpful information:
clang-tidy is not a standard binary on macOS and cannot be installed with brew directly (doing so will try to install clang-format, which is not the same thing).
Instead, clang-tidy is part of llvm, so you'll need to install that (brew install llvm). After installing llvm, you'll likely have to add the llvm binaries folder to your PATH variable to use clang-tidy.
For example:
export PATH="$PATH:/opt/homebrew/Cellar/llvm/15.0.2/bin"
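To confirm that clang-tidy is actually reachable after changing PATH, a lookup along these lines may help. It is a sketch using only the standard library; the helper name find_clang_tidy and its parameters are hypothetical.

```python
import os
import shutil


def find_clang_tidy(extra_dirs=(), path=None):
    """Locate clang-tidy on PATH, then in extra_dirs; return its full path or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    # Drop empty PATH entries so the current directory is not searched implicitly
    dirs = [d for d in path.split(os.pathsep) if d] + list(extra_dirs)
    return shutil.which("clang-tidy", path=os.pathsep.join(dirs))


if __name__ == "__main__":
    # The extra directory is the example llvm location from the text above
    print(find_clang_tidy(["/opt/homebrew/Cellar/llvm/15.0.2/bin"]))
```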
py::handle and py::object are classes provided by pybind11, the library we use to manage our interaction with the Python environment. py::handle is a direct wrapper around a raw PyObject* and does not manage any references. py::object is similar to py::handle, but it holds a reference and decreases the refcount of the Python object when it goes out of scope.
How that reference is acquired depends on how the py::object is created. py::reinterpret_borrow<py::object>(...) treats its argument as a borrowed reference: it increases the refcount when the py::object is created, so the caller's own reference stays valid. This is useful when you only have a py::handle but the prototype requires a py::object.
py::reinterpret_steal<py::object>(...) takes over an existing reference: it does not increase the refcount, but it still decreases the refcount when the py::object goes out of scope.
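When directly interacting with Python C API functions that return a PyObject*, such as PyDateTime_DATE_GET_TZINFO, check whether the function returns a new or a borrowed reference. Wrap a new reference in py::reinterpret_steal to take ownership of the returned object, and wrap a borrowed reference in py::reinterpret_borrow.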
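The refcount that these wrappers manage can be observed from Python itself. The sketch below is not pybind11 code; it only illustrates, on CPython, how taking and releasing an additional owning reference changes the count reported by sys.getrefcount. The Payload class is a throwaway name for the example.

```python
import sys


class Payload:
    """Plain object whose refcount we can observe."""


obj = Payload()
base = sys.getrefcount(obj)           # includes a temporary reference held by the call itself

owner = obj                           # a second owning reference to the same object
after_own = sys.getrefcount(obj)

del owner                             # the extra reference is released again
after_release = sys.getrefcount(obj)

# Differences relative to the starting count
print(after_own - base, after_release - base)
```

On CPython the first difference should be 1 and the second 0, which mirrors an owning py::object acquiring a reference and releasing it when it goes out of scope.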