pandas-maxminddb

Fast geolocation library for Pandas Dataframes, built on Numpy C-FFI

PyPI

Version: 0.2.1

Weekly downloads: 68

Maintainers: 1

Pandas Maxmind

Provides fast and convenient geolocation bindings for Pandas Dataframes. Uses NumPy ndarrays internally, which makes it faster than naively applying a function per column. Based on maxminddb-rust.

Features

Supports both MMAP and in-memory implementations
Supports parallelism (useful for very big datasets)
Comes with pre-built wheels, so there is no need to install and maintain an external C library to get (better than) C performance

Installation

The minimum supported Python version is 3.8.
pip install pandas_maxminddb
The preferred way is to use a precompiled binary wheel, as this requires no toolchain and is the fastest.
If you want to build from source, any platform that Rust has a target for is supported.

Pre-built wheels

The wheels are built against the following NumPy and pandas distributions:

If you're on Windows, macOS or Linux, there is no need to do anything extra.
If you use ARMv7 (Raspberry Pi and similar), use PiWheels via --extra-index-url=https://www.piwheels.org/simple and install libatlas-base-dev for NumPy.
If you use a musl-based distro such as Alpine, use Alpine-wheels via --extra-index-url https://alpine-wheels.github.io/index and install libstdc++ for pandas.

Refer to the build workflow for details.

Py	win x86	win x64	macOS x86_64	macOS AArch64	linux x86_64	linux i686	linux AArch64	linux ARMv7	musl linux x86_64
3.8	✅	✅	✅	✅	✅	✅	✅	🚫	✅
3.9	✅	✅	✅	✅	✅	✅	✅	✅	🚫
3.10	🚫	✅	✅	✅	✅	✅	🚫	🚫	✅

Usage

Importing pandas_maxminddb registers a Pandas geo extension, which lets you add geolocation columns to a Dataframe in place. This example uses a context manager to control the reader's lifetime:

import pandas as pd
from pandas_maxminddb import open_database

ips = pd.DataFrame(data={
    'ip': ["75.63.106.74", "132.206.246.203", "94.226.237.31", "128.119.189.49", "2.30.253.245"]})
with open_database('./GeoLite.mmdb/GeoLite2-City.mmdb') as reader:
    ips.geo.geolocate('ip', reader, ['country', 'city', 'state', 'postcode'])
ips

	ip	city	postcode	state	country
0	75.63.106.74	Houston	77070	TX	US
1	132.206.246.203	Montreal	H3A	QC	CA
2	94.226.237.31	Kapellen	2950	VLG	BE
3	128.119.189.49	Northampton	01060	MA	US
4	2.30.253.245	London	SW15	ENG	GB

Without context manager

You can also instantiate the reader yourself, e.g.:

import pandas as pd
from pandas_maxminddb import ReaderMem, ReaderMmap

reader = ReaderMem('./GeoLite.mmdb/GeoLite2-City.mmdb')
ips = pd.DataFrame(data={
    'ip': ["75.63.106.74", "132.206.246.203", "94.226.237.31", "128.119.189.49", "2.30.253.245"]})
ips.geo.geolocate('ip', reader, ['country', 'city', 'state', 'postcode'])
ips

Parallelism

If the dataset is big enough and you have spare cores, you might benefit from using them. Currently only ReaderMem supports parallel lookups:

import pandas as pd
from pandas_maxminddb import ReaderMem

reader = ReaderMem('./GeoLite.mmdb/GeoLite2-City.mmdb')
ips = pd.DataFrame(data={
    'ip': ["75.63.106.74", "132.206.246.203", "94.226.237.31", "128.119.189.49", "2.30.253.245"]})
ips.geo.geolocate('ip', reader, ['country', 'city', 'state', 'postcode'], parallel=True)
ips
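The chunk-and-merge structure behind a parallel lookup can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation, which runs in Rust. The names TOY_DB, lookup and geolocate_parallel are hypothetical, and the toy table stands in for a real database. Because of the GIL, threads will not speed up this pure-Python version; the sketch shows how work is split into chunks and reassembled in the original order.

```python
import ipaddress
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy "database": a few networks mapped to country codes.
TOY_DB = {
    ipaddress.ip_network("75.63.0.0/16"): "US",
    ipaddress.ip_network("132.206.0.0/16"): "CA",
}


def lookup(ip):
    """Return the country code for one IP, or None if no network matches."""
    addr = ipaddress.ip_address(ip)
    for network, country in TOY_DB.items():
        if addr in network:
            return country
    return None


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def geolocate_parallel(ips, chunk_size=1024, workers=4):
    """Look up every IP chunk by chunk, preserving the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the order the chunks were submitted.
        per_chunk = pool.map(
            lambda chunk: [lookup(ip) for ip in chunk],
            chunks(ips, chunk_size),
        )
        return [country for chunk in per_chunk for country in chunk]


ips = ["75.63.106.74", "132.206.246.203", "10.0.0.1"] * 1000
countries = geolocate_parallel(ips)
```

Keeping results in submission order matters here, since each result has to line up with the Dataframe row it came from.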

Benchmarks

Tested on an M1 Max with a chunk size of 1024 on a 100k dataset; refer to the benchmark for details.

Name (time in ms)	Min	Max	Mean	StdDev	Median	IQR	Outliers	OPS	Rounds	Iterations
test_benchmark_pandas_parallel_mem_maxminddb	52.7588 (1.0)	57.4206 (1.0)	54.0573 (1.0)	1.1782 (1.15)	53.8497 (1.0)	1.4194 (1.09)	4;1	18.4989 (1.0)	20	1
test_benchmark_pandas_mmap_maxminddb	240.0050 (4.55)	244.3257 (4.26)	242.2177 (4.48)	1.9017 (1.85)	243.1021 (4.51)	3.2122 (2.46)	2;0	4.1285 (0.22)	5	1
test_benchmark_pandas_mem_maxminddb	241.4630 (4.58)	244.2553 (4.25)	242.8391 (4.49)	1.0288 (1.0)	242.7672 (4.51)	1.3064 (1.0)	2;0	4.1180 (0.22)	5	1
test_benchmark_c_maxminddb	1,010.6569 (19.16)	1,055.1080 (18.38)	1,021.3691 (18.89)	18.9273 (18.40)	1,013.3819 (18.82)	12.9544 (9.92)	1;1	0.9791 (0.05)	5	1
test_benchmark_python_maxminddb	9,021.2686 (170.99)	9,188.7629 (160.03)	9,071.0055 (167.80)	70.0512 (68.09)	9,039.7811 (167.87)	84.7766 (64.89)	1;0	0.1102 (0.01)	5	1
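The figures in parentheses are ratios relative to the best value in each column. As a quick check, the short script below recomputes the ratios for the Mean column from the numbers in the table. The throughput estimate rests on an assumption not stated in the source: that one benchmark round processes the full dataset and that "100k" means 100,000 lookups.

```python
# Mean times in milliseconds, copied from the benchmark table.
mean_ms = {
    "test_benchmark_pandas_parallel_mem_maxminddb": 54.0573,
    "test_benchmark_pandas_mmap_maxminddb": 242.2177,
    "test_benchmark_pandas_mem_maxminddb": 242.8391,
    "test_benchmark_c_maxminddb": 1021.3691,
    "test_benchmark_python_maxminddb": 9071.0055,
}

# Ratio of each mean to the fastest mean, rounded as in the table.
fastest = min(mean_ms.values())
relative = {name: round(ms / fastest, 2) for name, ms in mean_ms.items()}

# Assumption: one round covers the whole dataset of 100,000 lookups.
DATASET_SIZE = 100_000
lookups_per_second = {
    name: DATASET_SIZE / (ms / 1000) for name, ms in mean_ms.items()
}

for name, ratio in relative.items():
    print(f"{name}: {ratio:.2f}x the fastest mean, "
          f"about {lookups_per_second[name]:,.0f} lookups/s")
```

Read this way, the parallel in-memory reader is roughly 4.5 times faster than the single-threaded readers, about 19 times faster than the C extension benchmark and about 168 times faster than the pure Python one.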

Extending

Because Dataframe columns are flat arrays while geolocation data comes in a hierarchical format, you might need to provide more mappings to serve your particular use case. To do that, follow the Development section to set up your environment and then:

Add the column name to geo_column.rs
Add the column mapping to geolocate.rs
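To make the flat-versus-hierarchical point concrete, here is a minimal Python sketch of what a column mapping does conceptually. The real mappings live in the Rust sources named above; the names COLUMN_PATHS, extract and flatten are hypothetical, and the record layout is an assumption modelled on a GeoLite2 City style record, filled with values from the usage example.

```python
# Assumed shape of one hierarchical geolocation record.
record = {
    "country": {"iso_code": "US"},
    "city": {"names": {"en": "Houston"}},
    "subdivisions": [{"iso_code": "TX"}],
    "postal": {"code": "77070"},
}

# Hypothetical mapping: flat column name -> path into the nested record.
COLUMN_PATHS = {
    "country": ("country", "iso_code"),
    "city": ("city", "names", "en"),
    "state": ("subdivisions", 0, "iso_code"),
    "postcode": ("postal", "code"),
}


def extract(record, path):
    """Walk `path` through nested dicts and lists; None if any step is missing."""
    node = record
    for key in path:
        try:
            node = node[key]
        except (KeyError, IndexError, TypeError):
            return None
    return node


def flatten(record, columns):
    """Produce one flat value per requested column."""
    return {column: extract(record, COLUMN_PATHS[column]) for column in columns}


row = flatten(record, ["country", "city", "state", "postcode"])
```

Adding a new column in this picture means adding one name and one path, which mirrors the two steps listed above.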

Development

Setting up environment

git clone --recurse-submodules git@github.com:andrusha/pandas-maxminddb.git
PYTHON_CONFIGURE_OPTS="--enable-shared" asdf install
PYTHON_CONFIGURE_OPTS="--enable-shared" python -m venv .venv
source .venv/bin/activate
pip install nox
nox -s test
PYTHONPATH=.venv/lib/python3.8/site-packages cargo test --no-default-features

libmaxminddb

To run nox -s bench properly, you need libmaxminddb installed as per the maxminddb instructions before installing the Python package, so that the C extension can be benchmarked.

On macOS this requires the following:

brew install libmaxminddb
PATH="/opt/homebrew/Cellar/libmaxminddb/1.7.1/bin:$PATH" LDFLAGS="-L/opt/homebrew/Cellar/libmaxminddb/1.7.1/lib" CPPFLAGS="-I/opt/homebrew/Cellar/libmaxminddb/1.7.1/include" pip install maxminddb --force-reinstall --verbose --no-cache-dir
