pynonymizer

An anonymization tool for production databases

2.5.0

PyPI

Maintainers: 1

pynonymizer

pynonymizer is a tool for anonymizing sensitive production database dumps, allowing you to create realistic test datasets while maintaining GDPR/Data Protection compliance. It replaces personally identifiable information (PII) in your database with random but realistic data, using the Faker library and other functions.

Key features:

Supports MySQL, PostgreSQL, and MSSQL databases
Accepts various input formats (SQL, compressed files)
Generates anonymized output in multiple formats
Flexible data generation strategies for different use cases
Easy-to-use command-line interface and Python library

With pynonymizer, you can safely share production database copies with developers and testers, enabling better staging environments, integration tests, and database migration simulations, without compromising user privacy.

How does it work?

pynonymizer replaces personally identifiable data in your database with realistic pseudorandom data, drawn from the Faker library or from other functions. A wide variety of data types is available, so you can pick one that suits the column in question, for example:

unique_email
company
file_path
[...]

Pynonymizer's main data replacement mechanism, fake_update, picks values at random from a small pool of pre-generated data (--seed-rows controls how much Faker data is available in that pool). This approach was chosen for compatibility and speed, but it does not guarantee uniqueness, which may or may not suit your use case. For a full list of data generation strategies, see the docs on strategyfiles.
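To make the uniqueness caveat concrete, here is a minimal, hypothetical sketch of a seed-pool update in plain Python. It is not pynonymizer's implementation: `build_seed_pool` and `fake_update_sketch` are illustrative names, and the `seed_rows` argument only stands in for what --seed-rows controls.

```python
import random
from typing import Callable, Dict, List

Row = Dict[str, str]


def build_seed_pool(generate: Callable[[int], str], seed_rows: int) -> List[str]:
    """Pre-generate a small pool of fake values (the role --seed-rows plays)."""
    return [generate(i) for i in range(seed_rows)]


def fake_update_sketch(rows: List[Row], column: str, pool: List[str],
                       rng: random.Random) -> List[Row]:
    """Overwrite `column` in every row with a random pick from the pool.

    Each pick is independent, so two rows can receive the same value.
    """
    for row in rows:
        row[column] = rng.choice(pool)
    return rows


# Hypothetical data: 5 rows but only 2 seed values.
pool = build_seed_pool(lambda i: f"user{i}@example.com", seed_rows=2)
users = [{"email": f"real{i}@corp.test"} for i in range(5)]
fake_update_sketch(users, "email", pool, random.Random(0))
emails = [u["email"] for u in users]
print(len(set(emails)) < len(emails))  # True: 5 rows cannot fit 2 distinct values
```

Because every row draws independently, collisions are possible for any pool size and guaranteed once the rows outnumber the pool; that is the trade-off accepted in exchange for speed.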

Examples

You can see strategyfile examples for existing databases in the examples folder.

Process outline

Restore from dumpfile to temporary database.
Anonymize temporary database with strategy.
Dump resulting data to file.
Drop temporary database.

If this workflow doesn't work for you, see process control to check whether it can be adjusted to suit your needs.
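The four steps of the process outline can be sketched as a small pipeline runner. This is a hypothetical illustration, not pynonymizer's code: the step names are modeled on the RESTORE_DB/DUMP_DB names used in the MSSQL notes, while ANONYMIZE_DB, DROP_DB, the `skip` parameter, and the always-drop cleanup are assumptions of the sketch.

```python
from enum import Enum
from typing import Callable, Dict, Iterable, List


class Step(Enum):
    """Hypothetical step names for the four-stage process outline."""
    RESTORE_DB = "restore"
    ANONYMIZE_DB = "anonymize"
    DUMP_DB = "dump"
    DROP_DB = "drop"


def run_pipeline(actions: Dict[Step, Callable[[], None]],
                 skip: Iterable[Step] = ()) -> List[Step]:
    """Run restore -> anonymize -> dump, then drop the temporary database.

    Steps listed in `skip` are not run. DROP_DB sits in a `finally` block so
    the temporary database is cleaned up even if an earlier step raises.
    """
    skipped = set(skip)
    completed: List[Step] = []
    try:
        for step in (Step.RESTORE_DB, Step.ANONYMIZE_DB, Step.DUMP_DB):
            if step not in skipped:
                actions[step]()
                completed.append(step)
    finally:
        if Step.DROP_DB not in skipped:
            actions[Step.DROP_DB]()
            completed.append(Step.DROP_DB)
    return completed


# Usage: record which steps ran, skipping the dump.
log: List[str] = []
actions = {step: (lambda s=step: log.append(s.name)) for step in Step}
ran = run_pipeline(actions, skip=[Step.DUMP_DB])
print(log)  # ['RESTORE_DB', 'ANONYMIZE_DB', 'DROP_DB']
```

Keeping each step as a separate callable is what makes skipping or reordering steps cheap, which is the kind of adjustment process control is meant to offer.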

mysql

mysql/mysqldump must be in $PATH
Local or remote MySQL >= 5.5
Supported Inputs:
- Plain SQL over stdin
- Plain SQL file .sql
- GZip-compressed SQL file .gz
Supported Outputs:
- Plain SQL over stdout
- Plain SQL file .sql
- GZip-compressed SQL file .gz
- LZMA-compressed SQL file .xz
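The output formats in the list map naturally onto Python's standard `gzip` and `lzma` modules. The sketch below picks a writer by file suffix; `open_sql_output` is a hypothetical helper, and using `-` to mean stdout is an assumption of this example rather than documented pynonymizer behavior. The same idea works for reading `.sql`/`.gz` inputs with mode `"rt"`.

```python
import gzip
import lzma
import sys
from typing import TextIO

# Suffix -> opener for text-mode output, mirroring the MySQL output list.
_OPENERS = {
    ".sql": open,
    ".gz": gzip.open,
    ".xz": lzma.open,
}


def open_sql_output(path: str) -> TextIO:
    """Return a writable text stream chosen by file suffix.

    '-' (an assumption of this sketch) returns sys.stdout, which the caller
    should not close.
    """
    if path == "-":
        return sys.stdout
    for suffix, opener in _OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "wt", encoding="utf-8")
    raise ValueError(f"unsupported output type: {path}")
```

For example, `with open_sql_output("dump.sql.xz") as f: f.write(sql_text)` would write an LZMA-compressed dump, while `dump.sql` would be written uncompressed.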

mssql

Requires extra dependencies: install package pynonymizer[mssql]
MSSQL >= 2008
For RESTORE_DB/DUMP_DB operations, the database server must be running on the same machine as pynonymizer. MSSQL RESTORE and BACKUP statements are executed by the database server itself against its own filesystem, so a local backup file cannot be piped to a remote server.
The anonymize process can be performed on remote servers, but you are responsible for creating/managing the target database.
Supported Inputs:
- Local backup file
Supported Outputs:
- Local backup file
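The local-server constraint can be checked up front before a long run. This is a hypothetical guard, not part of pynonymizer: `check_mssql_step` and the set of "local" host names are assumptions, and matching on host strings is only a heuristic (for instance, the machine's own hostname would be reported as remote).

```python
from typing import Optional

# Step names taken from the MSSQL notes; the host list is a heuristic.
LOCAL_ONLY_STEPS = {"RESTORE_DB", "DUMP_DB"}
LOCAL_HOSTS = {None, "", "localhost", "127.0.0.1", "::1"}


def check_mssql_step(step: str, db_host: Optional[str]) -> None:
    """Raise if a backup/restore step targets a server that looks remote.

    Other steps (such as anonymizing an existing database) are allowed
    against remote hosts.
    """
    if step in LOCAL_ONLY_STEPS and db_host not in LOCAL_HOSTS:
        raise ValueError(
            f"{step} needs a local MSSQL server; {db_host!r} looks remote"
        )
```

Failing fast here mirrors the rule in the notes: remote servers are fine for anonymizing, but you must create and manage the target database yourself.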

postgres

psql/pg_dump must be in $PATH
Local or remote PostgreSQL server
Supported Inputs:
- Plain SQL over stdin
- Plain SQL file .sql
- GZip-compressed SQL file .gz
Supported Outputs:
- Plain SQL over stdout
- Plain SQL file .sql
- GZip-compressed SQL file .gz
- LZMA-compressed SQL file .xz
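Both the MySQL and PostgreSQL backends rely on client tools being on $PATH. A quick preflight check with the standard library's `shutil.which` can catch a missing tool before a run starts. The `REQUIRED_TOOLS` mapping and `missing_tools` helper are hypothetical; the tool names come from the requirements listed for each backend.

```python
import shutil
from typing import Dict, List

# Client tools each backend expects on $PATH, per the requirements lists.
REQUIRED_TOOLS: Dict[str, List[str]] = {
    "mysql": ["mysql", "mysqldump"],
    "postgres": ["psql", "pg_dump"],
}


def missing_tools(tools: List[str]) -> List[str]:
    """Return the tools that shutil.which cannot find on $PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    for db_type, tools in REQUIRED_TOOLS.items():
        missing = missing_tools(tools)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{db_type}: {status}")
```

If the check reports missing tools and you would rather not install them, the Docker image bundles the client tools instead.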

Getting Started

Usage

CLI

Write a strategyfile for your database
Check out the help for a description of the options: pynonymizer --help
Start Anonymizing!

Docker


pynonymizer is available as a Docker image, so you don't have to install your database's client tools.

See https://hub.docker.com/repository/docker/rwnxt/pynonymizer

# As pynonymizer depends on strategyfiles, you'll need to bind-mount the file so the container can read it.
docker run --mount type=bind,source="$(pwd)"/strategyfile.yml,target=/tmp/strategyfile.yml rwnxt/pynonymizer -s /tmp/strategyfile.yml --db-host [...]
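If you want to launch the container from Python rather than a shell, the `docker run` command can be built as an argument list. This is a minimal sketch: `docker_command`, `run_in_docker`, and `extra_args` are hypothetical helpers, only the `-s` and `--db-host` options from the command are used, and any further options elided as `[...]` would go in `extra_args`. `--rm` (remove the container on exit) is an addition of this sketch, and `os.path.abspath` sidesteps any question of whether your Docker version accepts relative bind-mount sources.

```python
import os
import subprocess
from typing import List, Sequence

IMAGE = "rwnxt/pynonymizer"
CONTAINER_STRATEGY_PATH = "/tmp/strategyfile.yml"


def docker_command(strategyfile: str, db_host: str,
                   extra_args: Sequence[str] = ()) -> List[str]:
    """Build the docker run argv with an absolute bind-mount source."""
    source = os.path.abspath(strategyfile)
    return [
        "docker", "run", "--rm",
        "--mount", f"type=bind,source={source},target={CONTAINER_STRATEGY_PATH}",
        IMAGE,
        "-s", CONTAINER_STRATEGY_PATH,
        "--db-host", db_host,
        *extra_args,
    ]


def run_in_docker(strategyfile: str, db_host: str,
                  extra_args: Sequence[str] = ()) -> None:
    # Passing a list (no shell) avoids quoting issues; check=True raises
    # CalledProcessError if the container exits with a non-zero status.
    subprocess.run(docker_command(strategyfile, db_host, extra_args), check=True)
```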

Package

Pynonymizer can also be invoked programmatically from other Python code. See the module entry point pynonymizer or pynonymizer/pynonymize.py.

import pynonymizer

pynonymizer.run(input_path="./backup.sql", strategyfile_path="./strategy.yml" [...] )

Keywords

anonymization gdpr database mysql
