
PIICatcher is a scanner for PII and PHI. It finds PII data in your databases and file systems and tracks critical data. PIICatcher uses two techniques to detect PII: scanning column metadata, such as column names, and scanning the data stored in columns.
Read more about both strategies in the blog post.
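As a rough illustration of the two strategies (scanning column metadata and scanning column data), the sketch below matches regular expressions against a column name and against sampled values. The patterns, function names, and PII labels are hypothetical and chosen only for this example; they are not PIICatcher's own rules or API.

```python
import re

# Hypothetical patterns for illustration only; PIICatcher ships its own rules.
COLUMN_NAME_PATTERNS = {
    "EMAIL": re.compile(r"e.?mail", re.IGNORECASE),
    "PHONE": re.compile(r"phone|mobile", re.IGNORECASE),
    "PERSON": re.compile(r"^(f|l|first|last|maiden).?name$", re.IGNORECASE),
}

# Deliberately simple email shape check; real detectors are more thorough.
EMAIL_VALUE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")


def detect_from_column_name(column_name):
    """Metadata technique: match the column name against known patterns."""
    for pii_type, pattern in COLUMN_NAME_PATTERNS.items():
        if pattern.search(column_name):
            return pii_type
    return None


def detect_from_sample_data(values):
    """Data technique: inspect sampled values (only emails in this sketch)."""
    if values and all(EMAIL_VALUE.match(v) for v in values):
        return "EMAIL"
    return None
```

Name matching is cheap because it never reads table data, while value matching can catch PII stored under uninformative column names such as a or b.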
PIICatcher is batteries-included, with a growing set of plugins to scan column metadata as well as column data. For example, piicatcher_spacy uses spaCy to detect PII in column data.
PIICatcher supports incremental scans and will only scan new or not-yet-scanned columns, which makes scans easy to schedule. It also provides options to include or exclude schemas and tables to manage compute resources.
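The selection logic behind incremental scans and include/exclude filters can be sketched as follows. This is a simplified, self-contained model and not PIICatcher's implementation: it tracks scanned state per table rather than per column, and the function name select_tables and its parameters are hypothetical.

```python
import re


def select_tables(tables, already_scanned=(), include=None, exclude=None):
    """Return the (schema, table) pairs that a scan would visit.

    tables: iterable of (schema, table) tuples
    already_scanned: pairs to skip, mimicking an incremental scan
    include / exclude: lists of regex strings matched against the table name
    """
    scanned = set(already_scanned)
    selected = []
    for schema, table in tables:
        if (schema, table) in scanned:
            continue  # incremental: skip work done by an earlier scan
        if include and not any(re.search(p, table) for p in include):
            continue  # an include list is set and nothing matched
        if exclude and any(re.search(p, table) for p in exclude):
            continue  # explicitly excluded
        selected.append((schema, table))
    return selected


tables = [("main", "full_pii"), ("main", "no_pii"), ("main", "partial_pii")]
todo = select_tables(
    tables,
    already_scanned=[("main", "full_pii")],
    exclude=["^no_"],
)
print(todo)
```

Skipping already-scanned objects keeps a scheduled run cheap, and the regex filters let a run avoid large tables that are known to be uninteresting.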
There are ingestion functions for both Datahub and Amundsen that tag columns and tables with PII and PII-type tags.
PIICatcher is available as a Docker image or a command-line application.
Docker:
alias piicatcher='docker run -v ${HOME}/.config/tokern:/config -u $(id -u ${USER}):$(id -g ${USER}) -it --add-host=host.docker.internal:host-gateway tokern/piicatcher:latest'
PyPI:
# Install development libraries for compiling dependencies.
# On Amazon Linux
sudo yum install mysql-devel gcc gcc-devel python-devel
python3 -m venv .env
source .env/bin/activate
pip install piicatcher
# Install Spacy plugin
pip install piicatcher_spacy
# add a sqlite source
piicatcher catalog add-sqlite --name sqldb --path '/db/sqldb/test.db'
# run piicatcher on a sqlite db and print report to console
piicatcher detect --source-name sqldb
╭─────────────┬─────────────┬─────────────┬─────────────╮
│ schema │ table │ column │ has_pii │
├─────────────┼─────────────┼─────────────┼─────────────┤
│ main │ full_pii │ a │ 1 │
│ main │ full_pii │ b │ 1 │
│ main │ no_pii │ a │ 0 │
│ main │ no_pii │ b │ 0 │
│ main │ partial_pii │ a │ 1 │
│ main │ partial_pii │ b │ 0 │
╰─────────────┴─────────────┴─────────────┴─────────────╯
Code Snippet:
from dbcat.api import open_catalog, add_postgresql_source
from piicatcher.api import scan_database
# PIICatcher uses a catalog to store its state.
# The easiest option is to use a sqlite memory database.
# For production usage check, https://tokern.io/docs/data-catalog
catalog = open_catalog(app_dir='/tmp/.config/piicatcher', path=':memory:', secret='my_secret')
with catalog.managed_session:
    # Add a postgresql source
    source = add_postgresql_source(catalog=catalog, name="pg_db", uri="127.0.0.1", username="piiuser",
                                   password="p11secret", database="piidb")
    output = scan_database(catalog=catalog, source=source)

print(output)
# Example Output
[
['public', 'sample', 'gender', 'PiiTypes.GENDER'],
['public', 'sample', 'maiden_name', 'PiiTypes.PERSON'],
['public', 'sample', 'lname', 'PiiTypes.PERSON'],
['public', 'sample', 'fname', 'PiiTypes.PERSON'],
['public', 'sample', 'address', 'PiiTypes.ADDRESS'],
['public', 'sample', 'city', 'PiiTypes.ADDRESS'],
['public', 'sample', 'state', 'PiiTypes.ADDRESS'],
['public', 'sample', 'email', 'PiiTypes.EMAIL']
]
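If the scan result has the shape shown in the example output, a list of [schema, table, column, pii_type] rows, it can be summarised with plain Python. The sketch below assumes that shape and copies the rows from the example; the helper name columns_by_pii_type is hypothetical and not part of PIICatcher.

```python
from collections import defaultdict

# Rows copied from the example output: [schema, table, column, pii_type].
output = [
    ['public', 'sample', 'gender', 'PiiTypes.GENDER'],
    ['public', 'sample', 'maiden_name', 'PiiTypes.PERSON'],
    ['public', 'sample', 'lname', 'PiiTypes.PERSON'],
    ['public', 'sample', 'fname', 'PiiTypes.PERSON'],
    ['public', 'sample', 'address', 'PiiTypes.ADDRESS'],
    ['public', 'sample', 'city', 'PiiTypes.ADDRESS'],
    ['public', 'sample', 'state', 'PiiTypes.ADDRESS'],
    ['public', 'sample', 'email', 'PiiTypes.EMAIL'],
]


def columns_by_pii_type(rows):
    """Group fully qualified column names by their detected PII type."""
    grouped = defaultdict(list)
    for schema, table, column, pii_type in rows:
        grouped[pii_type].append(f"{schema}.{table}.{column}")
    return dict(grouped)


summary = columns_by_pii_type(output)
for pii_type, columns in sorted(summary.items()):
    print(pii_type, len(columns), columns)
```

A grouping like this is a convenient starting point for deciding which columns need masking or access controls.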
PIICatcher can be extended by creating new detectors. PIICatcher supports two scanning techniques: metadata scanning and data scanning.
Plugins can be created for either of these two techniques. Plugins are then registered using an API or using Python Entry Points.
To create a new detector, simply create a new class that inherits from MetadataDetector or DatumDetector. In the new class, define a function detect that will return a PIIType.
If you are detecting a new PII type, then you can define a new class that inherits from PIIType.
For detailed documentation, check piicatcher plugin docs.
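The shape of a custom detector can be sketched as below. To keep the example self-contained, PIIType and MetadataDetector are redefined here as minimal stand-ins; in a real plugin they would be imported from PIICatcher, and the exact detect signature (this sketch assumes it receives a plain column name) should be checked against the plugin docs. EmployeeId and EmployeeIdDetector are hypothetical names.

```python
import re
from typing import Optional


class PIIType:
    """Stand-in for PIICatcher's PII type base class (assumption)."""
    name = "PII"


class MetadataDetector:
    """Stand-in for PIICatcher's metadata detector base class (assumption)."""

    def detect(self, column_name: str) -> Optional[PIIType]:
        raise NotImplementedError


class EmployeeId(PIIType):
    """A new PII type, defined by inheriting from the base type."""
    name = "Employee ID"


class EmployeeIdDetector(MetadataDetector):
    """Flags columns whose name looks like an employee identifier."""

    pattern = re.compile(r"^(emp|employee)_?(id|no|number)$", re.IGNORECASE)

    def detect(self, column_name: str) -> Optional[PIIType]:
        # Return a PII type on a match, or None when the column looks clean.
        if self.pattern.match(column_name):
            return EmployeeId()
        return None


detector = EmployeeIdDetector()
print(detector.detect("employee_id").name)
print(detector.detect("city"))
```

A DatumDetector would follow the same pattern but inspect sampled values instead of the column name.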
PIICatcher supports the following databases:
For advanced usage, refer to the PIICatcher Documentation.
Please take this survey if you are a user or considering using PIICatcher. The responses will help to prioritize improvements to the project.
We use cookies to analyse our traffic and feature usage. We may share information about your use of our product for our social media and marketing purposes. These cookies don't collect your sensitive or confidential information. If you would like to opt out of these cookies, run
piicatcher --disable-stats
To Enable:
piicatcher --enable-stats
For contribution guidelines, see the PIICatcher Developer documentation.
FAQs
Find PII data in databases
We found that piicatcher demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 3 open source maintainers collaborating on the project.