
This package provides base data structures for managing PII (Personally Identifiable Information). It does not contain code for processing documents or for extracting PII from documents.
For the full specification embodied by these base data structures, check the PIISA Data Specification.
Two main data types are defined to hold PII information: PII Entities and PII Collections. There is also a Source Document data type.
A PII Source Document defines the raw data from which PII is detected. The document is modeled as a sequence of chunks, each one having an identifier and its data contents (a raw text excerpt, or other types of content). This is managed in this package by the SrcDocument class and its subclasses.
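The chunk model described above can be pictured with a minimal sketch. The names `Chunk` and `SourceDocumentSketch` and their fields are hypothetical, chosen for illustration only; they are not the package's SrcDocument API.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Chunk:
    """One piece of a source document: an identifier plus its contents."""
    chunk_id: str   # hypothetical field name
    data: str       # raw text excerpt (other content types are possible)


@dataclass
class SourceDocumentSketch:
    """Illustrative stand-in: a document as an ordered sequence of chunks."""
    chunks: List[Chunk] = field(default_factory=list)

    def add(self, chunk_id: str, data: str) -> None:
        """Append a new chunk at the end of the document."""
        self.chunks.append(Chunk(chunk_id, data))

    def __iter__(self) -> Iterator[Chunk]:
        """Iterate over chunks in document order."""
        return iter(self.chunks)


doc = SourceDocumentSketch()
doc.add("1", "Contact John Smith at the office.")
doc.add("2", "His phone number is 555-0100.")

# Chunk identifiers, in the order the chunks were added
chunk_ids = [chunk.chunk_id for chunk in doc]
```

Keeping an identifier on every chunk is what later allows a detected PII instance to point back to the exact place it came from.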
The package can dump a Source Document to a local file, following a standardized schema, and read it back from that file. The schema uses YAML as its file format, and this is the only document reading capability natively provided by the package (to read other formats into Source Document objects there is an auxiliary pii-preprocess package, or you can implement your own).
The package can also export documents as raw text files.
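As a rough illustration of the dump, read-back and raw-text export ideas, the sketch below round-trips a chunked document through a file. It makes two loud assumptions: it swaps in JSON from the Python standard library instead of the YAML schema the package uses, purely to stay dependency-free, and the dictionary layout and function names are hypothetical rather than taken from the package.

```python
import json
import os
import tempfile

# Hypothetical layout: a list of chunks, each with an id and its text data.
document = {
    "chunks": [
        {"id": "1", "data": "Contact John Smith at the office."},
        {"id": "2", "data": "His phone number is 555-0100."},
    ]
}


def dump_document(doc: dict, path: str) -> None:
    """Write the document to a local file (JSON stand-in for the YAML schema)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(doc, f, ensure_ascii=False, indent=2)


def load_document(path: str) -> dict:
    """Read a document back from a file written by dump_document."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def to_raw_text(doc: dict) -> str:
    """Export only the chunk contents, one chunk per line."""
    return "\n".join(chunk["data"] for chunk in doc["chunks"])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "document.json")
    dump_document(document, path)
    restored = load_document(path)

raw_text = to_raw_text(restored)
```

The raw-text export drops the chunk identifiers, so it is a one-way conversion, whereas the structured dump can be read back without loss.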
A PII Collection contains a list of detected/extracted PII Entities. Each entity contains all the information needed to correctly identify one PII instance and locate it in the document it belongs to.
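To make the "identify and locate" idea concrete, here is a minimal sketch of an entity that records its type, its value and its position inside a chunk, together with a collection that holds such entities. `EntitySketch`, `CollectionSketch`, `locate` and all field names are hypothetical; the actual fields are defined by the PIISA Data Specification and are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class EntitySketch:
    """Illustrative PII instance: what it is and where it sits."""
    pii_type: str   # hypothetical label, e.g. "PERSON"
    value: str      # the detected string
    chunk_id: str   # identifier of the chunk that contains it
    start: int      # character offset inside that chunk


@dataclass
class CollectionSketch:
    """Illustrative container for detected entities."""
    entities: List[EntitySketch] = field(default_factory=list)

    def add(self, entity: EntitySketch) -> None:
        self.entities.append(entity)

    def in_chunk(self, chunk_id: str) -> List[EntitySketch]:
        """Return the entities located in one chunk."""
        return [e for e in self.entities if e.chunk_id == chunk_id]


def locate(entity: EntitySketch, chunks: Dict[str, str]) -> str:
    """Recover the entity text from its chunk using the stored offset."""
    text = chunks[entity.chunk_id]
    return text[entity.start:entity.start + len(entity.value)]


chunks = {
    "1": "Contact John Smith at the office.",
    "2": "His phone number is 555-0100.",
}

collection = CollectionSketch()
collection.add(EntitySketch("PERSON", "John Smith", "1", chunks["1"].index("John Smith")))
collection.add(EntitySketch("PHONE_NUMBER", "555-0100", "2", chunks["2"].index("555-0100")))

# Every entity can be found again in the document from its stored location
round_trip_ok = all(locate(e, chunks) == e.value for e in collection.entities)
```

Storing the chunk identifier and offset alongside the value means a later stage can act on the exact span without searching the text again.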
These are the PII data classes defined:
- [...] PiiEntityInfo object)
- [...] PiiCollectionLoader subclass can load a collection from a JSON file)
- PiiDetector: an object to describe the module used to generate a given PiiEntity object

There is partial support for using these data classes in a streaming fashion, providing a way to feed data incrementally.
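Incremental feeding can be sketched with Python generators: chunks arrive one at a time, and entities are yielded as soon as they are found instead of after the whole document is loaded. Everything here is an assumption made for illustration: the function names, the dictionary keys, and the toy regular expression, which stands in for a real detector (the package itself does not extract PII).

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# Toy pattern standing in for a real detector (hypothetical)
PHONE = re.compile(r"\b\d{3}-\d{4}\b")


def chunk_stream() -> Iterator[Tuple[str, str]]:
    """Yield (chunk id, text) pairs one at a time, as a streaming source would."""
    yield "1", "Call 555-0100 before noon."
    yield "2", "No contact details here."
    yield "3", "Fax: 555-0199 or 555-0142."


def detect(chunks: Iterable[Tuple[str, str]]) -> Iterator[Dict[str, object]]:
    """Yield one entity record per match, without buffering the document."""
    for chunk_id, text in chunks:
        for match in PHONE.finditer(text):
            yield {
                "type": "PHONE_NUMBER",
                "value": match.group(),
                "chunk": chunk_id,
                "pos": match.start(),
            }


# The consumer receives entities incrementally and appends them as they arrive
collected: List[Dict[str, object]] = []
for entity in detect(chunk_stream()):
    collected.append(entity)
```

Because both stages are lazy, only one chunk needs to be held in memory at a time, which is the main appeal of a streaming design for large documents.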