Data anonymization package, supporting different anonymization strategies
Documentation: https://eriknovak.github.io/anonipy
Source code: https://github.com/eriknovak/anonipy
The anonipy package is a Python package for data anonymization. It is designed to be simple to use and highly customizable, supporting different anonymization strategies, and it is powered by LLMs.
pip install anonipy
pip install anonipy --upgrade
original_text = """\
Medical Record
Patient Name: John Doe
Date of Birth: 15-01-1985
Date of Examination: 20-05-2024
Social Security Number: 123-45-6789
Examination Procedure:
John Doe underwent a routine physical examination. The procedure included measuring vital signs (blood pressure, heart rate, temperature), a comprehensive blood panel, and a cardiovascular stress test. The patient also reported occasional headaches and dizziness, prompting a neurological assessment and an MRI scan to rule out any underlying issues.
Medication Prescribed:
Ibuprofen 200 mg: Take one tablet every 6-8 hours as needed for headache and pain relief.
Lisinopril 10 mg: Take one tablet daily to manage high blood pressure.
Next Examination Date:
15-11-2024
"""
Use the language detector to detect the language of the text:
from anonipy.utils.language_detector import LanguageDetector
language_detector = LanguageDetector()
language = language_detector(original_text)
Prepare the entity extractor and extract the personal information from the original text:
from anonipy.anonymize.extractors import NERExtractor
# define the labels to be extracted and anonymized
labels = [
    {"label": "name", "type": "string"},
    {"label": "social security number", "type": "custom"},
    {"label": "date of birth", "type": "date"},
    {"label": "date", "type": "date"},
]
# initialize the NER extractor for the language and labels
extractor = NERExtractor(labels, lang=language, score_th=0.5)
# extract the entities from the original text
doc, entities = extractor(original_text)
# display the entities in the original text
extractor.display(doc)
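The `score_th=0.5` argument suggests that low-confidence predictions are discarded. As a rough illustration only, assuming each candidate entity carries a confidence score between 0 and 1 and that candidates below the threshold are dropped, the effect can be sketched as follows. The `Candidate` class and `filter_by_score` helper are hypothetical and not part of anonipy, and whether the real comparison is inclusive is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical stand-in for a predicted entity; not an anonipy class."""
    text: str
    label: str
    score: float


def filter_by_score(candidates, score_th=0.5):
    """Keep only the candidates whose score reaches the threshold (assumed inclusive)."""
    return [c for c in candidates if c.score >= score_th]


candidates = [
    Candidate("John Doe", "name", 0.97),
    Candidate("routine", "name", 0.12),
    Candidate("15-01-1985", "date of birth", 0.88),
]
kept = filter_by_score(candidates, score_th=0.5)
print([c.text for c in kept])
```

Raising the threshold trades recall for precision: fewer spurious entities are replaced, but more genuine ones may be missed.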
Use generators to create substitutes for the entities:
from anonipy.anonymize.generators import (
    LLMLabelGenerator,
    DateGenerator,
    NumberGenerator,
)
# initialize the generators
llm_generator = LLMLabelGenerator()
date_generator = DateGenerator()
number_generator = NumberGenerator()
# prepare the anonymization mapping
def anonymization_mapping(text, entity):
    if entity.type == "string":
        return llm_generator.generate(entity, temperature=0.7)
    if entity.label == "date":
        return date_generator.generate(entity, output_gen="MIDDLE_OF_THE_MONTH")
    if entity.label == "date of birth":
        return date_generator.generate(entity, output_gen="MIDDLE_OF_THE_YEAR")
    if entity.label == "social security number":
        return number_generator.generate(entity)
    return "[REDACTED]"
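The branch order of the mapping can be checked without loading any models. The sketch below mirrors the same dispatch with placeholder return values. `FakeEntity` and `fake_mapping` are hypothetical names introduced for illustration, and the sketch assumes that the mapping only reads an entity's `label` and `type` attributes.

```python
from dataclasses import dataclass


@dataclass
class FakeEntity:
    """Hypothetical stand-in exposing only the fields the mapping reads."""
    text: str
    label: str
    type: str


def fake_mapping(text, entity):
    """Mirror the branch order of anonymization_mapping with placeholder outputs."""
    if entity.type == "string":
        return "<generated name>"
    if entity.label == "date":
        return "<shifted date>"
    if entity.label == "date of birth":
        return "<shifted birth date>"
    if entity.label == "social security number":
        return "<random number>"
    # Anything the branches above do not recognize is redacted.
    return "[REDACTED]"


samples = [
    FakeEntity("John Doe", "name", "string"),
    FakeEntity("20-05-2024", "date", "date"),
    FakeEntity("15-01-1985", "date of birth", "date"),
    FakeEntity("123-45-6789", "social security number", "custom"),
    FakeEntity("Ljubljana", "city", "custom"),
]
results = [fake_mapping("", e) for e in samples]
print(results)
```

Because the `type == "string"` check comes first, any entity typed as a string is routed to the name-style branch regardless of its label, and an unrecognized label falls through to `"[REDACTED]"`.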
Anonymize the text using the anonymization mapping:
from anonipy.anonymize.strategies import PseudonymizationStrategy
# initialize the pseudonymization strategy
pseudo_strategy = PseudonymizationStrategy(mapping=anonymization_mapping)
# anonymize the original text
anonymized_text, replacements = pseudo_strategy.anonymize(original_text, entities)
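`anonymize` returns the rewritten text together with the replacements that were made. As a general illustration of span-based substitution, and not a description of anonipy's internals, the sketch below applies replacements from right to left so that earlier character offsets stay valid when a substitute differs in length from the original. The dictionary keys `start_index`, `end_index` and `anonymized_text` and the `apply_replacements` helper are assumptions made for this example.

```python
def apply_replacements(text, replacements):
    """Substitute each [start_index, end_index) span, working right to left."""
    result = text
    ordered = sorted(replacements, key=lambda r: r["start_index"], reverse=True)
    for rep in ordered:
        result = (
            result[: rep["start_index"]]
            + rep["anonymized_text"]
            + result[rep["end_index"]:]
        )
    return result


sample = "Patient Name: John Doe\nDate of Birth: 15-01-1985"

# Locate the spans programmatically instead of hard-coding offsets.
name_start = sample.index("John Doe")
dob_start = sample.index("15-01-1985")

sample_replacements = [
    {
        "start_index": name_start,
        "end_index": name_start + len("John Doe"),
        "anonymized_text": "Alex Smithson",
    },
    {
        "start_index": dob_start,
        "end_index": dob_start + len("15-01-1985"),
        "anonymized_text": "01-07-1985",
    },
]

anonymized_sample = apply_replacements(sample, sample_replacements)
print(anonymized_sample)
```

Processing the spans in ascending order instead would require adjusting every later offset by the length difference of each earlier substitution.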
Anonipy is developed by the Department for Artificial Intelligence at the Jozef Stefan Institute, and other contributors.
The project has received funding from the European Union's Horizon Europe research and innovation programme under Grant Agreement No 101080288 (PREPARE).
FAQs
The data anonymization package
We found that anonipy demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.