presidio-image-redactor
Please note: this package is still in alpha and is not production-ready.
The Presidio Image Redactor is a Python-based module for detecting and redacting PII text entities in images.
Use the following button to deploy presidio image redactor to your Azure subscription.
Figure: redaction process for standard images and for DICOM files.
Prerequisites:
Install Tesseract OCR by following the installation instructions for your operating system.
For best performance, please use the most up-to-date version of Tesseract OCR. Presidio was tested with v5.2.0.
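Before installing the Python package, it can help to confirm which Tesseract version is on your PATH. The sketch below is a hypothetical helper (not part of Presidio) that runs `tesseract --version` and compares the result against the tested v5.2.0. It assumes the version appears as `tesseract X.Y.Z` (optionally `vX.Y.Z`) somewhere in the command output.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Version the Presidio README reports testing against.
TESTED_VERSION = (5, 2, 0)


def parse_tesseract_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) from the text printed by `tesseract --version`."""
    # Assumed formats: "tesseract 5.2.0" or "tesseract v5.3.1.20230401".
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_tesseract_version() -> Optional[Tuple[int, ...]]:
    """Return the installed Tesseract version, or None if the binary is not on PATH."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        ["tesseract", "--version"], capture_output=True, text=True, check=False
    )
    # Some builds may print the version to stderr, so search both streams.
    return parse_tesseract_version(result.stdout + result.stderr)


if __name__ == "__main__":
    version = installed_tesseract_version()
    if version is None:
        print("Tesseract not found on PATH")
    elif version < TESTED_VERSION:
        print(f"Tesseract {version} is older than the tested {TESTED_VERSION}")
    else:
        print(f"Tesseract {version} OK")
```

Tuples compare element by element in Python, so `version < TESTED_VERSION` is enough to flag an older major, minor, or patch release.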
To get started with presidio-image-redactor, run the following:
pip install presidio-image-redactor
Once installed, run the following command to download the default spaCy model needed for Presidio Analyzer:
python -m spacy download en_core_web_lg
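spaCy models downloaded this way are installed as regular Python packages, so one quick way to confirm the download succeeded is to check whether the package can be found. The helper below is a hypothetical sketch using only the standard library; it does not load the model, it only checks that it is importable.

```python
import importlib.util

# Name of the default model downloaded above.
MODEL_NAME = "en_core_web_lg"


def is_model_installed(name: str = MODEL_NAME) -> bool:
    """Return True if a top-level package with this name can be found on sys.path."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if is_model_installed():
        print(f"{MODEL_NAME} is installed")
    else:
        print(f"{MODEL_NAME} is missing; run: python -m spacy download {MODEL_NAME}")
```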
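The engine's redact method receives two parameters: the image to redact and the fill color for the redacted regions (optional; the default is black).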
from PIL import Image
from presidio_image_redactor import ImageRedactorEngine
# Get the image to redact using PIL lib (pillow)
image = Image.open("presidio-image-redactor/tests/integration/resources/ocr_test.png")
# Initialize the engine
engine = ImageRedactorEngine()
# Redact the image with pink color
redacted_image = engine.redact(image, (255, 192, 203))
# save the redacted image
redacted_image.save("new_image.png")
# uncomment to open the image for viewing
# redacted_image.show()
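Conceptually, the engine uses OCR to locate text in the image, Presidio Analyzer to decide which of that text is PII, and then paints over the matching regions with the fill color. The sketch below illustrates only that last painting step on a plain nested list of RGB tuples. It is not Presidio's implementation, and the `(left, top, width, height)` box layout and the `fill_boxes` name are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
# Assumed box layout: (left, top, width, height) in pixel coordinates.
Box = Tuple[int, int, int, int]


def fill_boxes(
    pixels: List[List[Pixel]], boxes: Sequence[Box], fill: Pixel = (0, 0, 0)
) -> List[List[Pixel]]:
    """Return a copy of `pixels` with every box painted in `fill` (black by default)."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    out = [list(row) for row in pixels]  # copy rows so the input image is left unchanged
    for left, top, box_w, box_h in boxes:
        # Clip each box to the image bounds so partially out-of-range boxes do not raise.
        for y in range(max(top, 0), min(top + box_h, height)):
            for x in range(max(left, 0), min(left + box_w, width)):
                out[y][x] = fill
    return out


# Usage: a 4x3 white image with one 2x2 region painted pink, as in the example above.
white = (255, 255, 255)
image = [[white] * 4 for _ in range(3)]
redacted = fill_boxes(image, [(1, 0, 2, 2)], fill=(255, 192, 203))
```

Returning a new image rather than editing in place mirrors how `engine.redact` hands back a separate redacted image.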
To run the image redactor as a Docker-hosted HTTP service, run the following in the presidio/presidio-image-redactor folder:
docker-compose up -d
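Because `docker-compose up -d` detaches as soon as the containers start, the service inside may not yet be ready to accept requests. The hypothetical helper below (standard library only, not part of Presidio) polls a TCP port until something is listening. Port 3000 is taken from the curl example; adjust it if your compose file maps a different port.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means some process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 3000)` returns `True` once the redactor's port accepts connections. An open port does not guarantee the application has finished loading its models, so the first request may still be slow.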
The service's redact endpoint receives an image and an optional color fill (the default is black), redacts PII text in the image, and returns a new redacted image.
POST /redact
Payload:
Sent as multipart/form-data. Contains the image file and a data field holding the color fill.
{
"data": "{'color_fill':'0,0,0'}"
}
Result:
200 OK
curl example:
# use ocr_test.png as the image to redact, and 255 as the color fill.
# out.png is the new redacted image received from the server.
curl -XPOST "http://localhost:3000/redact" -H "content-type: multipart/form-data" -F "image=@ocr_test.png" -F "data=\"{'color_fill':'255'}\"" > out.png
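For a Python client without third-party HTTP libraries, the following is a minimal standard-library sketch that builds the same multipart request as the curl command. The function name `build_redact_request`, the default filename, and the `data` string format (copied from the payload example) are assumptions; nothing is sent until you pass the request to `urllib.request.urlopen` yourself.

```python
import urllib.request
import uuid
from typing import Sequence


def build_redact_request(
    image_bytes: bytes,
    color_fill: Sequence[int] = (0, 0, 0),
    url: str = "http://localhost:3000/redact",
    filename: str = "image.png",
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST mirroring the curl example."""
    boundary = uuid.uuid4().hex
    # Encode the fill as in the payload example, e.g. "{'color_fill':'0,0,0'}".
    fill = ",".join(str(c) for c in color_fill)
    data_field = "{'color_fill':'%s'}" % fill

    parts = [
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        image_bytes,
        b"\r\n",
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="data"\r\n\r\n',
        data_field.encode(),
        b"\r\n",
        f"--{boundary}--\r\n".encode(),  # closing boundary ends the multipart body
    ]
    return urllib.request.Request(
        url,
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Usage (requires the service to be running):
# with open("ocr_test.png", "rb") as f:
#     request = build_redact_request(f.read(), color_fill=(255,))
# with urllib.request.urlopen(request) as response, open("out.png", "wb") as out:
#     out.write(response.read())
```

Passing `color_fill=(255,)` produces the same `'255'` value used in the curl example.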
A Python script example can be found in /presidio/e2e-tests/tests/test_image_redactor.py.
This module only redacts pixel data and does not scrub text PHI that may exist in the DICOM metadata.
We highly recommend using the DICOM image redactor engine to redact text from images before scrubbing metadata PHI. To redact sensitive information from metadata, consider using another package such as the Tools for Health Data Anonymization.
To redact burnt-in text PHI in DICOM images, see the sample code below:
import pydicom
from presidio_image_redactor import DicomImageRedactorEngine
# Set input and output paths
input_path = "path/to/your/dicom/file.dcm"
output_dir = "./output"
# Initialize the engine
engine = DicomImageRedactorEngine()
# Option 1: Redact from a loaded DICOM image
dicom_image = pydicom.dcmread(input_path)
redacted_dicom_image = engine.redact(dicom_image, fill="contrast")
# Option 2: Redact from a loaded DICOM image and return redacted regions
redacted_dicom_image, bboxes = engine.redact_and_return_bbox(dicom_image, fill="contrast")
# Option 3: Redact from DICOM file and save redacted regions as json file
engine.redact_from_file(input_path, output_dir, padding_width=25, fill="contrast", save_bboxes=True)
# Option 4: Redact from directory and save redacted regions as json files
ocr_kwargs = {"ocr_threshold": 50}
engine.redact_from_directory("path/to/your/dicom", output_dir, fill="background", save_bboxes=True, ocr_kwargs=ocr_kwargs)
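The examples above use two fill strategies, `fill="contrast"` and `fill="background"`. As a rough illustration only (an assumption about the idea, not a description of Presidio's internals), the sketch below treats the most common pixel value of an 8-bit greyscale image as its background and inverts it to get a contrasting fill. Real DICOM pixel data may use higher bit depths, which this sketch does not handle; `choose_fill` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List


def most_common_value(pixels: Iterable[int]) -> int:
    """Treat the most frequent 8-bit greyscale value as the image background."""
    return Counter(pixels).most_common(1)[0][0]


def choose_fill(rows: List[List[int]], mode: str) -> int:
    """Pick a greyscale fill value: 'background' blends in, 'contrast' stands out."""
    background = most_common_value(v for row in rows for v in row)
    if mode == "background":
        return background
    if mode == "contrast":
        # Inverting an 8-bit value moves it to the opposite end of the 0-255 range.
        return 255 - background
    raise ValueError(f"unknown fill mode: {mode!r}")
```

A background fill makes redacted boxes blend into the image, while a contrasting fill makes them easy to spot when reviewing the output.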
See the example notebook for more details and visual confirmation of the output: docs/samples/python/example_dicom_image_redactor.ipynb.
If you are using a Windows machine, you may run into issues if file paths are too long. Unfortunately, this is common when working with DICOM images, which are often nested in directories with descriptive names.
To avoid errors where the code does not recognize a path as existing because the file path is too long, enable long paths on your system.
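Enabling long paths is a system setting, but it can also help to find which files exceed the classic Windows limit before running the engine. The standard-library sketch below uses hypothetical helper names and is not part of Presidio; it assumes the classic `MAX_PATH` limit of 260 characters, which includes the terminating null and so leaves 259 usable characters.

```python
import os
from typing import List

# Classic Windows MAX_PATH limit (260 characters, including the terminating null).
MAX_PATH = 260


def exceeds_max_path(path: str, limit: int = MAX_PATH - 1) -> bool:
    """Return True if the path string is longer than `limit` characters."""
    return len(path) > limit


def find_long_paths(root: str, limit: int = MAX_PATH - 1) -> List[str]:
    """Walk `root` and return absolute file paths longer than `limit` characters."""
    too_long = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.abspath(os.path.join(dirpath, name))
            if exceeds_max_path(full, limit):
                too_long.append(full)
    return too_long
```

For example, `find_long_paths("path/to/your/dicom")` lists the DICOM files most likely to trigger path errors. `os.walk` silently yields nothing for a directory that does not exist, so an empty result does not by itself prove the directory was scanned.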
The DICOM data used for unit and integration testing of DicomImageRedactorEngine are stored in this repository with permission from the original dataset owners. The dataset is cited as follows:
Rutherford, M., Mun, S.K., Levine, B., Bennett, W.C., Smith, K., Farmer, P., Jarosz, J., Wagner, U., Farahani, K., Prior, F. (2021). A DICOM dataset for evaluation of medical image de-identification (Pseudo-PHI-DICOM-Data) [Data set]. The Cancer Imaging Archive. DOI: https://doi.org/10.7937/s17z-r072