
A toolkit for content injection, obfuscation, scanning, and sanitization of various document formats. If you use this library, please cite: Castagnaro et al. 'The Hidden Threat in Plain Text: Attacking RAG Data Loaders' (2025).
PhantomText is a Python library for content injection, content obfuscation, file scanning, and file sanitization across several document formats. It covers both producing hidden or obfuscated content and detecting and removing it.
FileScanner class that detects obfuscated and injected content.
FileSanitizer class.
PhantomText supports the following document formats:
To install PhantomText, you can use pip:
pip install phantomtext
from phantomtext.content_injection import ContentInjector
injector = ContentInjector()
injector.inject_content('document.pdf', 'New Content')
from phantomtext.content_obfuscation import ContentObfuscator
obfuscator = ContentObfuscator()
# Basic obfuscation
obfuscated_content = obfuscator.obfuscate_content('Sensitive Information')
# Advanced obfuscation with specific techniques
content = "Sensitive info: email@example.com and phone 123-456-7890."
target = "email@example.com"
# Zero-width character obfuscation
obfuscated = obfuscator.obfuscate(content, target,
                                  obfuscation_technique="zeroWidthCharacter",
                                  modality="default",
                                  file_format="html")
# Homoglyph character obfuscation
obfuscated = obfuscator.obfuscate(content, target,
                                  obfuscation_technique="homoglyph",
                                  file_format="pdf")
# Diacritical marks obfuscation
obfuscated = obfuscator.obfuscate(content, target,
                                  obfuscation_technique="diacritical",
                                  modality="heavy",
                                  file_format="docx")
# Bidi/reordering character obfuscation
obfuscated = obfuscator.obfuscate(content, target,
                                  obfuscation_technique="bidi",
                                  modality="default",
                                  file_format="html")
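To make the zero-width technique concrete, here is a minimal standalone sketch of the general idea. It does not use PhantomText and is not the library's implementation; the helper name `insert_zero_width` and the choice of U+200B as filler are assumptions made for illustration.

```python
# Illustrative sketch only: not PhantomText's implementation.
ZERO_WIDTH_SPACE = "\u200b"  # assumed filler; other zero-width code points exist


def insert_zero_width(text: str, target: str, filler: str = ZERO_WIDTH_SPACE) -> str:
    """Interleave `filler` between the characters of every occurrence of `target`."""
    if not target:
        return text
    disguised = filler.join(target)
    return text.replace(target, disguised)


content = "Sensitive info: email@example.com and phone 123-456-7890."
obfuscated = insert_zero_width(content, "email@example.com")

# An exact-match search no longer finds the address.
print("email@example.com" in obfuscated)  # False
# Removing the filler recovers the original string.
print(obfuscated.replace(ZERO_WIDTH_SPACE, "") == content)  # True
```

The obfuscated string usually renders the same as the original, which is why a plain keyword filter or text extractor can miss the target while a human reader still sees it.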
from phantomtext.injection.zerosize_injection import ZeroSizeInjection
from phantomtext.injection.transparent_injection import TransparentInjection
# Zero-size injection
injector = ZeroSizeInjection(modality="default", file_format="pdf")
injector.apply(input_document="document.pdf",
               injection="Hidden content",
               output_path="injected_document.pdf")
# Transparent injection
injector = TransparentInjection(modality="opacity-0", file_format="html")
injector.apply(input_document="document.html",
               injection="Invisible text",
               output_path="injected_document.html")
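As a rough picture of what a transparent injection into HTML can look like, the sketch below adds a fully transparent `<span>` before the closing `</body>` tag using only the standard library. It is an assumption about the general technique, not a description of what `TransparentInjection` does internally; `inject_transparent_span` is a hypothetical helper.

```python
# Illustrative sketch only: not PhantomText's implementation.
import html
import re


def inject_transparent_span(document: str, injection: str) -> str:
    """Insert `injection` as an opacity-0 span just before </body>, if present."""
    span = f'<span style="opacity:0">{html.escape(injection)}</span>'
    match = re.search(r"</body\s*>", document, flags=re.IGNORECASE)
    if match is None:
        # No closing body tag found: append at the end of the document.
        return document + span
    return document[:match.start()] + span + document[match.start():]


page = "<html><body><p>Visible text</p></body></html>"
injected = inject_transparent_span(page, "Invisible text")
print(injected)
```

A browser does not show the span to the reader, but a loader that extracts text without evaluating CSS will still return "Invisible text".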
Attack Family | Attack Name | Variant | HTML | DOCX | PDF |
---|---|---|---|---|---|
Obfuscation | diacritical_marks | default | ✅ | ✅ | ✅ |
Obfuscation | diacritical_marks | heavy | ✅ | ✅ | ✅ |
Obfuscation | homoglyph_characters | default | ✅ | ✅ | ✅ |
Obfuscation | zero_width_characters | default | ✅ | ✅ | ✅ |
Obfuscation | zero_width_characters | heavy | ✅ | ✅ | ✅ |
Obfuscation | bidi_reordering | default | ✅ | ✅ | ✅ |
Obfuscation | bidi_reordering | heavy | ✅ | ✅ | ✅ |

Attack Family | Attack Name | Variant | HTML | DOCX | PDF |
---|---|---|---|---|---|
Injection | zero_size | default | ✅ | ✅ | ✅ |
Injection | zero_size | close-to-zero | ✅ | ❌ | ✅ |
Injection | transparent | default | ✅ | ✅ | ✅ |
Injection | transparent | opacity-0 | ✅ | ❌ | ✅ |
Injection | transparent | opacity-close-to-zero | ✅ | ❌ | ✅ |
Injection | transparent | vanish | ❌ | ✅ | ❌ |
Injection | camouflage | default | 🚧 | 🚧 | 🚧 |
Injection | out_of_bound | default | 🚧 | 🚧 | 🚧 |
Injection | metadata | default | 🚧 | 🚧 | 🚧 |
Legend:
from phantomtext.file_scanning import FileScanner
scanner = FileScanner()
# Scan a single file
result = scanner.scan_file('document.docx')
print(f"Malicious content found: {result['malicious_content_found']}")
print(f"Vulnerabilities: {result['vulnerabilities']}")
# Scan an entire directory
reports = scanner.scan_dir('./output')
for report in reports:
    if report['malicious_content_found']:
        print(f"⚠️ Issues found in {report['file_path']}")
        for vulnerability in report['vulnerabilities']:
            print(f"  - {vulnerability}")
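For intuition about how this kind of detection can work on plain text, the following sketch flags Unicode format characters (category `Cf`, which includes zero-width and bidi control characters) and combining marks (category `Mn`) using the standard `unicodedata` module. It is a simplified stand-in, not the logic `FileScanner` uses; the function name and the tuple layout are assumptions.

```python
# Illustrative sketch only: not PhantomText's FileScanner.
import unicodedata


def find_suspicious_characters(text: str) -> list[tuple[int, str, str, str]]:
    """Return (position, code point, Unicode name, kind) for each flagged character."""
    findings = []
    for position, char in enumerate(text):
        category = unicodedata.category(char)
        if category == "Cf":
            kind = "invisible or bidi format character"
        elif category == "Mn":
            kind = "combining mark"
        else:
            continue
        name = unicodedata.name(char, "UNNAMED")
        findings.append((position, f"U+{ord(char):04X}", name, kind))
    return findings


# A zero-width space inside "payment" and a right-to-left override around "txet".
sample = "pay\u200bment due \u202etxet\u202c"
for finding in find_suspicious_characters(sample):
    print(finding)
```

Combining marks and some format characters also occur in legitimate text (decomposed accented letters, emoji sequences, several scripts), so a real scanner would need thresholds or context rather than flagging every hit.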
The FileScanner can detect the following obfuscation techniques:
from phantomtext.file_sanitization import FileSanitizer
sanitizer = FileSanitizer()
sanitizer.sanitize_file('malicious_file.txt')
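One simple form of sanitization, shown here for plain text only, is to drop every Unicode format character (category `Cf`). This is a sketch of the idea under that assumption, not a description of what `FileSanitizer.sanitize_file` does; `strip_format_characters` is a hypothetical helper.

```python
# Illustrative sketch only: not PhantomText's FileSanitizer.
import unicodedata


def strip_format_characters(text: str) -> str:
    """Remove zero-width, bidi control, and other Unicode format (Cf) characters."""
    return "".join(char for char in text if unicodedata.category(char) != "Cf")


dirty = "e\u200bm\u200ba\u200bi\u200bl@example.com"
clean = strip_format_characters(dirty)
print(clean == "email@example.com")  # True
```

This blunt filter also removes format characters that some scripts and emoji sequences rely on, and it does nothing about homoglyphs or content hidden through document styling.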
If you use PhantomText in your research, please cite our paper:
@article{castagnaro2025hidden,
title={The Hidden Threat in Plain Text: Attacking RAG Data Loaders},
author={Castagnaro, Alberto and Salviati, Umberto and Conti, Mauro and Pajola, Luca and Pizzi, Simeone},
journal={arXiv preprint arXiv:2507.05093},
year={2025}
}
Contributions are welcome! Please feel free to submit a pull request or open an issue for any enhancements or bug fixes.
This project is licensed under the MIT License. See the LICENSE file for more details.