
A tool for obfuscating text by manipulating token IDs while preserving token count and structure
Originally developed for benchmarking LLM inference performance and prefix caching behavior, it generates test data that maintains token patterns while obscuring the text.
This project provides a system for obfuscating text by applying a shift to token IDs. The obfuscation is reversible and preserves the token count, making it useful for:
Example of obfuscated text patterns:
Original: The quick brown fox jumps
Obfuscated: eng($_ ét rl manga
Original: The quick brown fox runs
Obfuscated: eng($_ ét rl Android
Note how the common prefix "The quick brown fox" is obfuscated to "eng($_ ét rl" in both cases, so the shared-prefix pattern is preserved and only the final token differs.
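The prefix-preserving behavior follows from applying the same shift to every token ID. The following is a minimal sketch of that idea, not the package's actual implementation: the token IDs are made up rather than produced by a real tokenizer, and `VOCAB_SIZE`, `shift_ids`, and `common_prefix_len` are hypothetical names introduced only for illustration.

```python
VOCAB_SIZE = 50_000  # assumed vocabulary size, for illustration only


def shift_ids(token_ids, shift, vocab_size=VOCAB_SIZE):
    """Apply the same shift to every token ID, wrapping at vocab_size."""
    return [(t + shift) % vocab_size for t in token_ids]


def common_prefix_len(a, b):
    """Count how many leading token IDs two sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical token IDs for two sentences that share their first four tokens.
jumps = [464, 2068, 7586, 21831, 18045]
runs = [464, 2068, 7586, 21831, 4539]

shifted_jumps = shift_ids(jumps, 42)
shifted_runs = shift_ids(runs, 42)

# Equal IDs map to equal IDs and distinct IDs stay distinct,
# so the shared prefix length and the token count are unchanged.
print(common_prefix_len(jumps, runs))                  # 4
print(common_prefix_len(shifted_jumps, shifted_runs))  # 4
print(len(shifted_jumps) == len(jumps))                # True
```

Because a fixed shift is a one-to-one mapping on token IDs, two inputs that share a prefix before obfuscation share a prefix of the same length afterwards, which is the property a prefix-cache benchmark relies on.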
pip install llm-obfuscator
git clone https://github.com/yourusername/llm-obfusicator.git
cd llm-obfusicator
pip install -r requirements.txt
pip install -e .
The package provides a command-line interface for easy use:
# Tokenize text
llm-obfuscator tokenize gpt-4 "Hello, world!"
# Obfuscate text
llm-obfuscator obfuscate gpt-4 "Hello, world!"
# Obfuscate text with a fixed shift
llm-obfuscator obfuscate gpt-4 "Hello, world!" --shift 42
from llm_obfuscator import obfuscate_text, tokenize_text
# Obfuscate text using a specific model's tokenizer
obfuscated = obfuscate_text("gpt-4", "Hello, world!")
print(obfuscated)
# Use a fixed shift value for deterministic results
obfuscated = obfuscate_text("gpt-4", "Hello, world!", shift=42)
print(obfuscated)
# Tokenize text
tokens = tokenize_text("gpt-4", "Hello, world!")
print(tokens)
The system supports both OpenAI and HuggingFace tokenizers:
OpenAI: gpt-4, gpt-3.5-turbo, cl100k_base, etc.
HuggingFace: gpt2, bert-base-uncased, etc.
The project includes several test suites to validate the obfuscation system:
The easiest way to run all tests is to use the provided shell script:
# Make the script executable (if needed)
chmod +x run_all_tests.sh
# Run all tests
./run_all_tests.sh
This will run all test files in sequence, including unit tests and specialized test scripts.
# Run all tests
python -m pytest tests/
# Run specific test file
python -m pytest tests/test.py
The project includes specialized test scripts for different aspects of the obfuscation system:
# Test with real-world examples
python tests/test_real_world.py
# Test obfuscation demonstration
python tests/test_obfuscation.py
# Test mathematical properties
python tests/test_mapping_properties.py
The obfuscation system has been validated to ensure:
The obfuscation process works as follows:
The shift operation is performed modulo the vocabulary size (typically 50,000) to ensure all tokens remain within the valid vocabulary range.
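As a rough illustration of the modular shift and why it is reversible, the sketch below shifts a list of token IDs and then undoes the shift. The function names `shift_token_ids` and `unshift_token_ids`, the sample IDs, and the vocabulary size of 50,000 are assumptions made for this example and are not taken from the package's API.

```python
def shift_token_ids(token_ids, shift, vocab_size):
    """Shift each token ID forward, wrapping around at vocab_size."""
    return [(t + shift) % vocab_size for t in token_ids]


def unshift_token_ids(token_ids, shift, vocab_size):
    """Undo shift_token_ids by shifting backward with the same modulus."""
    return [(t - shift) % vocab_size for t in token_ids]


vocab_size = 50_000  # assumed vocabulary size, for illustration only
original = [15, 49_990, 31_337]

obfuscated = shift_token_ids(original, 42, vocab_size)
restored = unshift_token_ids(obfuscated, 42, vocab_size)

print(obfuscated)            # [57, 32, 31379]
print(restored == original)  # True
```

The second ID shows the wraparound: 49,990 + 42 exceeds the vocabulary size, so it maps to 32 instead of leaving the valid range. Python's `%` operator returns a non-negative result for a positive modulus, which is why the backward shift recovers the original IDs without extra handling.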
Note: Token count preservation has a known margin of error of up to 8% for obfuscated texts. We are working on improving this.