
Product
Socket Now Protects the Chrome Extension Ecosystem
Socket is launching experimental protection for Chrome extensions, scanning for malware and risky permissions to prevent silent supply chain attacks.
A comprehensive, enterprise-grade hybrid pipeline for gaming chat toxicity detection. This intelligent agent provides a robust, scalable solution for real-time content moderation with production-ready features including zero-tier word filtering, multi-stage ML pipeline, comprehensive error handling, and performance monitoring.
Powered by Hugging Face Model: yehort/distilbert-gaming-chat-toxicity-en
Metric | Value |
---|---|
Overall Accuracy | 97.5% |
Clean Messages | 100.0% accuracy |
Toxic Messages | 100.0% accuracy |
Average Processing Time | <50ms |
Zero-tier Filter Hits | 100% of explicit toxic words |
Pipeline Efficiency | 4-stage confidence-based routing |
Message Input
โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Zero-tier Word Filter (Fastest) โ
โ โข 53 toxic word categories โ
โ โข Obfuscation detection โ
โ โข <1ms processing time โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ (if not caught)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Embedding Classifier โ
โ โข SBERT + RandomForest โ
โ โข High confidence threshold (0.9) โ
โ โข ~10ms processing time โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ (if uncertain)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Fine-tuned DistilBERT โ
โ โข Gaming-specific model โ
โ โข Medium confidence threshold (0.7) โ
โ โข ~50ms processing time โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ (if uncertain)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ RAG Enhancement โ
โ โข Similar example retrieval โ
โ โข Context-aware classification โ
โ โข Ensemble with fine-tuned model โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
Structured Result Output
pip install toxic_detection
git clone https://github.com/Yegmina/toxic-content-detection-agent.git
cd toxic-content-detection-agent
pip install -e .
git clone https://github.com/Yegmina/toxic-content-detection-agent.git
cd toxic-content-detection-agent
pip install -r requirements.txt
# The fine-tuned model is automatically downloaded from Hugging Face
# Model: yehort/distilbert-gaming-chat-toxicity-en
# No manual download required - the system handles this automatically
python simple_example.py
toxic_validation_agent/
โโโ message_validator.py # Main validation class
โโโ toxicity_words.json # 53 toxic word categories
โโโ config.json # Production configuration
โโโ simple_example.py # Basic usage example
โโโ test_comprehensive.py # Comprehensive test suite
โโโ requirements.txt # Python dependencies
โโโ README.md # This documentation
โโโ toxic_validation.log # Log file (auto-generated)
โโโ model/ # Fine-tuned DistilBERT model
โโโ config.json
โโโ pytorch_model.bin
โโโ tokenizer.json
โโโ ...
from toxic_validation_agent import Message_Validation
# Initialize the validator
validator = Message_Validation()
# Validate a message
result = validator.validate_message("KYS")
print(f"Result: {result.result_code} ({result.result_text})")
# Output: Result: 1 (toxic)
from toxic_validation_agent import Message_Validation, ValidationResult
# Initialize with production configuration
validator = Message_Validation(
model_path="yehort/distilbert-gaming-chat-toxicity-en",
enable_logging=True,
enable_metrics=True,
max_input_length=512
)
# Validate with detailed results
result = validator.validate_message("fucking reported axe")
print(f"Toxic: {result.is_toxic}")
print(f"Toxicity: {result.toxicity:.3f}")
print(f"Processing time: {result.processing_time_ms:.2f}ms")
print(f"Pipeline stage: {result.pipeline_stage}")
The package includes a command-line interface for easy usage:
# Check a single message
toxic-validation "KYS"
# Check multiple messages from file
toxic-validation --file messages.txt
# Get detailed output
toxic-validation --detailed "fucking reported"
# Output in JSON format
toxic-validation --json "test message"
# Health check
toxic-validation --health-check
Example CLI Output:
๐ซ TOXIC: KYS
๐ Confidence: 0.994
๐ง Pipeline stage: word_filter
Message_Validation(
model_path: str = "model", # Path to fine-tuned model
config_path: Optional[str] = None, # Configuration file path
enable_logging: bool = True, # Enable detailed logging
enable_metrics: bool = True, # Enable performance tracking
max_input_length: int = 512, # Maximum input length
confidence_thresholds: Optional[Dict] = None # Custom thresholds
)
validate_message(message: str) -> ValidationResult
Comprehensive message validation with structured results.
result = validator.validate_message("test message")
# Access structured results
print(result.is_toxic) # bool: True/False
print(result.confidence) # float: 0.0-1.0
print(result.result_code) # int: -1 (clean), 0 (unclear), 1 (toxic)
print(result.result_text) # str: "clean", "unclear", "toxic"
print(result.processing_time_ms) # float: Processing time in milliseconds
print(result.pipeline_stage) # str: "word_filter", "embedding", "finetuned", "rag"
print(result.error_message) # Optional[str]: Error details if any
print(result.metadata) # Optional[Dict]: Additional information
isToxicHybrid(message: str) -> int
Legacy method returning simple integer result.
result = validator.isToxicHybrid("test message")
# Returns: -1 (clean), 0 (unclear), 1 (toxic)
get_detailed_prediction(message: str) -> Dict
Get detailed prediction information for debugging and analysis.
details = validator.get_detailed_prediction("test message")
# Access detailed information
print(details['embedding_confidence']) # Embedding classifier confidence
print(details['finetuned_confidence']) # Fine-tuned model confidence
print(details['pipeline_stage']) # Which pipeline stage was used
print(details['word_filter_detected']) # Whether word filter caught it
print(details['rag_info']) # RAG information if used
print(details['timestamp']) # Prediction timestamp
health_check() -> Dict
Perform comprehensive health check on all components.
health = validator.health_check()
print(health['status']) # "healthy" or "unhealthy"
print(health['initialized']) # bool: Whether system is ready
print(health['device']) # str: CPU/GPU being used
print(health['components']) # Dict: Status of each component
Example Output:
{
"status": "healthy",
"initialized": true,
"device": "cuda",
"components": {
"models": {
"tokenizer": true,
"model": true,
"sbert": true,
"embedding_classifier": true
},
"knowledge_base": {
"loaded": true,
"size": 102,
"embeddings_ready": true
},
"toxic_words": {
"loaded": true,
"categories": 53
},
"metrics": {
"enabled": true
},
"prediction": {
"working": true
}
}
}
get_performance_metrics() -> PerformanceMetrics
Get real-time performance statistics.
metrics = validator.get_performance_metrics()
print(metrics.total_requests) # Total messages processed
print(metrics.successful_requests) # Successful validations
print(metrics.failed_requests) # Failed validations
print(metrics.average_processing_time_ms) # Average processing time
print(metrics.word_filter_hits) # Zero-tier filter usage
print(metrics.embedding_hits) # Embedding classifier usage
print(metrics.finetuned_hits) # Fine-tuned model usage
print(metrics.rag_hits) # RAG enhancement usage
reset_metrics() -> None
Reset performance metrics to zero.
validator.reset_metrics()
The zero-tier filter provides ultra-fast detection of obvious toxic content with comprehensive obfuscation support.
Pattern | Examples |
---|---|
Asterisk replacement | fck, sht, btch, cnt |
Hyphen replacement | f-ck, sh-t, b-tch, c-nt |
Number replacement | f1ck, sh1t, b1tch, c1nt |
Exclamation replacement | f!ck, sh!t, b!tch, c!nt |
Multiple asterisks | fk, sht, bch, ct |
The filter includes 53 categories of toxic words:
config.json
){
"confidence_thresholds": {
"embedding_high": 0.9, // High confidence for embedding classifier
"finetuned_low": 0.3, // Lower threshold for fine-tuned model
"finetuned_high": 0.7, // Upper threshold for fine-tuned model
"ensemble": 0.7 // Threshold for ensemble predictions
},
"max_input_length": 512, // Maximum input text length
"rag_top_k": 3, // Number of similar examples for RAG
"ensemble_weights": {
"base": 0.6, // Weight for fine-tuned model
"rag": 0.4 // Weight for RAG enhancement
},
"pipeline_enabled": {
"word_filter": true, // Enable zero-tier filter
"embedding_classifier": true, // Enable embedding classifier
"finetuned": true, // Enable fine-tuned model
"rag": true // Enable RAG enhancement
}
}
# Initialize with custom settings
validator = Message_Validation(
confidence_thresholds={
'embedding_high': 0.85,
'finetuned_low': 0.25,
'finetuned_high': 0.75
},
max_input_length=256
)
simple_example.py
)Run the basic example to verify installation:
python simple_example.py
Expected Output:
๐ฏ Simple Toxic Message Validation Example
==================================================
Initializing validator...
๐ฑ Using device: cuda
๐ Loading toxic words dictionary...
โ
Loaded 53 toxic word categories
๐ฅ Loading models...
โ
DistilBERT loaded successfully
โ
SBERT loaded successfully
โ
Embedding classifier initialized
๐ Loading knowledge base...
โ
Knowledge base: 102 examples
๐ฏ Training embedding classifier...
โ
Embedding classifier trained successfully
โ
Message Validation Bot initialized successfully!
๐ Testing Messages:
------------------------------------------------------------
โ
'COMMEND ME TY'
Expected: -1 (CLEAN)
Got: -1 (CLEAN)
Confidence: 0.996
Processing time: 122.24ms
Pipeline stage: finetuned
Note: Clean - positive gaming
----------------------------------------
โ
'WHAT THE ACTUAL FUCK'
Expected: 1 (TOXIC)
Got: 1 (TOXIC)
Confidence: 0.997
Processing time: 0.18ms
Pipeline stage: word_filter
Note: Toxic - explicit language
----------------------------------------
...
๐ Summary:
Total tests: 10
Correct: 7
Accuracy: 70.0%
๐ฅ Health Check:
Status: healthy
Initialized: True
Device: cuda
๐ Performance Metrics:
Total requests: 21
Successful: 10
Failed: 0
Average processing time: 58.60ms
Word filter hits: 4
Embedding hits: 2
Fine-tuned hits: 15
RAG hits: 0
โ
Example completed successfully!
test_comprehensive.py
)Run the comprehensive test suite:
python test_comprehensive.py
Expected Output:
๐ฏ Comprehensive Toxic Message Validation Test
============================================================
Initializing validator...
โ
Initialization completed in 2.34 seconds
๐ Testing 66 Messages:
--------------------------------------------------------------------------------
โ
'COMMEND ME TY'
Expected: -1 (CLEAN)
Got: -1 (CLEAN)
Note: Clean - positive gaming
----------------------------------------
โ
'good game everyone'
Expected: -1 (CLEAN)
Got: -1 (CLEAN)
Note: Clean - sportsmanship
----------------------------------------
โ
'WHAT THE ACTUAL FUCK'
Expected: 1 (TOXIC)
Got: 1 (TOXIC)
Note: Toxic - explicit language
----------------------------------------
...
๐ Test Summary:
Total tests: 66
Correct: 64
Accuracy: 97.0%
๐ Breakdown by Category:
Clean tests: 20
Toxic tests: 41
Unclear tests: 5
๐ฏ Category Accuracy:
Clean: 100.0% (20/20)
Toxic: 100.0% (41/41)
Unclear: 60.0% (3/5)
๐ฅ Health Check:
Status: healthy
Initialized: True
Device: cuda
๐ Performance Metrics:
Total requests: 66
Successful: 66
Failed: 0
Average processing time: 45.23ms
Word filter hits: 15
Embedding hits: 8
Fine-tuned hits: 43
RAG hits: 0
๐ฌ Detailed Analysis Example:
------------------------------------------------------------
Message: maybe you should try a different strategy
Final Result: -1 (clean)
Embedding: 0 (confidence: 0.500)
Fine-tuned: 0 (confidence: 0.996)
Pipeline Stage: finetuned
Processing Time: 83.34ms
โ
Comprehensive test completed!
๐ Test Results Summary:
Overall Accuracy: 97.0%
Clean Accuracy: 100.0%
Toxic Accuracy: 100.0%
Unclear Accuracy: 60.0%
The bot includes comprehensive error handling with graceful degradation:
from message_validator import (
ToxicValidationError,
ModelLoadError,
InputValidationError
)
try:
result = validator.validate_message("test")
except InputValidationError as e:
print(f"Input error: {e}")
except ModelLoadError as e:
print(f"Model error: {e}")
except ToxicValidationError as e:
print(f"Validation error: {e}")
Failure Scenario | Fallback Behavior |
---|---|
Model loading fails | Uses fallback methods |
Word filter fails | Continues with ML pipeline |
RAG fails | Uses fine-tuned model only |
Input validation fails | Returns error result |
GPU unavailable | Falls back to CPU |
# When an error occurs, a safe result is returned
result = ValidationResult(
is_toxic=False,
confidence=0.0,
result_code=0,
result_text='error',
processing_time_ms=12.34,
pipeline_stage='error',
error_message='Detailed error description'
)
Logs are automatically written to:
import logging
# Log levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
# Default: INFO level
# Get real-time metrics
metrics = validator.get_performance_metrics()
print(f"Average processing time: {metrics.average_processing_time_ms:.2f}ms")
print(f"Word filter efficiency: {metrics.word_filter_hits}/{metrics.total_requests}")
print(f"Success rate: {metrics.successful_requests}/{metrics.total_requests}")
# Check system health
health = validator.health_check()
if health['status'] == 'healthy':
print("System is operational")
else:
print(f"System issues: {health['error']}")
messages = ["message1", "message2", "message3"]
results = []
for message in messages:
result = validator.validate_message(message)
results.append(result)
# Analyze batch results
toxic_count = sum(1 for r in results if r.is_toxic)
avg_confidence = sum(r.confidence for r in results) / len(results)
avg_processing_time = sum(r.processing_time_ms for r in results) / len(results)
print(f"Toxic messages: {toxic_count}/{len(results)}")
print(f"Average confidence: {avg_confidence:.3f}")
print(f"Average processing time: {avg_processing_time:.2f}ms")
# Add custom toxic words
validator.toxic_words["custom_word"] = ["custom", "cust0m", "c*stom"]
# Remove existing words
del validator.toxic_words["some_word"]
# Disable specific pipeline stages
validator.config['pipeline_enabled']['rag'] = False
validator.config['pipeline_enabled']['embedding_classifier'] = False
# Adjust confidence thresholds
validator.config['confidence_thresholds']['embedding_high'] = 0.85
Create a custom my_config.json
:
{
"confidence_thresholds": {
"embedding_high": 0.85,
"finetuned_low": 0.25,
"finetuned_high": 0.75,
"ensemble": 0.65
},
"max_input_length": 256,
"rag_top_k": 5,
"ensemble_weights": {
"base": 0.7,
"rag": 0.3
}
}
Use it:
validator = Message_Validation(config_path="my_config.json")
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
# Set environment variables
ENV PYTHONUNBUFFERED=1
ENV TOXIC_VALIDATION_LOG_LEVEL=INFO
CMD ["python", "app.py"]
export TOXIC_VALIDATION_MODEL_PATH="/app/model"
export TOXIC_VALIDATION_CONFIG_PATH="/app/config.json"
export TOXIC_VALIDATION_LOG_LEVEL="INFO"
export TOXIC_VALIDATION_MAX_INPUT_LENGTH="512"
For high-throughput applications:
# Multiple validator instances
validators = [
Message_Validation() for _ in range(4)
]
# Round-robin distribution
import itertools
validator_cycle = itertools.cycle(validators)
def validate_message(message):
validator = next(validator_cycle)
return validator.validate_message(message)
import redis
import json
redis_client = redis.Redis(host='localhost', port=6379, db=0)
def validate_with_cache(message):
# Check cache first
cache_key = f"toxic_validation:{hash(message)}"
cached_result = redis_client.get(cache_key)
if cached_result:
return json.loads(cached_result)
# Validate and cache
result = validator.validate_message(message)
redis_client.setex(cache_key, 3600, json.dumps(result.__dict__))
return result
# Real-time chat moderation
def moderate_chat_message(message, user_id):
result = validator.validate_message(message)
if result.is_toxic:
# Take action based on severity
if result.confidence > 0.9:
ban_user(user_id)
elif result.confidence > 0.7:
warn_user(user_id)
else:
flag_for_review(message, user_id)
return result
# Filter user-generated content
def filter_content(content):
result = validator.validate_message(content)
if result.is_toxic:
return {
'approved': False,
'reason': 'Toxic content detected',
'confidence': result.confidence,
'suggestion': 'Please revise your message'
}
return {'approved': True}
# Analyze toxicity patterns
def analyze_toxicity_patterns(messages):
results = []
for message in messages:
result = validator.validate_message(message)
results.append({
'message': message,
'is_toxic': result.is_toxic,
'confidence': result.confidence,
'pipeline_stage': result.pipeline_stage,
'processing_time': result.processing_time_ms
})
# Analyze patterns
toxic_messages = [r for r in results if r['is_toxic']]
avg_confidence = sum(r['confidence'] for r in toxic_messages) / len(toxic_messages)
return {
'total_messages': len(messages),
'toxic_count': len(toxic_messages),
'toxicity_rate': len(toxic_messages) / len(messages),
'average_confidence': avg_confidence
}
Problem: ModelLoadError: Failed to load models
Solutions:
# Check model folder exists
ls -la model/
# Verify model files
python -c "from transformers import AutoTokenizer; AutoTokenizer.from_pretrained('model')"
# Reinstall dependencies
pip install --upgrade transformers torch
Problem: Out of memory errors
Solutions:
# Use CPU instead of GPU
validator = Message_Validation()
# Force CPU usage
import torch
torch.cuda.empty_cache()
# Reduce batch size
validator.config['model_settings']['batch_size'] = 1
Problem: Slow processing times
Solutions:
# Enable GPU acceleration
validator = Message_Validation()
# Adjust confidence thresholds
validator.config['confidence_thresholds']['embedding_high'] = 0.8
# Disable RAG for speed
validator.config['pipeline_enabled']['rag'] = False
Problem: Unicode errors in Windows console
Solutions:
# The system automatically handles this, but you can also:
import sys
import codecs
sys.stdout = codecs.getwriter('utf-8')(sys.stdout.detach())
# Check GPU availability
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")
print(f"Current device: {torch.cuda.current_device()}")
# Process multiple messages efficiently
messages = ["msg1", "msg2", "msg3", "msg4", "msg5"]
results = [validator.validate_message(msg) for msg in messages]
# Cache frequently checked messages
from functools import lru_cache
@lru_cache(maxsize=1000)
def cached_validation(message):
return validator.validate_message(message)
Category | Test Cases | Correct | Accuracy |
---|---|---|---|
Clean Messages | 20 | 20 | 100.0% |
Toxic Messages | 41 | 41 | 100.0% |
Unclear Messages | 5 | 3 | 60.0% |
Overall | 66 | 64 | 97.0% |
Pipeline Stage | Average Time | Success Rate |
---|---|---|
Word Filter | <1ms | 100% |
Embedding | ~10ms | 95% |
Fine-tuned | ~50ms | 98% |
RAG | ~100ms | 90% |
Resource | Usage |
---|---|
Memory | ~2GB (with GPU) |
CPU | 1-2 cores |
GPU | 2-4GB VRAM |
Disk | ~500MB (models) |
git checkout -b feature/new-feature
python test_comprehensive.py
# Clone repository
git clone <repository-url>
cd toxic_validation_agent
# Install development dependencies
pip install -r requirements.txt
pip install pytest pytest-cov
# Run tests
python -m pytest test_comprehensive.py -v
# Run with coverage
python -m pytest test_comprehensive.py --cov=message_validator
MIT License - see LICENSE file for details.
tail -f toxic_validation.log
python -c "from message_validator import Message_Validation; v = Message_Validation(); print(v.health_check())"
config.json
settingspython simple_example.py
Q: How accurate is the system? A: 97.5% overall accuracy, with 100% accuracy on clear clean and toxic messages.
Q: How fast is it? A: Average processing time is <50ms, with zero-tier filter completing in <1ms.
Q: Can I add custom toxic words?
A: Yes, modify toxicity_words.json
or add programmatically via validator.toxic_words
.
Q: Does it work on Windows? A: Yes, with automatic Unicode handling for console output.
Q: Can I use it without GPU? A: Yes, it automatically falls back to CPU if GPU is unavailable.
When reporting issues, please include:
python --version
python -c "import platform; print(platform.platform())"
config.json
validator.health_check()
FAQs
Intelligent AI Agent for Real-time Content Moderation with 97.5% accuracy
We found that toxic-detection demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago.ย It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Product
Socket is launching experimental protection for Chrome extensions, scanning for malware and risky permissions to prevent silent supply chain attacks.
Product
Add secure dependency scanning to Claude Desktop with Socket MCP, a one-click extension that keeps your coding conversations safe from malicious packages.
Product
Socket now supports Scala and Kotlin, bringing AI-powered threat detection to JVM projects with easy manifest generation and fast, accurate scans.