Email Typo Fixer

A Python library that automatically detects and fixes common typos in email addresses, using Levenshtein distance and the Public Suffix List.
Features
- Email Normalization: Lowercases, strips, and removes invalid characters
- Extension Validation: Validates and corrects TLDs using the official PublicSuffixList (parses the .dat file directly)
- Smart Typo Detection: Uses Levenshtein distance to detect and correct TLD and domain name typos
- Domain Correction: Fixes common domain typos (e.g., gamil.com → gmail.com)
- Configurable: Custom typo dictionary and distance thresholds
- Logging Support: Built-in logging for debugging and monitoring
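As a rough illustration of the normalization step listed above, the following sketch lowercases the address, trims whitespace, and drops characters outside a conservative allowed set. The function name `basic_normalize` and the allowed character sets are assumptions made for this example, not the library's actual rules.

```python
import re

# Hypothetical allowed character sets; the library's real rules may differ.
_LOCAL_INVALID = re.compile(r"[^a-z0-9._%+\-]")
_DOMAIN_INVALID = re.compile(r"[^a-z0-9.\-]")


def basic_normalize(email: str) -> str:
    """Lowercase, strip, and remove characters outside the allowed sets."""
    email = email.strip().lower()
    if email.count("@") != 1:
        raise ValueError(f"expected exactly one '@': {email!r}")
    local, domain = email.split("@")
    local = _LOCAL_INVALID.sub("", local)
    domain = _DOMAIN_INVALID.sub("", domain)
    if not local or not domain:
        raise ValueError(f"empty local part or domain: {email!r}")
    return f"{local}@{domain}"


print(basic_normalize("  USER@EXAMPLE.COM "))  # user@example.com
print(basic_normalize("us*er@exam!ple.com"))   # user@example.com
```

Typo correction would then run on the cleaned address, so stray characters do not inflate the edit distance between a mistyped domain and its intended spelling.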
Installation
pip install email-typo-fixer
Quick Start
from email_typo_fixer import normalize_email, EmailTypoFixer
corrected_email = normalize_email("user@gamil.com")
print(corrected_email)
fixer = EmailTypoFixer(max_distance=1)
corrected_email = fixer.normalize("user@yaho.com")
print(corrected_email)
Limitations
TLD '.co' False Positives
By default, the library may correct emails ending in .co (such as user@example.co) to .com if the Levenshtein distance is within the allowed threshold. Because .co is a valid TLD (the country-code TLD of Colombia), this can produce false positives for legitimate .co addresses.
How to control this behavior:
- The normalize method and the normalize_email function accept an optional parameter fix_tld_co: bool (default: True).
- To prevent .co domains from being auto-corrected to .com, call:
from email_typo_fixer import normalize_email
normalize_email("user@example.co", fix_tld_co=False)
Or, with the class:
fixer = EmailTypoFixer()
fixer.normalize("user@example.co", fix_tld_co=False)
This lets you avoid unwanted corrections for .co domains.
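To see why .co is affected, note that turning co into com takes a single insertion, so the two strings are at Levenshtein distance 1, which is within the default max_distance. The sketch below computes that distance in plain Python and applies a guard modeled on the documented fix_tld_co flag. The helper names `levenshtein` and `suggest_tld` are hypothetical, and the logic is an illustration under these assumptions, not the library's internal implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (or keep)
            ))
        prev = curr
    return prev[-1]


def suggest_tld(tld: str, fix_tld_co: bool = True, max_distance: int = 1) -> str:
    """Hypothetical guard: leave 'co' alone when fix_tld_co is False."""
    if tld == "co" and not fix_tld_co:
        return tld
    if levenshtein(tld, "com") <= max_distance:
        return "com"
    return tld


print(levenshtein("co", "com"))              # 1
print(suggest_tld("co"))                     # com
print(suggest_tld("co", fix_tld_co=False))   # co
```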
Usage Examples
Basic Email Correction
from email_typo_fixer import normalize_email
normalize_email("john.doe@gamil.com")
normalize_email("jane@yaho.com")
normalize_email("user@outlok.com")
normalize_email("test@hotmal.com")
normalize_email("user@example.co")
normalize_email("user@site.rog")
Robust Suffix Handling
This library parses the official public_suffix_list.dat file at runtime, so the recognized TLDs and public suffixes are as current as that file. No hardcoded suffixes are used.
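For readers unfamiliar with the file format, here is a minimal sketch of reading rules from public_suffix_list.dat-style text. In that format, comment lines start with `//`, blank lines are ignored, and each remaining line holds one rule, optionally prefixed with `*.` (wildcard) or `!` (exception). The function name `parse_public_suffix_rules` and the inline sample are assumptions for illustration; the library's own parser may work differently.

```python
def parse_public_suffix_rules(text: str) -> set[str]:
    """Collect the rules from public_suffix_list.dat-style text."""
    rules = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments, which start with "//".
        if not line or line.startswith("//"):
            continue
        # A rule ends at the first whitespace on the line.
        rules.add(line.split()[0].lower())
    return rules


# Small inline sample standing in for the real file.
sample = """\
// ===BEGIN ICANN DOMAINS===
com
co
co.uk

*.ck
!www.ck
"""

rules = parse_public_suffix_rules(sample)
print("co" in rules, "co.uk" in rules, "rog" in rules)
```

Note that both com and co appear as rules in the sample, which is why a suffix lookup alone cannot tell whether .co was intended or mistyped.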
Advanced Usage with Custom Configuration
from email_typo_fixer import EmailTypoFixer
import logging
logger = logging.getLogger("email_fixer")
logger.setLevel(logging.INFO)
custom_typos = {
    'companytypo': 'company',
    'orgtypo': 'org',
}
fixer = EmailTypoFixer(
    max_distance=2,
    typo_domains=custom_typos,
    logger=logger,
)
corrected = fixer.normalize("user@companytypo.com")
print(corrected)
Email Validation and Normalization
from email_typo_fixer import EmailTypoFixer
fixer = EmailTypoFixer()
try:
    # Surrounding whitespace and uppercase letters are normalized
    email = fixer.normalize(" USER@EXAMPLE.COM ")
    print(email)
    # Invalid characters are removed
    email = fixer.normalize("us*er@exam!ple.com")
    print(email)
except ValueError as e:
    print(f"Invalid email: {e}")
API Reference
normalize_email(email: str, fix_tld_co: bool = True) -> str
Simple function interface for email normalization.
Parameters:
- email (str): The email address to normalize
- fix_tld_co (bool): Whether a .co TLD may be corrected to .com (default: True)
Returns:
- str: The corrected and normalized email address
Raises:
- ValueError: If the email cannot be fixed or is invalid
EmailTypoFixer
Main class for email typo correction with customizable options.
__init__(max_distance=1, typo_domains=None, logger=None)
Parameters:
- max_distance (int): Maximum Levenshtein distance for extension corrections (default: 1)
- typo_domains (dict): Custom dictionary of domain typos to corrections
- logger (logging.Logger): Custom logger instance
normalize(email: str, fix_tld_co: bool = True) -> str
Normalize and fix typos in an email address.
Parameters:
- email (str): The email address to normalize
- fix_tld_co (bool): Whether a .co TLD may be corrected to .com (default: True)
Returns:
- str: The corrected and normalized email address
Raises:
- ValueError: If the email cannot be fixed or is invalid
Default Typo Corrections
The library includes built-in corrections for common email provider typos:
| Typo | Correction |
|------|------------|
| gamil | gmail |
| gmial | gmail |
| gnail | gmail |
| gmaill | gmail |
| yaho | yahoo |
| yahho | yahoo |
| outlok | outlook |
| outllok | outlook |
| outlokk | outlook |
| hotmal | hotmail |
| hotmial | hotmail |
| homtail | hotmail |
| hotmaill | hotmail |
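The corrections listed above amount to a lookup from a mistyped domain name to its intended spelling. The sketch below applies such a lookup to the first label of the domain. The names `DEFAULT_TYPOS` and `fix_domain_label` are hypothetical and chosen for this example; the library's internal data structure and matching logic may differ.

```python
# Typo -> correction pairs, copied from the list above.
DEFAULT_TYPOS = {
    "gamil": "gmail", "gmial": "gmail", "gnail": "gmail", "gmaill": "gmail",
    "yaho": "yahoo", "yahho": "yahoo",
    "outlok": "outlook", "outllok": "outlook", "outlokk": "outlook",
    "hotmal": "hotmail", "hotmial": "hotmail", "homtail": "hotmail",
    "hotmaill": "hotmail",
}


def fix_domain_label(email: str, typos: dict[str, str] = DEFAULT_TYPOS) -> str:
    """Replace the first domain label if it is a known typo."""
    local, _, domain = email.rpartition("@")
    label, dot, rest = domain.partition(".")
    return f"{local}@{typos.get(label, label)}{dot}{rest}"


print(fix_domain_label("user@gamil.com"))    # user@gmail.com
print(fix_domain_label("user@example.com"))  # user@example.com

# A custom dictionary, in the spirit of the typo_domains option
custom = {**DEFAULT_TYPOS, "companytypo": "company"}
print(fix_domain_label("user@companytypo.com", custom))  # user@company.com
```

An exact dictionary lookup like this never touches domains that are not listed, which makes it a safer first pass than distance-based matching.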
Error Handling
The library raises ValueError exceptions for emails that cannot be corrected:
from email_typo_fixer import normalize_email
try:
    # Missing "@" and domain
    normalize_email("invalid.email")
except ValueError as e:
    print(f"Cannot fix email: {e}")

try:
    # Missing domain
    normalize_email("user@")
except ValueError as e:
    print(f"Cannot fix email: {e}")
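When cleaning a whole list of addresses, it can help to collect failures instead of stopping at the first ValueError. The helper below is a sketch under that assumption: `normalize_all` is a hypothetical name that is not part of the library, and it accepts any normalizing callable so that the example runs without the package installed. The stand-in `_demo_normalize` only mimics the documented behavior of raising ValueError.

```python
from typing import Callable, Iterable


def normalize_all(
    emails: Iterable[str],
    normalize: Callable[[str], str],
) -> tuple[dict[str, str], dict[str, str]]:
    """Return (fixed, failed): corrected addresses and error messages by input."""
    fixed: dict[str, str] = {}
    failed: dict[str, str] = {}
    for raw in emails:
        try:
            fixed[raw] = normalize(raw)
        except ValueError as exc:
            failed[raw] = str(exc)
    return fixed, failed


def _demo_normalize(email: str) -> str:
    """Stand-in normalizer used only to make this example self-contained."""
    email = email.strip().lower()
    local, at, domain = email.partition("@")
    if not (local and at and domain):
        raise ValueError("missing local part, '@', or domain")
    return email


fixed, failed = normalize_all(["A@Example.com", "invalid.email", "user@"], _demo_normalize)
print(fixed)   # {'A@Example.com': 'a@example.com'}
print(failed)  # both malformed inputs, keyed by the original string
```

In real use, normalize_email (or a bound fixer.normalize) would be passed in place of the stand-in.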
Requirements
- Python 3.10+
- RapidFuzz >= 3.13.0
- publicsuffixlist >= 1.0.2
Development
Setting up for Development
git clone https://github.com/yourusername/email-typo-fixer.git
cd email-typo-fixer
curl -sSL https://install.python-poetry.org | python3 -
poetry install
poetry shell
Running Tests
poetry run pytest
poetry run pytest -v
poetry run pytest tests/test_email_typo_fixer.py
Code Quality
poetry run flake8 email_typo_fixer tests
poetry run mypy email_typo_fixer
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Uses the Levenshtein and RapidFuzz libraries for string distance calculations
- Uses publicsuffixlist for TLD (Top Level Domain) validation
- Inspired by various email validation libraries in the Python ecosystem