Fast and accurate spell correction using sound-based and edit-distance-based correction, available for English and Hindi.
Spello is a spell correction model built by combining two models, Phoneme and SymSpell. The Phoneme model uses the Soundex algorithm in the background and suggests correct spellings using phonetic concepts to identify similar-sounding words. The SymSpell model, on the other hand, uses the concept of edit distance to suggest correct spellings. Spello gets you the best of both, while also taking the context of the word into consideration.
Currently, this module is available for English (en) and Hindi (hi).
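To make the phonetic idea concrete, here is a minimal sketch of classic American Soundex in plain Python. It is not spello's code: the README only says Soundex is used "in the background", so spello may use a different variant or extra rules (Hindi in particular needs its own letter groups). The function name `soundex` is illustrative.

```python
# Classic American Soundex letter groups; every other letter maps to no digit.
_CODES = {ch: digit
          for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                 ("l", "4"), ("mn", "5"), ("r", "6"))
          for ch in letters}


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()          # the first letter is kept as a letter
    prev = _CODES.get(letters[0], "")  # ...but its digit still blocks an immediate repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue                   # h/w are skipped and do not separate equal digits
        digit = _CODES.get(ch, "")     # vowels (and y) have no digit and are dropped
        if digit and digit != prev:
            code += digit
        prev = digit                   # a vowel resets prev, so equal digits around it both count
        if len(code) == 4:
            break
    return code.ljust(4, "0")          # pad short codes with zeros


print(soundex("wnt"), soundex("want"))         # W530 W530
print(soundex("kricket"), soundex("cricket"))  # K623 C623
```

Misspellings such as "wnt"/"want" or "plai"/"play" collapse to the same code, while "kricket" and "cricket" do not, because classic Soundex keeps the first letter verbatim. Cases like that illustrate why an edit-distance model is a useful complement.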
$ pip install spello
You can either train a new model from scratch or use a pre-trained model. Alternatively, you can train a model for your own domain, use it on priority, and use a pre-trained model as a fallback.
Initialise the model for one of the supported languages:
>>> from spello.model import SpellCorrectionModel
>>> sp = SpellCorrectionModel(language='en')
You can train the model by providing data in one of the following formats.
Training by providing a list of sentences:
>>> sp.train(['I want to play cricket', 'this is a text corpus'])
Training by providing a word counter:
>>> sp.train({'i': 2, 'want': 1, 'play': 1, 'cricket': 10, 'mumbai': 5})
A list of sentences is the recommended type of training data, because the model then also tries to learn the context in which words appear. This helps it pick the best possible suggestion when the SymSpell or Phoneme model proposes more than one candidate.
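Why sentences help can be shown with a toy sketch of context-aware tie-breaking: count word pairs (bigrams) in the training sentences and, when several candidates are proposed, prefer the one seen most often after the previous word, falling back to overall word frequency. This is an assumption about the general idea, not spello's actual scoring, and `train_counts` / `pick_candidate` are hypothetical names. The unigram `Counter` has the same `{word: count}` shape as the word-counter input above.

```python
from collections import Counter


def train_counts(sentences):
    """Unigram and bigram counts from lower-cased, whitespace-tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent (previous, current) pairs
    return unigrams, bigrams


def pick_candidate(prev_word, candidates, unigrams, bigrams):
    """Prefer the candidate most often seen after prev_word; break ties by frequency."""
    return max(candidates, key=lambda w: (bigrams[(prev_word, w)], unigrams[w]))


sentences = ["i want to play cricket", "he gave a plain answer",
             "the plain truth", "plain water is fine"]
unigrams, bigrams = train_counts(sentences)
print(pick_candidate("to", ["plain", "play"], unigrams, bigrams))   # play (seen after "to")
print(pick_candidate("the", ["plain", "play"], unigrams, bigrams))  # plain
```

With a bare word counter, "plain" (3 occurrences) would always beat "play" (1 occurrence); the bigram counts let "to plai" resolve to "play" instead.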
>>> sp.spell_correct('i wnt to plai kricket')
{'original_text': 'i wnt to plai kricket',
'spell_corrected_text': 'i want to play cricket',
'correction_dict': {'wnt': 'want', 'plai': 'play', 'kricket': 'cricket'}
}
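The result of `spell_correct()` is a plain dict, so its corrections are easy to post-process. For example, a hypothetical helper (not part of spello) that marks each change inline, assuming whitespace-separated tokens as in the examples here:

```python
def highlight_corrections(result: dict) -> str:
    """Render spell_correct() output with each corrected token shown as [wrong->right]."""
    fixes = result["correction_dict"]
    return " ".join(
        f"[{token}->{fixes[token]}]" if token in fixes else token
        for token in result["original_text"].split()
    )


result = {'original_text': 'i wnt to plai kricket',
          'spell_corrected_text': 'i want to play cricket',
          'correction_dict': {'wnt': 'want', 'plai': 'play', 'kricket': 'cricket'}}
print(highlight_corrections(result))  # i [wnt->want] to [plai->play] [kricket->cricket]
```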
Call the save method to save the trained model in the given model directory:
>>> sp.save(model_save_dir='/home/ubuntu/')
'/home/ubuntu/model.pkl' # saved model path
To load a trained model from a saved path, first initialise the model and then call the load method:
>>> from spello.model import SpellCorrectionModel
>>> sp = SpellCorrectionModel(language='en')
>>> sp.load('/home/ubuntu/model.pkl')
You can also customize various configuration options of the model, for example:
>>> sp.config.min_length_for_spellcorrection = 4 # default is 3
>>> sp.config.max_length_for_spellcorrection = 12 # default is 15
>>> sp.config.symspell_allowed_distance_map = {2:0, 3: 1, 4: 2, 5: 3, 6: 3, 7: 4, 8: 4, 9:5, 10:5, 11:5, 12:5, 13: 6, 14: 6, 15: 6, 16: 6, 17: 6, 18: 6, 19: 6, 20: 6}
# the above dict means the max edit distance allowed for a word of length 6 is 3, for length 7 it is 4, and so on
To reset to the default config:
>>> sp.set_default_config()
There are many more configurations you can set; check this file for more details.
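The SymSpell side rests on the "symmetric delete" idea: precompute every string reachable from each vocabulary word by deleting characters, do the same for the input word, and only compute real edit distances for words whose delete-variants overlap. The sketch below is a toy reimplementation, not spello's code. `ALLOWED_DISTANCE` copies the map from the configuration example above, `MIN_LENGTH`/`MAX_LENGTH` mirror the documented defaults (3 and 15, assumed inclusive here), and plain Levenshtein distance is used for verification, whereas SymSpell itself typically uses a Damerau-style distance.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative settings mirroring the config example above (assumed semantics).
ALLOWED_DISTANCE = {2: 0, 3: 1, 4: 2, 5: 3, 6: 3, 7: 4, 8: 4, 9: 5, 10: 5, 11: 5,
                    12: 5, 13: 6, 14: 6, 15: 6, 16: 6, 17: 6, 18: 6, 19: 6, 20: 6}
MIN_LENGTH, MAX_LENGTH = 3, 15


def deletes(word: str, max_distance: int) -> set:
    """Every string reachable from word by deleting up to max_distance characters."""
    variants = {word}
    for n in range(1, min(max_distance, len(word)) + 1):
        for positions in combinations(range(len(word)), n):
            variants.add("".join(c for i, c in enumerate(word) if i not in positions))
    return variants


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute), one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_index(counts: dict) -> dict:
    """Map each delete-variant to the vocabulary words that produce it."""
    max_d = max(ALLOWED_DISTANCE.values())
    index = defaultdict(set)
    for word in counts:
        for variant in deletes(word, max_d):
            index[variant].add(word)
    return index


def lookup(word: str, counts: dict, index: dict) -> list:
    """Candidates within the allowed distance: closest first, then most frequent."""
    if not MIN_LENGTH <= len(word) <= MAX_LENGTH:
        return []  # outside the configured length bounds: leave the word alone
    allowed = ALLOWED_DISTANCE.get(len(word), 0)
    found = set()
    for variant in deletes(word, allowed):
        found |= index.get(variant, set())
    scored = sorted((levenshtein(word, w), -counts[w], w) for w in found)
    return [w for dist, _, w in scored if dist <= allowed]


counts = {'i': 2, 'want': 1, 'play': 1, 'cricket': 10, 'mumbai': 5}
index = build_index(counts)
print(lookup("wnt", counts, index))      # ['want']
print(lookup("kricket", counts, index))  # ['cricket']
```

Overlapping delete-variants only nominate candidates; the final Levenshtein check against the length-dependent limit is what filters out false matches such as "mumbai" for "plai".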
We have trained a basic model on 30K news sentences and 30K Wikipedia sentences.
Follow the steps below to get started with these models.
Download a pre-trained model from the links below:
language | model | size | md5 hash |
---|---|---|---|
en | en.pkl.zip | 84M | ec55760a7e25846bafe90b0c9ce9b09f |
en | en_large.pkl.zip | 284M | 9a4f5069b2395c9d5a1e8b9929e0c0a9 |
hi | hi.pkl.zip | 75M | ad8681161932fdbb8b1368bb16b9644a |
hi | hi_large.pkl.zip | 341M | 0cc73068f88a73612e7dd84434ad61e6 |
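The md5 hash column lets you check that a download is complete and intact before unzipping it. A minimal sketch using Python's standard `hashlib`; `EXPECTED_MD5` simply copies the table, and the helper names are illustrative.

```python
import hashlib
from pathlib import Path

# Hashes copied from the table above.
EXPECTED_MD5 = {
    "en.pkl.zip": "ec55760a7e25846bafe90b0c9ce9b09f",
    "en_large.pkl.zip": "9a4f5069b2395c9d5a1e8b9929e0c0a9",
    "hi.pkl.zip": "ad8681161932fdbb8b1368bb16b9644a",
    "hi_large.pkl.zip": "0cc73068f88a73612e7dd84434ad61e6",
}


def md5_of(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so large archives are never fully loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Raise ValueError if the file's MD5 differs from the published hash for its name."""
    name = Path(path).name
    actual = md5_of(path)
    if actual != EXPECTED_MD5[name]:
        raise ValueError(f"{name}: got md5 {actual}, expected {EXPECTED_MD5[name]}")
    return actual
```

Usage: `verify_download("en.pkl.zip")` after the download finishes; a `ValueError` usually means a truncated or corrupted file. This assumes the downloaded file keeps its original name from the table.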
Unzip the downloaded file.
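If you prefer to unzip from Python, the standard `zipfile` module is enough. This sketch assumes the archive contains the `.pkl` file (for example `en.pkl`, matching the load path used below); `unzip_model` is a hypothetical helper. Since `extractall` writes whatever paths the archive lists, run it only on a download whose hash you have checked.

```python
import zipfile
from pathlib import Path


def unzip_model(zip_path, dest_dir="."):
    """Extract the archive into dest_dir and return paths of the .pkl files it held."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        members = archive.namelist()
    return [Path(dest_dir) / name for name in members if name.endswith(".pkl")]
```

The returned path can then be passed to `sp.load(...)`.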
Initialise and load the model by specifying the path of the unzipped file:
>>> from spello.model import SpellCorrectionModel
>>> sp = SpellCorrectionModel(language='en')
>>> sp.load('/path/to/file/en.pkl')
>>> sp.spell_correct('i wnt to plei futbal')
{'original_text': 'i wnt to plei futbal',
'spell_corrected_text': 'i want to play football',
'correction_dict': {'wnt': 'want', 'plei': 'play', 'futbal': 'football'}
}
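The domain-priority setup mentioned at the start (your own model first, a pre-trained model as fallback) is not shown in code here. One hypothetical way to combine two models, using only the documented `spell_correct()` result keys, is to run both on the same text and let the domain model's `correction_dict` entries override the fallback's. The helper `correct_with_fallback`, the `Corrector` protocol, and the whitespace tokenisation are assumptions, not part of spello.

```python
from typing import Protocol


class Corrector(Protocol):
    """Anything with a spell_correct(text) -> dict method, such as a SpellCorrectionModel."""
    def spell_correct(self, text: str) -> dict: ...


def correct_with_fallback(domain: Corrector, fallback: Corrector, text: str) -> dict:
    """Hypothetical merge: fallback corrections first, domain corrections override them."""
    merged = dict(fallback.spell_correct(text)["correction_dict"])
    merged.update(domain.spell_correct(text)["correction_dict"])  # domain takes priority
    corrected = " ".join(merged.get(token, token) for token in text.split())
    return {"original_text": text,
            "spell_corrected_text": corrected,
            "correction_dict": merged}
```

With spello, `domain` would be a model you trained and `fallback` one loaded from `/path/to/file/en.pkl`. Note a limitation of this simple merge: a domain word the domain model leaves untouched can still be changed by the fallback model.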
To train a model for other languages, you can download data from here and follow the training process described above.
This software uses the following open source packages:
This project follows the all-contributors specification. Contributions of any kind are welcome!
Please read the contribution guidelines first.
One limitation of the current model is that it does not suggest corrections for grammatical mistakes or for words that are in the model's vocabulary. For example, in the sentence "I want to by Apple", it will not suggest any correction for "by", because "by" is a valid English word, even though the correct replacement is "buy".
In a future release, we will add features to suggest corrections for the improper use of words in a sentence.
If you use spello in a scientific publication, we would appreciate a reference to the following BibTeX entry:
@misc{haptik2020spello,
title={spello},
author={Srivastava, Aman and Reddy, SL Ruthvik},
howpublished={\url{https://github.com/hellohaptik/spello}},
year={2020}
}