Cython wrapper on Hunspell Dictionary
This fork is based on https://github.com/MSeal/cython_hunspell and modified to reduce dependencies by removing caching and batch functionalities. Apart from Hunspell itself, there are no other third-party dependencies.
Additionally, it provides precompiled wheels for multiple platforms.
This repository provides a wrapper on Hunspell to be used natively in Python. The module uses cython to link between the C++ and Python code, with some additional features. There's very little Python overhead as all the heavy lifting is done on the C++ side of the module interface, which gives optimal performance.
For the simplest install, run:
pip install chunspell
This installs the Hunspell 1.7.2 C++ bindings for your platform. As new versions of Hunspell become available, this library will provide new versions to match.
Spell checking & spell suggestions
Below are some simple examples of how to use the library.
from hunspell import Hunspell
h = Hunspell()
You now have a usable hunspell object that can make basic queries for you.
h.spell('test') # True
It's a simple task to ask if a particular word is in the dictionary.
h.spell('correct') # True
h.spell('incorect') # False
This will only ever return True or False, and won't give suggestions about why it might be wrong. It also depends on your choice of dictionary.
If you want suggestions from Hunspell, it can propose corrections for a given string input.
h.suggest('incorect') # ('incorrect', 'correction', 'corrector', 'correct', 'injector')
The suggestions are sorted: the lower the index, the closer the suggestion is to the input string.
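The spell and suggest calls above can be combined into a small batch helper. The following is a minimal sketch, not part of the library: suggest_corrections and its limit parameter are hypothetical names, and the helper only assumes that the checker object exposes spell(word) and suggest(word) as shown above.

```python
from typing import Dict, Iterable, Tuple


def suggest_corrections(checker, words: Iterable[str], limit: int = 3) -> Dict[str, Tuple[str, ...]]:
    """Map each misspelled word to its top `limit` suggestions.

    `checker` is any object with spell(word) -> bool and
    suggest(word) -> sequence of str (hypothetical helper, not a library API).
    """
    corrections = {}
    for word in words:
        if checker.spell(word):
            continue  # the word is in the dictionary, nothing to suggest
        # Suggestions arrive closest-first, so slicing keeps the best ones.
        corrections[word] = tuple(checker.suggest(word)[:limit])
    return corrections
```

Called as suggest_corrections(h, ['correct', 'incorect']), this would return a dictionary keyed only by the misspelled words; the exact suggestions depend on the dictionary in use.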
h.suffix_suggest('do') # ('doing', 'doth', 'doer', 'doings', 'doers', 'doest')
The module can also stem words, providing the stems for pluralization and other inflections.
h.stem('testers') # ('tester', 'test')
h.stem('saves') # ('save',)
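Stemming is handy for grouping inflected forms together. The sketch below is an illustration under assumptions: group_by_stem is a hypothetical helper, and it relies only on stem(word) returning a (possibly empty) sequence of stems, as in the examples above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_stem(checker, words: Iterable[str]) -> Dict[str, List[str]]:
    """Group words under every stem reported by checker.stem(word).

    Hypothetical helper: `checker` is any object with a stem(word) method.
    """
    groups = defaultdict(list)
    for word in words:
        stems = checker.stem(word)
        # If no stem is reported, file the word under itself.
        keys = stems if stems else (word,)
        for stem in keys:
            groups[stem].append(word)
    return dict(groups)
```

A word with several stems, such as 'testers' above, appears under each of them.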
Like stemming, but returns a morphological analysis of the input instead.
h.analyze('permanently') # (' st:permanent fl:Y',)
You can also specify the language or dictionary you wish to use.
h = Hunspell('en_US') # US English
By default, only the en_US dictionaries are available.
However, you can download your own and point Hunspell to your custom dictionaries.
h = Hunspell('en_GB-large', hunspell_data_dir='/custom/dicts/dir')
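Before pointing Hunspell at a custom directory, it can help to check which dictionaries are actually usable there. A Hunspell dictionary consists of a .dic word list and a matching .aff affix file, so the sketch below lists the names that have both. list_dictionaries is a hypothetical helper using only the standard library, and it assumes the two files share a base name.

```python
from pathlib import Path
from typing import List


def list_dictionaries(data_dir: str) -> List[str]:
    """Return dictionary names that have both a .dic and an .aff file.

    Hypothetical helper; assumes files are named <name>.dic and <name>.aff.
    """
    root = Path(data_dir)
    dic_names = {path.stem for path in root.glob("*.dic")}
    aff_names = {path.stem for path in root.glob("*.aff")}
    # Only names present in both sets form a complete dictionary.
    return sorted(dic_names & aff_names)
```

Any name returned for '/custom/dicts/dir' should be a candidate for the first argument of the Hunspell constructor.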
You can also add new dictionaries at runtime by calling the add_dic method.
h.add_dic(os.path.join(PATH_TO, 'special.dic'))
You can add individual words to a dictionary at runtime.
h.add('sillly')
Furthermore, you can attach an affix to the word when doing this by providing a second argument.
h.add('silllies', "is:plural")
Much like adding, you can remove words.
h.remove(word)
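Since words can be added and removed at runtime, the two calls pair naturally in a context manager that cleans up after itself. This is a minimal sketch rather than a library feature: temporary_words is a hypothetical name, and it assumes only the add, remove, and spell methods shown above.

```python
from contextlib import contextmanager
from typing import Iterable


@contextmanager
def temporary_words(checker, words: Iterable[str]):
    """Add words for the duration of a with-block, then remove them again.

    Hypothetical helper; `checker` needs spell(), add(), and remove().
    """
    added = []
    try:
        for word in words:
            # Skip words already accepted so cleanup never removes them.
            if not checker.spell(word):
                checker.add(word)
                added.append(word)
        yield checker
    finally:
        for word in added:
            checker.remove(word)
```

Inside a block such as with temporary_words(h, ['sillly']):, the added word should be accepted by h.spell, and it is removed again when the block exits, even if an exception is raised.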
On Windows, set system_encoding='UTF-8' in the Hunspell constructor or set the environment variable HUNSPELL_PATH_ENCODING=UTF-8. Then you must re-encode your hunspell_data_dir in UTF-8 by passing that argument name to the Hunspell constructor or setting the HUNSPELL_DATA environment variable. This is a restriction of Hunspell / Windows operations.
Author(s): Tim Rodriguez, Matthew Seal, cdhigh
MIT