
Welcome to Leven-Search, a library designed for efficient and fast searching of words within a specified Levenshtein distance.
It is designed with Kaggle developers and researchers in mind, as well as anyone working in natural language processing, text analysis, and similar domains where the closeness of strings matters.
Levenshtein distance measures the difference between two sequences. In the context of strings, it is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into another.
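To make the definition concrete, here is a minimal sketch of the textbook dynamic-programming computation of Levenshtein distance. It is only an illustration of the metric, not the search algorithm Leven-Search uses internally, and the function name `levenshtein` is a hypothetical one chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))  # "" -> b[:j] takes j insertions
    for i, ca in enumerate(a, start=1):
        curr = [i]  # a[:i] -> "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute, free when equal
            ))
        prev = curr
    return prev[-1]


print(levenshtein("table", "marble"))  # 2
```

This runs in time proportional to the product of the two string lengths, which is fine for a single pair but too slow for scanning a whole dictionary per query.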
For example, the Levenshtein distance between "table" and "marble" is 2:
table → mable (substitution of t with m)
mable → marble (insertion of r)
The library is designed with the following goals in mind:
Example performance of the library on the Brown corpus (only words longer than 2 characters), measured on a modern laptop:
Distance | Time per 1000 searches (in seconds) |
---|---|
0 | 0.0146 |
1 | 0.3933 |
1 (*) | 0.4154 |
2 | 7.9556 |
(*) with the per-letter cost granularity
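Numbers like these depend heavily on hardware and on the query set, so it can be useful to measure them locally. The following is a rough sketch of a timing helper. The name `seconds_per_1000` and its parameters are hypothetical, and the stand-in search function below is only a placeholder so the example runs without the library installed.

```python
import time


def seconds_per_1000(search, queries, repeats=1):
    """Average wall-clock seconds per 1000 calls of search(query)."""
    queries = list(queries)
    if not queries or repeats < 1:
        raise ValueError("need at least one query and one repeat")
    start = time.perf_counter()
    for _ in range(repeats):
        for q in queries:
            search(q)
    elapsed = time.perf_counter() - start
    # Normalize total elapsed time to a batch of 1000 calls.
    return elapsed / (len(queries) * repeats) * 1000


# Placeholder search: exact lookup in a small set (stands in for a real searcher).
words = {"hello", "world"}
print(seconds_per_1000(lambda q: q in words, ["hello", "mello"], repeats=100))
```

With the library installed, the same helper could be called with something like `lambda q: searcher.find_dist(q, 1)` as the search function.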
To install the library, simply run:
pip install leven-search
First, import the library:
import leven_search as lev
Then, create a LevenSearch object:
searcher = lev.LevenSearch()
Next, add words to the searcher:
searcher.insert("hello")
searcher.insert("world")
Finally, search for words within a specified Levenshtein distance:
searcher.find_dist("mello", 1)
Result:
hello: ResultItem(word='hello', dist=1, updates=[m -> h])
The following example shows how to use the library to search for words in the Brown corpus:
import nltk
import leven_search as lev
# Download the Brown corpus
nltk.download('brown')
# Create a LevenSearch object
searcher = lev.LevenSearch()
for w in nltk.corpus.brown.words():
    if len(w) > 2:
        searcher.insert(w)
# Search for words within a Levenshtein distance
searcher.find_dist('komputer', 1)
Result:
computer: ResultItem(word='computer', dist=1, updates=[k -> c])
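Edit costs can also be set per letter pair, for example making the substitution k -> c cheaper than the default cost:
cost = lev.GranularEditCostConfig(default_cost=2, edit_costs=[lev.EditCost('k', 'c', 0.1)])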
searcher.find_dist('komputer', 2, cost)
Result:
computer: ResultItem(word='computer', dist=0.1, updates=[k -> c])
searcher.find_dist('yomputer', 2, cost)
Result:
computer: ResultItem(word='computer', dist=2, updates=[y -> c])
searcher.find_dist('yomputer', 1, cost)
Result:
None
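The results above can be understood through a weighted variant of edit distance. The sketch below is a standalone illustration, not the library's implementation: `weighted_distance` and its parameters are hypothetical names, and it assumes that insertions, deletions, and unlisted substitutions are all charged the default cost, which the README does not state explicitly.

```python
def weighted_distance(query, word, default_cost=1.0, sub_costs=None):
    """Edit distance where selected substitutions have custom costs.

    sub_costs maps (query_char, word_char) to a cost; every other edit
    is charged default_cost (an assumption made for this sketch).
    """
    sub_costs = sub_costs or {}
    prev = [j * default_cost for j in range(len(word) + 1)]
    for i, cq in enumerate(query, start=1):
        curr = [i * default_cost]
        for j, cw in enumerate(word, start=1):
            if cq == cw:
                sub = 0.0  # matching characters cost nothing
            else:
                sub = sub_costs.get((cq, cw), default_cost)
            curr.append(min(
                prev[j] + default_cost,      # delete cq
                curr[j - 1] + default_cost,  # insert cw
                prev[j - 1] + sub,           # substitute cq -> cw
            ))
        prev = curr
    return prev[-1]


costs = {("k", "c"): 0.1}
print(weighted_distance("komputer", "computer", default_cost=2, sub_costs=costs))
print(weighted_distance("yomputer", "computer", default_cost=2, sub_costs=costs))
```

Under these assumptions the two calls should reproduce the distances shown above (0.1 for the cheap k -> c substitution, 2 for the default-cost y -> c substitution), which also explains why a search limit of 1 finds no match for 'yomputer'.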