
A package for extracting keywords from large text very quickly (much faster than regex and the original flashtext package).
pip install flashtext2
flashtext2 is an optimized version of the flashtext library for fast keyword extraction and replacement.
It is orders of magnitude faster than regular expressions.
Instead of splitting on a simple pattern such as [A-Za-z0-9_]+, flashtext2 uses Unicode Standard Annex #29 to split strings into tokens.
This ensures compatibility with all languages, not just Latin-based ones.
Extracting keywords
from flashtext2 import KeywordProcessor
kp = KeywordProcessor(case_sensitive=False)
kp.add_keyword('Python')
kp.add_keyword('flashtext')
kp.add_keyword('program')
text = "I love programming in Python and using the flashtext library."
keywords_found = kp.extract_keywords(text)
print(keywords_found)
# Output: ['Python', 'flashtext']
keywords_found = kp.extract_keywords_with_span(text)
print(keywords_found)
# Output: [('Python', 22, 28), ('flashtext', 43, 52)]
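Note that 'program' is not reported above even though the text contains 'programming': keywords are matched against whole tokens, not substrings. The stdlib-only sketch below illustrates why the tokenization rule matters for non-Latin text. It compares the ASCII pattern `[A-Za-z0-9_]+` with Python's Unicode-aware `\w+`, which is used here only as a rough stand-in. It is not UAX #29 segmentation and is not a description of flashtext2's internals.

```python
import re

sample = "Maße στο Python_3"

# ASCII-only word pattern: non-ASCII letters split or drop tokens.
ascii_tokens = re.findall(r"[A-Za-z0-9_]+", sample)

# Unicode-aware \w+ (a stand-in for illustration, NOT UAX #29).
unicode_tokens = re.findall(r"\w+", sample)

print(ascii_tokens)    # ['Ma', 'e', 'Python_3']
print(unicode_tokens)  # ['Maße', 'στο', 'Python_3']
```

With the ASCII pattern, a keyword such as 'Maße' could never line up with a token from the text, which is the kind of mismatch Unicode-aware tokenization avoids.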
Replacing keywords
from flashtext2 import KeywordProcessor
kp = KeywordProcessor(case_sensitive=False)
# The first argument is the keyword to find; the second is its replacement.
kp.add_keyword('Java', 'Python')
kp.add_keyword('regex', 'flashtext')
text = "I love programming in Java and using the regex library."
new_text = kp.replace_keywords(text)
print(new_text)
# Output: "I love programming in Python and using the flashtext library."
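To show how replacement relates to span extraction, the sketch below rebuilds a string from `(value, start, end)` tuples shaped like the `extract_keywords_with_span` output shown earlier. `replace_spans` is a hypothetical helper written for this example, and the span list is typed in by hand. This is an illustration only, not how `replace_keywords` is implemented.

```python
def replace_spans(text, spans):
    """Rebuild `text`, substituting each (replacement, start, end) span."""
    pieces = []
    cursor = 0
    for replacement, start, end in sorted(spans, key=lambda s: s[1]):
        pieces.append(text[cursor:start])  # untouched text before the match
        pieces.append(replacement)
        cursor = end
    pieces.append(text[cursor:])  # tail after the last match
    return "".join(pieces)


text = "I love programming in Java and using the regex library."
# Hand-written spans: 'Java' at 22..26 and 'regex' at 41..46.
spans = [("Python", 22, 26), ("flashtext", 41, 46)]
new_text = replace_spans(text, spans)
print(new_text)
# I love programming in Python and using the flashtext library.
```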
Case sensitivity
from flashtext2 import KeywordProcessor
text = 'abc aBc ABC'
kp = KeywordProcessor(case_sensitive=True)
kp.add_keyword('aBc')
print(kp.extract_keywords(text))
# Output: ['aBc']
kp = KeywordProcessor(case_sensitive=False)
kp.add_keyword('aBc')
print(kp.extract_keywords(text))
# Output: ['aBc', 'aBc', 'aBc']
Overlapping keywords (returns the longest sequence)
from flashtext2 import KeywordProcessor
kp = KeywordProcessor(case_sensitive=True)
kp.add_keyword('machine')
kp.add_keyword('machine learning')
text = "machine learning is a subset of artificial intelligence"
print(kp.extract_keywords(text))
# Output: ['machine learning']
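The longest-match behaviour can be sketched without the library. The following is a minimal illustration that assumes plain whitespace tokenization and a greedy scan. `extract_longest` is a hypothetical function written for this example and does not reflect flashtext2's actual data structures.

```python
def extract_longest(text, keywords):
    """Greedy longest-match over whitespace tokens (illustrative only)."""
    phrases = {tuple(k.split()) for k in keywords}
    max_len = max(len(p) for p in phrases)
    tokens = text.split()
    found, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in phrases:
                found.append(" ".join(candidate))
                i += n  # skip past the matched tokens
                break
        else:
            i += 1  # no keyword starts at this token
    return found


text = "machine learning is a subset of artificial intelligence"
print(extract_longest(text, ["machine", "machine learning"]))
# ['machine learning']
```

Because the longer phrase is tried first, 'machine' is only reported when it is not followed by 'learning'.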
Case folding
from flashtext2 import KeywordProcessor
kp = KeywordProcessor(case_sensitive=False)
kp.add_keywords_from_iter(["flour", "Maße", "ᾲ στο διάολο"])
text = "flour, MASSE, ὰι στο διάολο"
print(kp.extract_keywords(text))
# Output: ['flour', 'Maße', 'ᾲ στο διάολο']
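The effect of case folding can be seen with Python's built-in `str.casefold()`. This is a small sketch of the general technique; whether flashtext2 calls this exact method internally is an assumption.

```python
keyword = "Maße"
text_word = "MASSE"

# lower() keeps 'ß', so the two strings do not compare equal.
print(keyword.lower() == text_word.lower())        # False

# casefold() maps 'ß' to 'ss', so they do.
print(keyword.casefold() == text_word.casefold())  # True
print(keyword.casefold())                          # masse
```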
Compared with the original flashtext, extracting keywords is usually 2.5-3x faster, and replacing them is about 10x faster.
There is still room to optimize the code and improve performance.
You can find the benchmarks here.
In the benchmarks, words average 6 characters and each sentence contains 10k words, so each text is roughly 60k characters long.
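The size estimate above can be checked with a short sketch. It builds a synthetic text of 10,000 random six-letter words; the generator, the seed, and the fixed word length are assumptions made for illustration and are not taken from the actual benchmark code.

```python
import random
import string

random.seed(0)  # arbitrary seed, for reproducibility only
WORDS, WORD_LEN = 10_000, 6

words = ["".join(random.choices(string.ascii_lowercase, k=WORD_LEN))
         for _ in range(WORDS)]
text = " ".join(words)

letters = sum(len(w) for w in words)
print(letters)    # 60000 characters in words
print(len(text))  # 69999 once the separating spaces are counted
```

The 60k figure therefore counts word characters only; separators add roughly one more character per word.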
Credit to Vikash Singh, the author of the original flashtext package.