This package contains a function for efficiently representing a set of keywords as a regular expression. The resulting regex can be used to replace keywords in sentences or to extract keywords from them.
Why use tregex?
- Pure Python, no other dependencies
- trex is fast: about 300 times faster than a regex union and about 2.5 times faster than FlashText
- Plays well with others: it integrates easily with pandas
Install trex
Use pip:
pip install tregex
Usage
import tregex as tx
pattern = tx.compile(['baby', 'bat', 'bad'])
hits = pattern.findall('The baby was scared by the bad bat.')
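To illustrate the idea of turning a keyword trie into a regex, here is a minimal sketch that uses only the standard `re` module. It is not this package's implementation: the helper names `build_trie` and `trie_to_regex` are hypothetical, and the use of `\b` word boundaries is an assumption made for the example. It shows both use cases mentioned above, extracting and replacing keywords.

```python
import re


def build_trie(words):
    """Build a nested-dict trie; the key "" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for char in word:
            node = node.setdefault(char, {})
        node[""] = {}
    return root


def trie_to_regex(node):
    """Recursively turn a trie node into a regex fragment (hypothetical helper)."""
    end_of_word = "" in node
    branches = [
        re.escape(char) + trie_to_regex(child)
        for char, child in sorted(node.items())
        if char != ""
    ]
    if not branches:
        return ""
    if len(branches) == 1 and not end_of_word:
        # A single continuation needs no grouping.
        return branches[0]
    group = "(?:" + "|".join(branches) + ")"
    # If a word may also end here, the continuation is optional.
    return group + "?" if end_of_word else group


keywords = ["baby", "bat", "bad"]
fragment = trie_to_regex(build_trie(keywords))  # 'ba(?:by|d|t)'

# Assumption: keywords should match whole words only.
pattern = re.compile(r"\b" + fragment + r"\b")

sentence = "The baby was scared by the bad bat."
hits = pattern.findall(sentence)       # ['baby', 'bad', 'bat']
redacted = pattern.sub("***", sentence)  # 'The *** was scared by the *** ***.'
```

The shared prefix `ba` is matched once and only the differing suffixes are alternated, which is what lets a trie-shaped pattern avoid retrying every keyword from scratch, as a plain union such as `baby|bat|bad` would.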
Why the name?
Naming is difficult, but we had to call it something:
- trex: trie to regex
- trex: Tyrannosaurus rex, a large dinosaur species with small arms (rex meaning "king" in Latin)
Acknowledgments
This project is based on the following resources: