Homoglyphs
Homoglyphs is a Python library for finding homoglyphs and converting them to ASCII.
Features
It's a smarter version of confusable_homoglyphs:
- Automatic detection or manual selection of categories (aliases from ISO 15924).
- Automatic or manual loading of only the needed alphabets into memory.
- Converting to ASCII.
- More configurable.
- More stable.
Installation
sudo pip install homoglyphs
Usage
The best way to explain something is to show how it works, so let's look at some real usage.
Importing:
import homoglyphs as hg
Languages
hg.Languages.detect('w')
hg.Languages.detect('т')
hg.Languages.detect('.')
hg.Languages.get_alphabet(['ru'])
hg.Languages.get_all()
Categories
Categories are script names (aliases from ISO 15924).
hg.Categories.detect('w')
hg.Categories.detect('т')
hg.Categories.detect('.')
hg.Categories.get_alphabet(['CYRILLIC'])
hg.Categories.get_all()
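To give a feel for what category detection involves, here is a minimal sketch that uses only the standard library. It is not the library's implementation, which relies on its own category data. The helper name guess_category and the small known_scripts set are assumptions made for illustration.

```python
import unicodedata


def guess_category(char):
    """Guess a script name from the first word of the Unicode character name."""
    name = unicodedata.name(char, "")
    first_word = name.split(" ")[0] if name else ""
    # Punctuation and digits have names such as "FULL STOP" or "DIGIT ONE".
    # This sketch (an assumption, not library behaviour) lumps everything
    # that is not a known script into COMMON.
    known_scripts = {"LATIN", "CYRILLIC", "GREEK"}
    return first_word if first_word in known_scripts else "COMMON"


print(guess_category("w"))       # LATIN
print(guess_category("\u0442"))  # CYRILLIC (Cyrillic small letter te)
print(guess_category("."))       # COMMON
```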
Homoglyphs
Get homoglyphs:
hg.Homoglyphs().get_combinations('q')
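Conceptually, generating combinations means replacing each character with every glyph that looks like it and taking the Cartesian product. The sketch below shows that idea with a tiny hypothetical table. The names LOOKALIKES and combinations are made up for this example, and the library's real data and internals may differ.

```python
from itertools import product

# Hypothetical lookup table: each character maps to the glyphs that resemble it.
# A real table would be far larger; these two entries are illustrative only.
LOOKALIKES = {
    "a": ["a", "\u0430"],  # Latin a, Cyrillic a (U+0430)
    "c": ["c", "\u0441"],  # Latin c, Cyrillic es (U+0441)
}


def combinations(text):
    """Return every spelling of text obtained by swapping in look-alike glyphs."""
    # Characters missing from the table only have themselves as an option.
    options = [LOOKALIKES.get(ch, [ch]) for ch in text]
    return ["".join(variant) for variant in product(*options)]


variants = combinations("ca")
print(len(variants))  # 4: two options for "c" times two options for "a"
```

The number of results grows multiplicatively with the length of the input, which is one reason to load only the alphabets you need.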
Alphabet loading:
homoglyphs = hg.Homoglyphs(categories=('LATIN', 'COMMON', 'CYRILLIC'))
homoglyphs.get_combinations('гы')
homoglyphs = hg.Homoglyphs(languages={'ru', 'en'})
homoglyphs.get_combinations('гы')
homoglyphs = hg.Homoglyphs(alphabet='abc абс')
homoglyphs.get_combinations('с')
homoglyphs = hg.Homoglyphs(languages={'en'}, strategy=hg.STRATEGY_LOAD)
homoglyphs.get_combinations('гы')
You can combine categories, languages, alphabet, and strategies in any way you want. The strategies specify how to handle any characters not already loaded:
- STRATEGY_LOAD: load category for this character
- STRATEGY_IGNORE: add character to result
- STRATEGY_REMOVE: remove character from result
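The three strategies can be pictured with a small standalone sketch. Everything in it is hypothetical: LOAD, IGNORE, REMOVE, CATEGORY_CHARS, and handle_unknown are stand-ins invented for this example, not the library's constants or internals.

```python
# Hypothetical stand-ins for the three strategies described above.
LOAD, IGNORE, REMOVE = "load", "ignore", "remove"

# Hypothetical category data, keyed by category name.
CATEGORY_CHARS = {
    "LATIN": set("abc"),
    "CYRILLIC": set("\u0430\u0431\u0432"),  # Cyrillic a, be, ve
}


def handle_unknown(char, loaded, strategy):
    """Return the characters kept in the result for one input character."""
    if char in loaded:
        return [char]
    if strategy == LOAD:
        # Find the category containing the character and load all of it.
        for chars in CATEGORY_CHARS.values():
            if char in chars:
                loaded.update(chars)
                return [char]
        return []  # no category knows this character
    if strategy == IGNORE:
        return [char]  # keep the character as-is, without loading anything
    if strategy == REMOVE:
        return []  # drop the character from the result
    raise ValueError(f"unknown strategy: {strategy!r}")


loaded = set("abc")
print(handle_unknown("\u0431", loaded, LOAD))  # keeps the character
print("\u0430" in loaded)                      # True: the whole category was loaded
```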
Converting glyphs to ASCII chars
homoglyphs = hg.Homoglyphs(languages={'en'}, strategy=hg.STRATEGY_LOAD)
homoglyphs.to_ascii('ТЕСТ')
homoglyphs.to_ascii('ХР123.')
homoglyphs.to_ascii('лол')
homoglyphs = hg.Homoglyphs(
    languages={'en'},
    strategy=hg.STRATEGY_LOAD,
    ascii_strategy=hg.STRATEGY_REMOVE,
)
homoglyphs.to_ascii('лол')
homoglyphs = hg.Homoglyphs(
    languages={'en'},
    strategy=hg.STRATEGY_LOAD,
    ascii_strategy=hg.STRATEGY_REMOVE,
    # range() excludes its end point, so add 1 to include 'z'
    ascii_range=range(ord('a'), ord('z') + 1),
)
homoglyphs.to_ascii('ХР123.')
homoglyphs.to_ascii('хр123.')
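One way to think about ascii_range is as a filter over candidate strings: only those made entirely of allowed code points survive. The sketch below illustrates that reading with a hypothetical helper, keep_in_range. Whether the library filters in exactly this way is an assumption. It also shows a detail of Python itself: range() excludes its end point, so a range meant to cover every lowercase letter needs ord('z') + 1.

```python
def keep_in_range(candidates, ascii_range=range(128)):
    """Keep only the candidate strings made entirely of allowed code points."""
    return [s for s in candidates if all(ord(ch) in ascii_range for ch in s)]


lowercase = range(ord("a"), ord("z") + 1)  # +1 because range() excludes its end

# Lowercase Latin, uppercase Latin, and lowercase Cyrillic look-alikes.
print(keep_in_range(["xp", "XP", "\u0445\u0440"], lowercase))  # ['xp']

print(ord("z") in range(ord("a"), ord("z")))  # False: 'z' is left out
print(ord("z") in lowercase)                  # True
```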