You're Invited:Meet the Socket Team at BlackHat and DEF CON in Las Vegas, Aug 7-8.RSVP
Socket
Socket
Sign inDemoInstall

pyfranc

Package Overview
Dependencies
Maintainers
1
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

pyfranc

Text language detection basic on trigrams.


Maintainers
1

Readme

Pyfranc

Text language detection basic on trigrams. Support 414 language from franc-all

Install

This package is tested in Python 3.8, but should work on the whole 3rd revision of Python.

pip:

pip install pyfranc

Use

How library

from pyfranc import franc

franc.lang_detect('Alle menslike wesens word vry')[0][0] # 'afr'
franc.lang_detect('এটি একটি ভাষা একক IBM স্ক্রিপ্ট')[0][0]  # 'ben'
franc.lang_detect('Alle menneske er fødde til fridom')[0][0] # 'nno'
franc.lang_detect('')[0][0] # 'und'

# You can change what’s too short (default: 10):
franc.lang_detect('the')[0][0] # 'und'
franc.lang_detect('the', minlength=3)[0][0] # 'sco'

[0][0] has taken first value (iso code lang) in first element in output array.
whitelist
franc.lang_detect('Considerando ser essencial que os direitos humanos', whitelist = ['por', 'spa'])
# [['por', 1], ['spa', 0.6034146900423971]]
blacklist
franc.lang_detect('Considerando ser essencial que os direitos humanos', blacklist = ['src', 'glg'])
#[['por', 1],
# ['ina', 0.6211756617394293], 
# ['spa', 0.6034146900423971], 
# ['ast', 0.5628509224246592], 
# ['oci', 0.5583820327718574],
# ... 317 more items]

How CLI

usage: pyfranc_cli [-h] [-v] [-s STRING] [-t TOP] [-m MINLENGTH] [-w [WHITELIST [WHITELIST ...]]]
                   [-b [BLACKLIST [BLACKLIST ...]]] [-a] [-f] [-p]

CLI to detect the language of text.

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         Print version number.
  -s STRING, --string STRING
                        Input string.
  -t TOP, --top TOP     Print top results.
  -m MINLENGTH, --minlength MINLENGTH
                        Minimum string length to accept.
  -w [WHITELIST [WHITELIST ...]], --whitelist [WHITELIST [WHITELIST ...]], 
  -o [WHITELIST [WHITELIST ...]], --only [WHITELIST [WHITELIST ...]]
                        Allow languages.
  -b [BLACKLIST [BLACKLIST ...]], --blacklist [BLACKLIST [BLACKLIST ...]], 
  -i [BLACKLIST [BLACKLIST ...]], --ignore [BLACKLIST [BLACKLIST ...]]
                        Disallow languages.
  -a, --all             Output all raw results.
  -f, --full            Print full name of language (with lang code).
  -p, --percentage      Print relative match value (in percent).

usage:

# output language
$ pyfranc_cli -t 1 -s "Alle menslike wesens word vry"
# 'afr' : 1.0

# output language from stdin (expects utf8)
$ echo "এটি একটি ভাষা একক IBM স্ক্রিপ্ট" | pyfranc_cli -t 1 -s $0
# 'ben' : 1.0

# ignore certain languages
$ pyfranc_cli --blacklist por glg -s "O Brasil caiu 26 posições"
# 'vec' : 1.0

# output language from stdin with only
$ echo "Alle mennesker er født frie og" | pyfranc_cli -t 1 --whitelist nob dan -s $0
# 'nob' : 1.0'

# output all results in raw-list format
$ pyfranc_cli --all -s "Considerando ser essencial que os direitos humanos"
# [['por', 1.0], ['glg', 0.771284519307895], ... 320 more items]

# display the result language name
$ pyfranc_cli --full -t 1 -s "Alle menslike wesens word vry"
# Afrikaans (afr) : 1.0

# output result with relative percentage of value
$ pyfranc_cli -t 5 --percentage -s "Considerando ser essencial que os direitos humanos"
# por : 28%
# glg : 22%
# ina : 17%
# spa : 17%
# ast : 16%

Derivation

Pyfranc is a outright port from Franc (JavaScript, MIT), trigram-utils (JavaScript, MIT), collapse-white-space (JavaScript, MIT), and n-gram (JavaScript, MIT). All this by Titus Wormer.

License

MIT © cyb3rk0tik

FAQs


Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc