Detect the language of text.
What’s so cool about franc?
- franc supports more languages(†) than any other library, or Google;
- franc is easily forked to support 339 languages;
- franc is just as fast as the competition.
† - If humans write in the language, on the web, and the language has
more than one million speakers, franc detects it.
Installation
npm:
npm install franc
franc is also available pre-built for AMD, CommonJS, and globals,
in three versions supporting 75, 176, and 339 languages.
Usage
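var franc = require('franc');
// Detect the most probable language; the result is a language code.
franc('Alle menslike wesens word vry');
franc('এটি একটি ভাষা একক IBM স্ক্রিপ্ট');
franc('Alle mennesker er født frie og');
// Empty input cannot be classified.
franc('');
// List all candidate languages instead of only the best match.
franc.all('O Brasil caiu 26 posições');
// Very short input is rejected unless `minLength` is lowered.
franc('the');
franc('the', {'minLength': 3});
// Only consider the given languages.
franc.all('O Brasil caiu 26 posições', {
    'whitelist': ['por', 'src', 'glg', 'spa']
});
// Exclude the given languages.
franc.all('O Brasil caiu 26 posições', {
    'blacklist': ['src', 'glg', 'lav']
});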
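The whitelist and blacklist options narrow the set of candidate languages. As a rough illustration of that idea only (this is not franc's internal code), the sketch below filters an array of [code, score] pairs. The `filterCandidates` helper and the scores are hypothetical.

```javascript
// Hypothetical helper: not part of franc's API.
// Keeps candidates allowed by the whitelist and not named in the blacklist.
function filterCandidates(candidates, options) {
    var whitelist = (options && options.whitelist) || [];
    var blacklist = (options && options.blacklist) || [];

    return candidates.filter(function (candidate) {
        var code = candidate[0];

        // An empty whitelist means "allow everything".
        if (whitelist.length > 0 && whitelist.indexOf(code) === -1) {
            return false;
        }

        return blacklist.indexOf(code) === -1;
    });
}

// Extracts just the language codes from [code, score] pairs.
function codes(candidates) {
    return candidates.map(function (candidate) {
        return candidate[0];
    });
}

// Made-up scores, for illustration only.
var candidates = [
    ['por', 1], ['src', 0.9], ['glg', 0.8], ['spa', 0.7], ['lav', 0.1]
];

var allowed = filterCandidates(candidates, {
    'whitelist': ['por', 'src', 'glg', 'spa']
});
var remaining = filterCandidates(candidates, {
    'blacklist': ['src', 'glg', 'lav']
});

console.log(codes(allowed).join(', '));   // por, src, glg, spa
console.log(codes(remaining).join(', ')); // por, spa
```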
CLI
Install:
npm install --global franc
Use:
Usage: franc [options] <string>
Detect the language of text
Options:
-h, --help                 output usage information
-v, --version              output version number
-m, --min-length <number>  minimum length to accept
-w, --whitelist <string>   allow languages
-b, --blacklist <string>   disallow languages
Usage:
# output language
$ franc "Alle menslike wesens word vry"
# afr
# output language from stdin (expects utf8)
$ echo "এটি একটি ভাষা একক IBM স্ক্রিপ্ট" | franc
# ben
# blacklist certain languages
$ franc --blacklist por,glg "O Brasil caiu 26 posições"
# src
# output language from stdin with whitelist
$ echo "Alle mennesker er født frie og" | franc --whitelist nob,dan
# nob
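On the command line, --whitelist and --blacklist take a comma-separated string, whereas the API options take arrays. A minimal sketch of that conversion, assuming a hypothetical `parseList` helper (franc's own CLI may handle this differently):

```javascript
// Hypothetical helper: turns "por,glg" into ['por', 'glg'].
function parseList(value) {
    if (!value) {
        return [];
    }

    return value.split(',').map(function (code) {
        // Tolerate spaces around the commas.
        return code.trim();
    }).filter(function (code) {
        // Drop empty entries such as the one after a trailing comma.
        return code.length > 0;
    });
}

console.log(parseList('por,glg').join(' '));   // por glg
console.log(parseList('nob, dan,').join(' ')); // nob dan
```

The resulting array could then be passed as the 'whitelist' or 'blacklist' option shown in the Usage section.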
Supported languages
franc supports 176 “languages”, by default. For a complete list,
check out supported-languages.md.
Supporting more or fewer languages
Supporting more or fewer languages is easy: fork the project and run
the following:
npm install
export THRESHOLD=100000
npm run build
The above would create a version of franc with support for any
language with 100,000 or more speakers. To support all languages, even
dead ones like Latin, set THRESHOLD to -1.
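To make the effect of THRESHOLD concrete, here is a small sketch of threshold-based selection. It is an assumption about the general idea, not the project's build script: `selectLanguages`, the placeholder codes, and the speaker counts are all invented for illustration.

```javascript
// Hypothetical selection step: keep languages with at least
// `threshold` speakers. Not taken from franc's build script.
function selectLanguages(languages, threshold) {
    return languages.filter(function (language) {
        return language.speakers >= threshold;
    }).map(function (language) {
        return language.code;
    });
}

// Placeholder codes and invented speaker counts.
var languages = [
    {'code': 'aaa', 'speakers': 50000000},
    {'code': 'bbb', 'speakers': 250000},
    {'code': 'ccc', 'speakers': 40000},
    {'code': 'ddd', 'speakers': 0} // stand-in for a dead language
];

console.log(selectLanguages(languages, 1000000).join(', ')); // aaa
console.log(selectLanguages(languages, 100000).join(', '));  // aaa, bbb
console.log(selectLanguages(languages, -1).join(', '));      // aaa, bbb, ccc, ddd
```

Under this reading, -1 works as "include everything" because even a language with zero speakers passes the comparison.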
Browser
I’ve compiled three versions of franc for use in the browser.
They’re UMD compliant: they work with AMD, CommonJS, and <script>s.
- franc.js: franc with support for languages with 8 million or
  more speakers (75 languages);
- franc-most.js: franc with support for languages with 1 million or
  more speakers (176 languages, the same as the npm version);
- franc-all.js: franc with support for all languages (339
  languages, careful, huge!).
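For readers unfamiliar with UMD, the following sketch shows the general pattern by which one file can serve AMD, CommonJS, and plain <script> users. It is a generic illustration, not the actual build output: `exposeModule`, the `env` objects, and the stub factory are hypothetical, and the environment is passed in explicitly so each case can be simulated.

```javascript
// Generic UMD-style registration, written as a plain function.
// `exposeModule` and `env` are hypothetical, for illustration only.
function exposeModule(name, factory, env) {
    if (typeof env.define === 'function' && env.define.amd) {
        // AMD loaders provide a `define` function.
        env.define([], factory);
        return 'amd';
    }

    if (env.module && typeof env.module.exports === 'object') {
        // CommonJS environments provide `module.exports`.
        env.module.exports = factory();
        return 'commonjs';
    }

    // Otherwise attach to the global object, as a <script> would.
    env.root[name] = factory();
    return 'global';
}

// Stand-in for the real library: always answers 'und'.
function factory() {
    return function () {
        return 'und';
    };
}

var commonjsEnv = {'module': {'exports': {}}};
var browserEnv = {'root': {}};

console.log(exposeModule('franc', factory, commonjsEnv)); // commonjs
console.log(exposeModule('franc', factory, browserEnv));  // global
console.log(browserEnv.root.franc(''));                   // und
```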
Derivation
Franc is a derivative work from guess-language (Python, LGPL),
guesslanguage (C++, LGPL), and Language::Guess
(Perl, GPL). Their creators granted me the rights to distribute franc
under the MIT license: respectively, Kent S. Johnson,
Jacob R. Rideout, and Maciej Ceglowski.
License
MIT © Titus Wormer