@smodin/fast-text-language-detection

Language detection with facebook fast-text model

Version 0.2.3 (latest) on npm
Fast-Text Language Detection

While searching for a way to predict the language of a text without a large machine learning model, fastText, created by Facebook, appeared to be the best option (https://towardsdatascience.com/benchmarking-language-detection-for-nlp-8250ea8b67c).

Improving Accuracy

Most incorrect predictions are caused by non-text characters (e.g. punctuation), which should be filtered out to get better results. Please submit an issue for any incorrect prediction so we can work on improving the accuracy.
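One way to apply this on the caller's side is to strip punctuation, digits, and symbols before calling predict. The sketch below is only an illustration: `stripNonText` is a hypothetical helper, not part of this package, and the choice of which characters to drop is an assumption to tune for your own data.

```javascript
// Hypothetical pre-processing helper (not part of the package API).
// Keeps Unicode letters, combining marks, and whitespace; everything else
// (punctuation, digits, symbols) is replaced by a space.
function stripNonText(text) {
  return text
    .replace(/[^\p{L}\p{M}\s]/gu, ' ') // drop non-letter characters
    .replace(/\s+/g, ' ') // collapse runs of whitespace
    .trim()
}

console.log(stripNonText('Hello, world!!! 123 :-)')) // prints "Hello world"
```

The cleaned string can then be passed to `lid.predict` in place of the raw input.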

Installation

npm i --save @smodin/fast-text-language-detection

Note: This also installs Facebook's fastText model, which is about 150 MB.

Usage

Prediction

Testing

;(async () => {
  const LanguageDetection = require('@smodin/fast-text-language-detection')
  const lid = new LanguageDetection()

  console.log(await lid.predict('FastText-LID provides a great language identification'))
  console.log(await lid.predict('FastText-LID bietet eine hervorragende Sprachidentifikation'))
  console.log(await lid.predict('FastText-LID fornisce un ottimo linguaggio di identificazione'))
  console.log(await lid.predict('FastText-LID fournit une excellente identification de la langue'))
  console.log(await lid.predict('FastText-LID proporciona una gran identificación de idioma'))
  console.log(await lid.predict('FastText-LID обеспечивает отличную идентификацию языка'))
  console.log(await lid.predict('FastText-LID提供了很好的語言識別'))
})()

Output

[ { lang: 'en', prob: 0.6313226222991943, isReliableLanguage: true } ]
[ { lang: 'de', prob: 0.9137917160987854, isReliableLanguage: true } ]
[ { lang: 'it', prob: 0.974501371383667, isReliableLanguage: true } ]
[ { lang: 'fr', prob: 0.7358829379081726, isReliableLanguage: true } ]
[ { lang: 'es', prob: 0.9211937189102173, isReliableLanguage: true } ]
[ { lang: 'ru', prob: 0.9899846911430359, isReliableLanguage: true } ]
[ { lang: 'zh', prob: 0.8515647649765015, isReliableLanguage: true } ]

isReliableLanguage is true if the language had 10 or more test results in benchmark testing and its accuracy was 95% or higher.
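Read literally, that rule is a threshold check on the counts and accuracies listed under Accuracy from Benchmark Testing. The sketch below restates it. `isReliable` is a hypothetical name, and this README does not say which benchmark run (long or short input) the package uses for the flag, so the sample rows, copied from the long-input table, are only illustrative.

```javascript
// Hypothetical restatement of the rule: at least 10 test results
// and an accuracy of 95% or more.
function isReliable(count, accuracy) {
  return count >= 10 && accuracy >= 0.95
}

// Sample rows copied from the long-input benchmark table.
const rows = [
  { lang: 'ba', count: 120, accuracy: 0.95 }, // meets both thresholds exactly
  { lang: 'si', count: 9, accuracy: 1 }, // perfect accuracy, too few samples
  { lang: 'da', count: 15299, accuracy: 0.948035819334597 }, // just under 95%
]

for (const row of rows) {
  console.log(row.lang, isReliable(row.count, row.accuracy))
}
```

Under these assumptions only the first row counts as reliable, which shows that a perfect accuracy on very few sentences is not enough.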

Other Helpers

const LanguageDetection = require('@smodin/fast-text-language-detection')
const lid = new LanguageDetection()
const languageIsoCodes = lid.languageIsoCodes // ['af', 'als', 'am', 'an', 'ar', ...]
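A possible use of this list is to check whether a language code is covered before relying on a prediction. The sketch assumes `languageIsoCodes` is a plain array of code strings, as the comment above suggests. `isSupported` is a hypothetical helper, and a short sample array stands in for the real list so the snippet runs without the package installed.

```javascript
// Stand-in for lid.languageIsoCodes (assumed to be an array of strings).
const sampleIsoCodes = ['af', 'als', 'am', 'an', 'ar']

// Hypothetical helper: case-insensitive membership test against the code list.
function isSupported(isoCodes, code) {
  return isoCodes.includes(String(code).toLowerCase())
}

console.log(isSupported(sampleIsoCodes, 'AR')) // true
console.log(isSupported(sampleIsoCodes, 'xx')) // false
```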

Similar Libraries

FastText has also been implemented and used in other programming languages.

Reference Documents

Accuracy from Benchmark Testing

Long Input (30 to 250 characters)

Translated sentence data was obtained from tatoeba.org. Testing the roughly 550k sentences of 30 to 250 characters took less than 30 seconds on a personal MacBook Pro.

| Language (101) | Symbol (alternates) | Count (558260) | Accuracy (30 - 250 chars) | Mislabels |
| --- | --- | --- | --- | --- |
| English | en | 22428 | 1 | |
| Greek | el | 12039 | 1 | |
| Hebrew | he | 8616 | 1 | |
| Japanese | ja | 2169 | 1 | |
| Georgian | ka | 1973 | 1 | |
| Bengali | bn | 1164 | 1 | |
| Thai | th | 572 | 1 | |
| Mandarin Chinese | zh | 568 | 1 | |
| Malayalam | ml | 517 | 1 | |
| Korean | ko | 482 | 1 | |
| Burmese | my | 216 | 1 | |
| Tamil | ta | 205 | 1 | |
| Kannada | kn | 118 | 1 | |
| Telugu | te | 102 | 1 | |
| Punjabi (Eastern) | pa | 88 | 1 | |
| Lao | lo | 70 | 1 | |
| Gujarati | gu | 57 | 1 | |
| Tibetan | bo | 20 | 1 | |
| Divehi, Dhivehi, Maldivian | dv | 15 | 1 | |
| Sinhala | si | 9 | 1 | |
| Amharic | am | 3 | 1 | |
| German | de | 22014 | 0.9998637230853094 | en |
| Polish | pl | 17768 | 0.999718595227375 | en,eo,de,ro |
| Russian | ru | 17329 | 0.9997114663281205 | bg,kk,uk,mk |
| Hungarian | hu | 17942 | 0.9996655891204994 | tr,br,it,de,en |
| Hindi | hi | 5362 | 0.999627004848937 | mr |
| Vietnamese | vi | 13000 | 0.9996153846153846 | eo,hu,fr |
| Turkish | tr | 19919 | 0.9995983734123199 | eo,en,it,fr,nds |
| Esperanto | eo | 17841 | 0.999551594641556 | it,es,pt,fr,ceb |
| French | fr | 23076 | 0.999523314265904 | en,es,it,ru |
| Marathi | mr | 10461 | 0.9995220342223496 | hi |
| Uyghur | ug | 3692 | 0.9991874322860238 | ba,ru,hu |
| Finnish | fi | 17406 | 0.9990807767436516 | it,et,en,hr,de |
| Italian | it | 18326 | 0.9989632216522972 | es,de,fr,en,la |
| Spanish | es | 18227 | 0.998134635430954 | pt,it,io,ca,ia |
| Armenian | hy | 518 | 0.9980694980694981 | de |
| Arabic | ar | 8761 | 0.9978312977970552 | arz,fa,es,mzn,en |
| Ukrainian | uk | 14285 | 0.9963598179908996 | ru,sr |
| Macedonian | mk | 14465 | 0.9959903214656066 | bg,sr,ru |
| Dutch | nl | 19626 | 0.9934780393355752 | en,af,de,nds,fr |
| Lithuanian | lt | 13835 | 0.9933501987712324 | fi,pl,eo,pt,sr |
| Portuguese | pt | 20174 | 0.9933082184990581 | es,gl,it,en,fr |
| Khmer | km | 379 | 0.9920844327176781 | az,et |
| Urdu | ur | 963 | 0.9906542056074766 | pnb,fa,ro,en |
| Czech | cs | 10863 | 0.9898738838258307 | sk,pl,hu,sl,en |
| Swedish | sv | 12188 | 0.9886773875943551 | no,da,en,fi,id |
| Romanian | ro | 13560 | 0.9886430678466077 | es,fr,it,en,pt |
| Bulgarian | bg | 11144 | 0.9869885139985642 | mk,ru,uk,sr |
| Ossetian | os | 59 | 0.9830508474576272 | ru |
| Icelandic | is | 6364 | 0.9803582652419862 | et,no,da,hu,cs |
| Kazakh | kk | 2232 | 0.9802867383512545 | ru,tr,tt,uk,ky |
| Tagalog | tl | 10351 | 0.9737223456670853 | ceb,en,id,es,war |
| Tatar | tt | 8178 | 0.9680851063829787 | az,tr,ru,fi,kk |
| Basque | eu | 2999 | 0.9676558852950984 | it,nl,id,en,io |
| Tajik | tg | 30 | 0.9666666666666667 | ru |
| Belarusian | be | 6253 | 0.9625779625779626 | uk,ru,pl,bg,sr |
| Latvian | lv | 1243 | 0.9597747385358005 | lt,hr,sr,fi,eo |
| Chuvash | cv | 460 | 0.9543478260869566 | ru,uk,ba,sr |
| Breton | br | 2451 | 0.9543043655650755 | fr,nl,eu,de,pt |
| Bashkir | ba | 120 | 0.95 | tt,av |
| Indonesian | id | 9372 | 0.949637217242851 | ms,it,en,eo,tr |
| Danish | da | 15299 | 0.948035819334597 | no,sv,de,en,nn |
| Estonian | et | 1227 | 0.9356153219233904 | fi,en,hu,it,nl |
| Latin | la | 11437 | 0.9206085511934948 | fr,it,en,es,pt |
| Irish | ga | 867 | 0.9065743944636678 | en,gd,ca,kv,cs |
| Scottish Gaelic | gd | 542 | 0.8966789667896679 | en,ga,de,fr,pam |
| Welsh | cy | 619 | 0.8917609046849758 | es,en,la,kw,de |
| Catalan | ca | 4725 | 0.8833862433862434 | es,pt,fr,it,ro |
| Kyrgyz | ky | 66 | 0.8787878787878788 | ru,kk |
| Cornish | kw | 426 | 0.8779342723004695 | en,cy,de,br,sq |
| Assamese | as | 960 | 0.8635416666666667 | bn |
| Volapük | vo | 806 | 0.8511166253101737 | id,de,fi,en,eo |
| Serbian | sr | 13494 | 0.8489699125537276 | hr,sh,mk,bs,sl |
| Slovak | sk | 4370 | 0.8263157894736842 | cs,pl,sl,no,sr |
| Maltese | mt | 52 | 0.8076923076923077 | es,cs,pt,sr,eo |
| Norwegian Nynorsk | nn (no) | 657 | 0.7990867579908676 | da,sv,de,es,fi |
| Afrikaans | af | 1632 | 0.7879901960784313 | nl,en,fr,de,nds |
| Occitan | oc | 2861 | 0.7679133170220203 | ca,es,fr,pt,it |
| Interlingua | ia | 18782 | 0.7500798636992866 | es,it,fr,la,pt |
| Sanskrit | sa | 11 | 0.7272727272727273 | hi,ne |
| Chechen | ce | 7 | 0.7142857142857143 | mn,ru |
| Slovenian | sl | 372 | 0.6774193548387096 | sr,hr,bs,pl,eo |
| Frisian | fy | 107 | 0.6635514018691588 | nl,en,de,af,fr |
| Javanese | jv | 260 | 0.6461538461538462 | id,en,ms,ko,su |
| Yoruba | yo | 5 | 0.6 | sk,rm |
| Luxembourgish | lb | 217 | 0.5944700460829493 | de,nds,sv,fr,nl |
| Galician | gl | 2618 | 0.5790679908326967 | pt,es,it,fr,ca |
| Turkmen | tk | 3793 | 0.5710519377801213 | tr,uz,en,et,io |
| Croatian | hr | 2222 | 0.5333033303330333 | sr,sh,bs,sl,pl |
| Aragonese | an | 4 | 0.5 | es |
| Ido | io | 2905 | 0.48055077452667816 | eo,es,it,pt,tr |
| Interlingue | ie | 2007 | 0.4718485301444943 | es,it,fr,en,ia |
| Limburgan, Limburger, Limburgish | li | 3 | 0.3333333333333333 | de |
| Walloon | wa | 16 | 0.3125 | fr,pt,tl,oc,en |
| Somali | so | 32 | 0.21875 | fi,eo,cy,en,az |
| Corsican | co | 5 | 0.2 | it,fr |
| Sundanese | su | 11 | 0.18181818181818182 | id,ms,es |
| Haitian Creole | ht | 15 | 0.06666666666666667 | fr,br,su,diq,no |
| Romansh | rm | 16 | 0.0625 | it,fr,en,tl,qu |
| Bosnian | bs | 139 | 0.03597122302158273 | sr,hr,sh,pl,sl |
| Manx | gv | 6 | 0 | cy,fr,nl,et,en |
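The benchmark script itself is not reproduced in this README. As a rough sketch of how the Count, Accuracy, and Mislabels columns above could be derived from labelled sentences, the snippet below tallies pairs of expected and predicted codes. `summarize` and the sample data are hypothetical, and listing at most five mislabels, most frequent first, is an assumption based on how the table looks.

```javascript
// Hypothetical helper: summarise benchmark results per expected language.
// `samples` is an array of { expected, predicted } pairs of ISO codes.
function summarize(samples) {
  const byLang = new Map()
  for (const { expected, predicted } of samples) {
    if (!byLang.has(expected)) {
      byLang.set(expected, { count: 0, correct: 0, wrong: new Map() })
    }
    const row = byLang.get(expected)
    row.count += 1
    if (predicted === expected) {
      row.correct += 1
    } else {
      // Count how often each wrong label was returned.
      row.wrong.set(predicted, (row.wrong.get(predicted) || 0) + 1)
    }
  }
  return [...byLang.entries()]
    .map(([lang, row]) => ({
      lang,
      count: row.count,
      accuracy: row.correct / row.count,
      // Assumption: up to five wrong labels, most frequent first.
      mislabels: [...row.wrong.entries()]
        .sort((a, b) => b[1] - a[1])
        .slice(0, 5)
        .map(([code]) => code),
    }))
    .sort((a, b) => b.accuracy - a.accuracy) // best languages first
}

// Made-up sample data, only to show the shape of the result.
console.log(
  summarize([
    { expected: 'hr', predicted: 'sr' },
    { expected: 'hr', predicted: 'sr' },
    { expected: 'hr', predicted: 'hr' },
    { expected: 'hr', predicted: 'sh' },
    { expected: 'en', predicted: 'en' },
  ])
)
```

In a real run, `predicted` would be the top `lang` returned by `lid.predict` for each sentence.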

Short Form (10 to 40 characters)

To test accuracy on shorter phrases, the minimum and maximum character counts were changed to 10 and 40. Results for major languages are similar, but accuracy for less widely used languages drops significantly:

| Language (102) | Symbol (alternates) | Count (837539) | Accuracy (10 - 40 chars) | Mislabels |
| --- | --- | --- | --- | --- |
| Thai | th | 3399 | 1 | |
| Malayalam | ml | 525 | 1 | |
| Burmese | my | 243 | 1 | |
| Tamil | ta | 229 | 1 | |
| Telugu | te | 220 | 1 | |
| Punjabi (Eastern) | pa | 156 | 1 | |
| Amharic | am | 154 | 1 | |
| Kannada | kn | 126 | 1 | |
| Gujarati | gu | 116 | 1 | |
| Sinhala | si | 37 | 1 | |
| Tibetan | bo | 29 | 1 | |
| Divehi, Dhivehi, Maldivian | dv | 15 | 1 | |
| Japanese | ja | 28060 | 0.9999643620812545 | zh |
| Greek | el | 24980 | 0.9999599679743795 | en |
| Hebrew | he | 26461 | 0.9999244170666264 | en,yi |
| Korean | ko | 6128 | 0.9996736292428199 | tr,ja |
| Armenian | hy | 1855 | 0.9994609164420485 | de |
| Bengali | bn | 4132 | 0.9992739593417231 | bpy,as |
| Marathi | mr | 25633 | 0.9989466703078064 | hi,gom,pt,new |
| English | en | 17094 | 0.9986544986544986 | nl,it,hu,eo,es |
| Mandarin Chinese | zh | 17801 | 0.9978652884669401 | wuu,yue,ja,sr,pt |
| Turkish | tr | 18879 | 0.9978282748026909 | en,eo,az,es,it |
| Russian | ru | 20855 | 0.9977942939343083 | uk,bg,mk,sr,be |
| German | de | 17223 | 0.9974452766649248 | en,it,fr,es,sv |
| Uyghur | ug | 6135 | 0.9973920130399349 | ar,ba,tt,ca,hu |
| Vietnamese | vi | 13130 | 0.9971058644325971 | it,pms,eo,pt,fr |
| Esperanto | eo | 21641 | 0.9966729818400258 | it,es,tr,pt,pl |
| Georgian | ka | 4550 | 0.996043956043956 | xmf,en |
| Hindi | hi | 11497 | 0.9958249978255197 | mr,dty,new,bh,ne |
| Italian | it | 20449 | 0.995598806787618 | es,en,fr,eo,pt |
| Arabic | ar | 25531 | 0.9955348399984333 | arz,fa,en,mzn,ps |
| French | fr | 16040 | 0.9953865336658354 | en,it,ia,es,pt |
| Hungarian | hu | 20843 | 0.9952502039053879 | en,pt,it,nl,eo |
| Lao | lo | 183 | 0.994535519125683 | el |
| Polish | pl | 21386 | 0.9940147760216964 | en,it,eo,de,cs |
| Khmer | km | 1252 | 0.9920127795527156 | az,ru,sr,et |
| Spanish | es | 20498 | 0.9895599570689824 | pt,it,fr,ca,en |
| Finnish | fi | 20731 | 0.9849500747672567 | it,en,eo,et,nl |
| Portuguese | pt | 18352 | 0.9833805579773321 | es,it,gl,fr,en |
| Macedonian | mk | 23602 | 0.9830099144140327 | ru,bg,sr,uk |
| Ukrainian | uk | 23251 | 0.982667412154316 | ru,mk,bg,be,sr |
| Urdu | ur | 1583 | 0.9797852179406191 | pnb,fa,ug,en,ro |
| Dutch | nl | 19349 | 0.9720915809602564 | en,de,nds,af,fr |
| Lithuanian | lt | 24184 | 0.9597667879589812 | eo,fi,sr,pt,pl |
| Czech | cs | 25189 | 0.951605859700663 | sk,pl,hu,en,sl |
| Chuvash | cv | 1332 | 0.9481981981981982 | ru,uk,krc,ba,sr |
| Tatar | tt | 8283 | 0.9471206084751902 | ru,tr,az,kk,ky |
| Swedish | sv | 24466 | 0.9464563067113545 | da,no,en,de,eo |
| Icelandic | is | 7745 | 0.9449967721110394 | da,et,cs,no,de |
| Bulgarian | bg | 19328 | 0.9352235099337748 | mk,ru,uk,sr,tg |
| Sanskrit | sa | 135 | 0.9259259259259259 | hi,ne,mr |
| Kazakh | kk | 2373 | 0.9258322798145807 | uk,tt,tr,ru,ky |
| Romanian | ro | 18367 | 0.9235041106332008 | it,es,en,fr,pt |
| Tagalog | tl | 11133 | 0.9193389023623462 | ceb,en,it,id,es |
| Ossetian | os | 205 | 0.9170731707317074 | ru,hy,sr,kv,mrj |
| Indonesian | id | 9707 | 0.9138765839085197 | ms,en,it,eo,tr |
| Danish | da | 22539 | 0.9081591907360576 | no,sv,de,en,fr |
| Latin | la | 24699 | 0.8979310903275436 | it,fr,en,es,pt |
| Basque | eu | 4570 | 0.8851203501094091 | it,id,hu,nl,eo |
| Belarusian | be | 9005 | 0.8785119378123265 | ru,uk,bg,mk,pl |
| Cornish | kw | 3757 | 0.8759648655842428 | en,de,cy,es,br |
| Tajik | tg | 48 | 0.875 | ru,uk |
| Latvian | lv | 2198 | 0.8735213830755232 | lt,es,sr,en,fr |
| Breton | br | 5468 | 0.8579005120702268 | en,fr,pt,de,eu |
| Irish | ga | 1977 | 0.840161861406171 | en,pt,es,ca,gd |
| Bashkir | ba | 128 | 0.8359375 | tt,ru,sr,av,kk |
| Sindhi | sd | 6 | 0.8333333333333334 | ur |
| Serbian | sr | 23128 | 0.8054738844690419 | hr,mk,sh,ru,sl |
| Estonian | et | 3077 | 0.8043548911277218 | fi,en,hu,tr,it |
| Scottish Gaelic | gd | 753 | 0.7822045152722443 | en,ga,de,fr,pam |
| Welsh | cy | 1167 | 0.7660668380462725 | en,es,kw,la,it |
| Volapük | vo | 3941 | 0.7609743719868054 | id,en,eo,fi,de |
| Kyrgyz | ky | 227 | 0.7533039647577092 | ru,kk,tt,mn,bg |
| Catalan | ca | 5313 | 0.7504234895539243 | es,pt,it,fr,en |
| Assamese | as | 2635 | 0.7127134724857686 | bn,bpy,en,tl,bh |
| Yoruba | yo | 31 | 0.7096774193548387 | ga,pl,en,qu,ckb |
| Occitan | oc | 4096 | 0.70751953125 | es,fr,ca,pt,it |
| Interlingua | ia | 14949 | 0.7073382834972239 | it,es,fr,en,la |
| Afrikaans | af | 3299 | 0.6808123673840558 | nl,en,de,fr,nds |
| Norwegian Nynorsk | nn (no) | 1287 | 0.6798756798756799 | da,sv,de,es,hu |
| Maltese | mt | 165 | 0.6727272727272727 | hu,en,es,it,pl |
| Slovak | sk | 13877 | 0.6105786553289616 | cs,pl,sl,no,sr |
| Chechen | ce | 25 | 0.6 | bg,sr,mn,ba,uk |
| Interlingue | ie | 6538 | 0.5183542367696543 | es,it,en,fr,eo |
| Ido | io | 6495 | 0.4857582755966128 | eo,es,it,pt,tr |
| Slovenian | sl | 908 | 0.46255506607929514 | sr,hr,cs,pl,bs |
| Javanese | jv | 548 | 0.45255474452554745 | id,en,ko,ms,hu |
| Turkmen | tk | 4585 | 0.45169029443838604 | tr,en,uz,et,pl |
| Croatian | hr | 4186 | 0.4362159579550884 | sr,sh,bs,sl,pl |
| Galician | gl | 3245 | 0.4200308166409861 | pt,es,it,en,fr |
| Luxembourgish | lb | 732 | 0.3975409836065574 | de,fr,en,nds,nl |
| Frisian | fy | 282 | 0.36879432624113473 | nl,en,nds,de,fr |
| Walloon | wa | 37 | 0.2972972972972973 | fr,en,no,it,gn |
| Corsican | co | 13 | 0.23076923076923078 | it,min,ro,ilo,id |
| Sundanese | su | 18 | 0.2222222222222222 | id,es,en,it,lmo |
| Somali | so | 61 | 0.14754098360655737 | en,fi,et,cy,su |
| Limburgan, Limburger, Limburgish | li | 34 | 0.14705882352941177 | de,nl,en,no,is |
| Haitian Creole | ht | 58 | 0.1206896551724138 | en,fr,br,la,de |
| Manx | gv | 30 | 0.06666666666666667 | en,it,pt,fr,kw |
| Bosnian | bs | 520 | 0.04423076923076923 | sr,hr,sh,it,pl |
| Aragonese | an | 73 | 0.0136986301369863 | es,pt,it,en,fr |
| Romansh | rm | 11 | 0 | it,pt,fr,en,tl |

Comparison with Other npm Libraries

Accuracy has also been benchmarked against other popular libraries (notably franc and languagedetect). The results are in benchmark-testing/results/COMPARISONS.md.

TODO List

This is an improved modification of https://www.npmjs.com/package/fasttext-lid

Created with <3 for https://smodin.io


Package last updated on 13 Sep 2021
