@ocelotbot/tinyld

Simple and Performant Language detection library (pure JS and zero dependencies)



TinyLD


TinyLD (Tiny Language Detector) detects the language of Unicode (UTF-8) text:

  • pure JavaScript, no API calls, and no dependencies (Node.js and browser compatible)
  • an alternative to libraries like CLD
  • fast, with a low memory footprint (unlike ML methods)
  • supports 62 languages (30 in the web version)
  • language codes in ISO 639-1 format


Getting Started

Install

yarn add tinyld # or npm install --save tinyld

API

import { detect, detectAll } from 'tinyld'

// Detect
detect('これは日本語です.') // ja
detect('and this is english.') // en

// DetectAll
detectAll('ceci est un text en francais.')
// [ { lang: 'fr', accuracy: 0.5238 }, { lang: 'ro', accuracy: 0.3802 }, ... ]
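
Because detectAll returns several candidates with an accuracy score, callers often want a single answer plus a fallback when no candidate is convincing. The sketch below is one way to do that; it is not part of TinyLD. The helper name pickLanguage, the 0.5 threshold, and the 'und' fallback code are all assumptions chosen for illustration, and the sample array is copied from the output shown above so the snippet runs without the library installed.

```javascript
// Hypothetical helper (not part of tinyld): choose the best candidate from a
// detectAll()-style array, or return a fallback when nothing is confident enough.
function pickLanguage(candidates, minAccuracy = 0.5, fallback = 'und') {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...candidates].sort((a, b) => b.accuracy - a.accuracy)
  const best = sorted[0]
  return best && best.accuracy >= minAccuracy ? best.lang : fallback
}

// Sample input copied from the detectAll() output shown above.
const sample = [
  { lang: 'fr', accuracy: 0.5238 },
  { lang: 'ro', accuracy: 0.3802 },
]

console.log(pickLanguage(sample))      // fr
console.log(pickLanguage(sample, 0.9)) // und (no candidate reaches 0.9)
console.log(pickLanguage([]))          // und (nothing detected)
```

In real use the first argument would be the array returned by detectAll(text); the threshold is worth tuning on your own data, since short texts tend to produce lower scores.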


TinyLD CLI

tinyld This is the text that I want to check
# [ { lang: 'en', accuracy: 1 } ]


Benchmark

Benchmark run on the Tatoeba dataset (~9M sentences), covering 16 of the most common languages.

| Library | Script | Properly identified | Improperly identified | Not identified | Avg execution time | Disk size |
| --- | --- | --- | --- | --- | --- | --- |
| TinyLD | yarn bench:tinyld | 96.1747% | 2.6938% | 1.1315% | 0.1315 ms | 778KB |
| TinyLD Web | yarn bench:tinyld-light | 92.1169% | 3.9536% | 3.9295% | 0.0616 ms | 89KB |
| node-cld | yarn bench:cld | 88.9148% | 1.7489% | 9.3363% | 0.0612 ms | > 10MB |
| node-lingua | yarn bench:lingua | 82.3157% | 0.2158% | 17.4685% | 0.7085 ms | ~100MB |
| franc | yarn bench:franc | 68.7783% | 26.3432% | 4.8785% | 0.1381 ms | 267KB |
| franc-min | yarn bench:franc-min | 65.5163% | 23.5794% | 10.9044% | 0.0614 ms | 119KB |
| languagedetect | yarn bench:languagedetect | 61.6068% | 12.295% | 26.0982% | 0.1585 ms | 240KB |

Remark

  • For each category, the top 3 results are in bold
  • Languages evaluated in this benchmark:
    • Asia: jpn, cmn, kor, hin
    • Europe: fra, spa, por, ita, nld, eng, deu, fin, rus
    • Middle East: tur, heb, ara
  • This kind of benchmark is not perfect and the percentages can vary over time, but it gives a good idea of overall performance
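
The three percentage columns in the benchmark split every sentence into exactly one of three outcomes, so each row sums to 100%. The sketch below shows how such a breakdown could be computed for any detector. It is not the project's benchmark script: the function scoreDetector, the stub detector, and the convention that an empty string means "not identified" are assumptions made for illustration.

```javascript
// Hypothetical scoring helper. `detect` is any function (text) => language code,
// assumed here to return an empty string when it cannot identify the language.
function scoreDetector(detect, samples) {
  const counts = { proper: 0, improper: 0, notIdentified: 0 }
  for (const { text, lang } of samples) {
    const guess = detect(text)
    if (!guess) counts.notIdentified += 1
    else if (guess === lang) counts.proper += 1
    else counts.improper += 1
  }
  // Convert raw counts to percentages of the whole sample set.
  const pct = (n) => (samples.length ? (100 * n) / samples.length : 0)
  return {
    proper: pct(counts.proper),
    improper: pct(counts.improper),
    notIdentified: pct(counts.notIdentified),
  }
}

// Stub detector standing in for a real library: it mislabels "hola" on purpose.
const stubDetect = (text) => ({ hello: 'en', bonjour: 'fr', hola: 'en' }[text] || '')

const report = scoreDetector(stubDetect, [
  { text: 'hello', lang: 'en' },   // properly identified
  { text: 'bonjour', lang: 'fr' }, // properly identified
  { text: 'hola', lang: 'es' },    // improperly identified
  { text: '???', lang: 'ja' },     // not identified
])

console.log(report) // { proper: 50, improper: 25, notIdentified: 25 }
```

Passing a real detector in place of stubDetect would give the same kind of breakdown, provided the expected codes in the samples use the same format the detector returns.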

Conclusion

  • For Node.js: TinyLD or node-cld (fast and accurate)
  • For the browser: TinyLD Light or franc-min (small, decent accuracy; franc is less accurate but supports more languages)
  • node-lingua is too large (~100MB) and slow
  • languagedetect is light but not accurate enough, and is focused on Indo-European languages (it supports Kazakh but not Chinese, Korean, or Japanese)


Last updated on 04 Oct 2022
