
@ocelotbot/tinyld


Simple and Performant Language detection library (pure JS and zero dependencies)

Version 1.1.8 (latest). Weekly downloads: 798 (down 3.27%). Maintainers: 2.

TinyLD


TinyLD (Tiny Language Detector) detects the language of UTF-8 encoded Unicode text:

  • pure JavaScript, no API calls, and no dependencies (Node.js and browser compatible)
  • an alternative to libraries like CLD
  • fast, with a low memory footprint (unlike ML-based methods)
  • supports 62 languages (30 in the web version)
  • returns ISO 639-1 language codes



Getting Started

Install

yarn add tinyld # or npm install --save tinyld

API

import { detect, detectAll } from 'tinyld'

// Detect
detect('これは日本語です.') // ja
detect('and this is english.') // en

// DetectAll
detectAll('ceci est un text en francais.')
// [ { lang: 'fr', accuracy: 0.5238 }, { lang: 'ro', accuracy: 0.3802 }, ... ]
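
The detectAll example above returns candidates with an accuracy score, so a caller may want to reject low-confidence guesses. The following is a minimal sketch under that assumption: `pickLanguage` and its `minAccuracy` threshold are hypothetical names, not part of tinyld, and the sample data is copied from the output shown above rather than produced by calling the library.

```javascript
// Hypothetical helper (not part of tinyld): pick the best guess from a
// detectAll-style result list, or return null when confidence is too low.
function pickLanguage(results, minAccuracy = 0.5) {
  if (!Array.isArray(results) || results.length === 0) return null
  // Do not rely on input order; find the entry with the highest accuracy.
  const best = results.reduce((a, b) => (b.accuracy > a.accuracy ? b : a))
  return best.accuracy >= minAccuracy ? best.lang : null
}

// Sample data copied from the detectAll example output.
const sample = [
  { lang: 'fr', accuracy: 0.5238 },
  { lang: 'ro', accuracy: 0.3802 },
]

console.log(pickLanguage(sample)) // fr
console.log(pickLanguage(sample, 0.8)) // null
```

The default threshold of 0.5 is an arbitrary choice for illustration; a suitable value depends on text length and on how costly a wrong guess is.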



TinyLD CLI

tinyld This is the text that I want to check
# [ { lang: 'en', accuracy: 1 } ]



Benchmark

Benchmark run on the Tatoeba dataset (~9M sentences), covering 16 of the most common languages.

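| Library        | Script                    | Properly identified | Improperly identified | Not identified | Avg execution time | Disk size |
| -------------- | ------------------------- | ------------------- | --------------------- | -------------- | ------------------ | --------- |
| TinyLD         | yarn bench:tinyld         | 96.1747%            | 2.6938%               | 1.1315%        | 0.1315ms           | 778KB     |
| TinyLD Web     | yarn bench:tinyld-light   | 92.1169%            | 3.9536%               | 3.9295%        | 0.0616ms           | 89KB      |
| node-cld       | yarn bench:cld            | 88.9148%            | 1.7489%               | 9.3363%        | 0.0612ms           | > 10MB    |
| node-lingua    | yarn bench:lingua         | 82.3157%            | 0.2158%               | 17.4685%       | 0.7085ms           | ~100MB    |
| franc          | yarn bench:franc          | 68.7783%            | 26.3432%              | 4.8785%        | 0.1381ms           | 267KB     |
| franc-min      | yarn bench:franc-min      | 65.5163%            | 23.5794%              | 10.9044%       | 0.0614ms           | 119KB     |
| languagedetect | yarn bench:languagedetect | 61.6068%            | 12.295%               | 26.0982%       | 0.1585ms           | 240KB     |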
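
The three identification columns in the benchmark results sum to roughly 100% per library, which suggests each sentence falls into exactly one bucket. The sketch below illustrates how such rates could be computed; it is not the project's actual benchmark script. `scoreDetector` and `stubDetect` are hypothetical names, the stub stands in for a real detector, and treating an empty return value as "not identified" is an assumption. Timing is omitted.

```javascript
// Hypothetical scoring routine: classify each labelled sentence as properly
// identified, improperly identified, or not identified.
// Assumption: a detector returns a language code, or '' when it cannot decide.
function scoreDetector(detect, samples) {
  const total = samples.length
  if (total === 0) return { properly: 0, improperly: 0, notIdentified: 0 }
  let correct = 0
  let wrong = 0
  let unknown = 0
  for (const { text, lang } of samples) {
    const guess = detect(text)
    if (!guess) unknown++
    else if (guess === lang) correct++
    else wrong++
  }
  // Express each bucket as a percentage of all samples.
  const pct = (n) => (n / total) * 100
  return { properly: pct(correct), improperly: pct(wrong), notIdentified: pct(unknown) }
}

// Toy stand-in for a real detector, for illustration only.
const stubDetect = (text) => {
  if (/[\u3040-\u30ff]/.test(text)) return 'ja' // contains hiragana or katakana
  if (/^[\x00-\x7f]+$/.test(text)) return 'en' // pure ASCII
  return ''
}

const labelled = [
  { text: 'これは日本語です.', lang: 'ja' },
  { text: 'and this is english.', lang: 'en' },
  { text: 'ceci est un text en francais.', lang: 'fr' },
  { text: 'это русский текст', lang: 'ru' },
]

console.log(scoreDetector(stubDetect, labelled))
// { properly: 50, improperly: 25, notIdentified: 25 }
```

Separating "improperly identified" from "not identified" matters when comparing libraries: a detector that abstains often (such as node-lingua in the results) can have a very low error rate while still identifying fewer sentences overall.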

Remark

  • For each category, the top 3 results are in bold
  • Languages evaluated in this benchmark:
    • Asia: jpn, cmn, kor, hin
    • Europe: fra, spa, por, ita, nld, eng, deu, fin, rus
    • Middle East: tur, heb, ara
  • This kind of benchmark is not perfect, and percentages can vary over time, but it gives a good idea of overall performance
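
The benchmark languages are listed with three-letter ISO 639-3 codes, while the library is described as using ISO 639-1 (two-letter) codes, so comparing detector output against dataset labels needs a mapping. The table below is a hand-written sketch covering only the 16 benchmark languages; `ISO3_TO_ISO1` and `toIso1` are hypothetical names, not tinyld exports. Mapping `cmn` (Mandarin) to `zh` is a simplification, since ISO 639-1 only has a code for Chinese as a whole.

```javascript
// Hand-written mapping for the 16 benchmark languages (ISO 639-3 -> ISO 639-1).
const ISO3_TO_ISO1 = new Map([
  // Asia
  ['jpn', 'ja'], ['cmn', 'zh'], ['kor', 'ko'], ['hin', 'hi'],
  // Europe
  ['fra', 'fr'], ['spa', 'es'], ['por', 'pt'], ['ita', 'it'], ['nld', 'nl'],
  ['eng', 'en'], ['deu', 'de'], ['fin', 'fi'], ['rus', 'ru'],
  // Middle East
  ['tur', 'tr'], ['heb', 'he'], ['ara', 'ar'],
])

// Returns the two-letter code, or null for a language outside the table.
function toIso1(iso3) {
  return ISO3_TO_ISO1.has(iso3) ? ISO3_TO_ISO1.get(iso3) : null
}

console.log(toIso1('jpn')) // ja
console.log(toIso1('kaz')) // null
```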

Conclusion

  • For Node.js: TinyLD or node-cld (fast and accurate)
  • For the browser: TinyLD Light (listed as TinyLD Web in the benchmark) or franc-min (small, with decent accuracy; franc is less accurate but supports more languages)
  • node-lingua is too large (~100MB) and the slowest library tested (0.7085ms on average)
  • languagedetect is lightweight but not accurate enough, and is focused on Indo-European languages (it supports Kazakh but not Chinese, Korean, or Japanese)

Package last updated on 04 Oct 2022
