@ocelotbot/tinyld

Simple and Performant Language detection library (pure JS and zero dependencies)

Version: 1.1.8 (latest)

Version published: 2 years ago

Weekly downloads: 798; decreased by 3.27%

Maintainers: 2

Created: 2 years ago

TinyLD

TinyLD (Tiny Language Detector) detects the language of a Unicode (UTF-8) text:

- pure JavaScript, no API calls, and no dependencies (Node and browser compatible)
- an alternative to libraries like CLD
- fast, with a low memory footprint (unlike ML methods)
- supports 62 languages (30 in the web version)
- returns language codes in ISO 639-1 format

Getting Started

Install

yarn add tinyld # or npm install --save tinyld

API

import { detect, detectAll } from 'tinyld'

// Detect
detect('これは日本語です.') // ja
detect('and this is english.') // en

// DetectAll
detectAll('ceci est un text en francais.')
// [ { lang: 'fr', accuracy: 0.5238 }, { lang: 'ro', accuracy: 0.3802 }, ... ]
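
Since `detectAll` returns several candidates with an accuracy score, a caller may want to accept the top guess only when it is strong enough. The sketch below is one way to do that. `pickLanguage`, its `minAccuracy` threshold, and the `'und'` fallback are hypothetical names chosen for illustration and are not part of tinyld; the sample array is copied from the `detectAll` output shown above.

```javascript
// Hypothetical helper, not part of tinyld: choose a language from
// detectAll-style output, or return a fallback when the best guess is weak.
function pickLanguage(candidates, minAccuracy = 0.5, fallback = 'und') {
  // Sort a copy so the caller's array is left untouched; highest accuracy first.
  const ranked = [...candidates].sort((a, b) => b.accuracy - a.accuracy)
  const best = ranked[0]
  if (!best || best.accuracy < minAccuracy) return fallback
  return best.lang
}

// Sample data copied from the detectAll example above.
const sample = [
  { lang: 'fr', accuracy: 0.5238 },
  { lang: 'ro', accuracy: 0.3802 },
]

console.log(pickLanguage(sample))      // fr
console.log(pickLanguage(sample, 0.9)) // und
console.log(pickLanguage([]))          // und
```

In real use, the `candidates` argument would be the array returned by `detectAll(text)`. The 0.5 default is an arbitrary starting point, not a value recommended by the library.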

TinyLD CLI

tinyld This is the text that I want to check
# [ { lang: 'en', accuracy: 1 } ]

Benchmark

Benchmark run on the Tatoeba dataset (~9M sentences) across 16 of the most common languages.

Library	Script	Properly Identified	Improperly Identified	Not Identified	Avg Execution Time	Disk Size
TinyLD	`yarn bench:tinyld`	96.1747%	2.6938%	1.1315%	0.1315 ms	778KB
TinyLD Web	`yarn bench:tinyld-light`	92.1169%	3.9536%	3.9295%	0.0616 ms	89KB
node-cld	`yarn bench:cld`	88.9148%	1.7489%	9.3363%	0.0612 ms	> 10MB
node-lingua	`yarn bench:lingua`	82.3157%	0.2158%	17.4685%	0.7085 ms	~100MB
franc	`yarn bench:franc`	68.7783%	26.3432%	4.8785%	0.1381 ms	267KB
franc-min	`yarn bench:franc-min`	65.5163%	23.5794%	10.9044%	0.0614 ms	119KB
languagedetect	`yarn bench:languagedetect`	61.6068%	12.295%	26.0982%	0.1585 ms	240KB
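
The three percentage columns split every sentence into exactly one bucket, so each row sums to 100% (for TinyLD, 96.1747 + 2.6938 + 1.1315 = 100). As a rough illustration of that bookkeeping, the sketch below scores a list of results. It is not the project's benchmark script: `summarize` and the `{ expected, detected }` record shape are hypothetical, and it assumes an empty string means the detector gave no answer.

```javascript
// Hypothetical scoring helper; the actual bench scripts may differ.
// Each record pairs the expected language with what a detector returned.
// Assumption: an empty string means the detector gave no answer.
function summarize(results) {
  const counts = { proper: 0, improper: 0, none: 0 }
  for (const { expected, detected } of results) {
    if (!detected) counts.none++
    else if (detected === expected) counts.proper++
    else counts.improper++
  }
  // Convert a count to a percentage of all sentences (0 for an empty run).
  const pct = (n) => (results.length ? (100 * n) / results.length : 0)
  return {
    proper: pct(counts.proper),     // "Properly Identified"
    improper: pct(counts.improper), // "Improperly Identified"
    none: pct(counts.none),         // "Not Identified"
  }
}

const stats = summarize([
  { expected: 'fr', detected: 'fr' },
  { expected: 'en', detected: 'en' },
  { expected: 'es', detected: 'pt' },
  { expected: 'ja', detected: '' },
])
console.log(stats) // { proper: 50, improper: 25, none: 25 }
```

Separating "improperly identified" from "not identified" matters when comparing libraries: a detector that abstains often (node-lingua in the table) can have a very low error rate while still identifying fewer sentences overall.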

Remark

For each category, the top 3 results are in bold.
Languages evaluated in this benchmark:
- Asia: jpn, cmn, kor, hin
- Europe: fra, spa, por, ita, nld, eng, deu, fin, rus
- Middle East: tur, heb, ara
This kind of benchmark is not perfect and the percentages can vary over time, but it gives a good idea of overall performance.
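
The benchmark list uses three-letter codes (these appear to be ISO 639-3, as used by Tatoeba), while `detect` returns two-letter ISO 639-1 codes such as `ja` and `en`. When comparing detector output against a dataset labelled this way, a lookup like the one below can bridge the two. `ISO3_TO_ISO1` and `toIso1` are illustrative names, not part of tinyld, and the table covers only the 16 languages listed above.

```javascript
// Illustrative lookup, not part of tinyld: the 16 benchmark languages
// (three-letter codes) mapped to two-letter ISO 639-1 codes.
const ISO3_TO_ISO1 = new Map([
  // Asia (cmn is Mandarin; ISO 639-1 only has the broader code zh)
  ['jpn', 'ja'], ['cmn', 'zh'], ['kor', 'ko'], ['hin', 'hi'],
  // Europe
  ['fra', 'fr'], ['spa', 'es'], ['por', 'pt'], ['ita', 'it'], ['nld', 'nl'],
  ['eng', 'en'], ['deu', 'de'], ['fin', 'fi'], ['rus', 'ru'],
  // Middle East
  ['tur', 'tr'], ['heb', 'he'], ['ara', 'ar'],
])

// Hypothetical helper: returns '' for codes outside the table.
function toIso1(iso3) {
  return ISO3_TO_ISO1.get(iso3) ?? ''
}

console.log(ISO3_TO_ISO1.size)     // 16
console.log(toIso1('jpn'))         // ja
console.log(toIso1('xyz') === '')  // true
```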

Conclusion

Not recommended

- node-lingua is too big and too slow.
- languagedetect is light but not accurate enough, and is focused mainly on Indo-European languages (it supports Kazakh but not Chinese, Korean, or Japanese).

Package last updated on 04 Oct 2022