language-detector detects the language of text
pip install language-detector
Works with both Python 2 and 3
from language_detector import detect_language
text = "I arrived in that city on January 4, 1937"
language = detect_language(text)
print(language)
# prints English
Languages Supported |
---|
Arabic |
English |
Farsi |
French |
German |
Kurmanci (Kurdish) |
Mandarin |
Russian |
Sorani (Kurdish) |
Spanish |
Turkish |
To test the package, run:
python -m unittest language_detector.tests.test
The test compares how well language-detector and langid identify languages in the data sources.
metric | language-detector | langid |
---|---|---|
test-duration (in seconds) | 0.10 | 3.83 |
accuracy | 96.77% | 67.74% |
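As a rough illustration of how figures like these can be measured, here is a minimal sketch (Python 3). It is not the package's actual test code: the `benchmark` helper and the shape of `samples` are assumptions made for this example, and any detector callable (such as `detect_language`) can be passed in.

```python
import time


def benchmark(detect, samples):
    """Return (accuracy_percent, duration_seconds) for a detector callable.

    detect  -- any function that maps a text to a language name
    samples -- iterable of (text, expected_language) pairs (assumed shape)
    """
    samples = list(samples)
    if not samples:
        raise ValueError("samples must not be empty")
    start = time.perf_counter()
    # Count the samples for which the detector returns the expected language.
    correct = sum(1 for text, expected in samples if detect(text) == expected)
    duration = time.perf_counter() - start
    return 100.0 * correct / len(samples), duration
```

Running the same sample list through two detectors with this helper gives directly comparable accuracy and duration values.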
If you don't want language-detector to consider certain languages, you can monkey-patch the module. For example, to exclude English:
import language_detector
language_detector.char_language = [cl for cl in language_detector.char_language if cl[1] != "English"]
# proceed as normal
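If the exclusion should only apply temporarily, the same patch can be wrapped in a context manager that restores the original list afterwards. This is a sketch, not part of the package: the `excluding` helper is hypothetical, and it assumes only what the snippet above shows, namely that the module exposes a `char_language` list whose entries carry the language name at index 1.

```python
from contextlib import contextmanager


@contextmanager
def excluding(module, *languages):
    """Temporarily drop entries for the given languages from module.char_language."""
    original = module.char_language
    # Keep only entries whose language name (index 1) is not excluded.
    module.char_language = [cl for cl in original if cl[1] not in languages]
    try:
        yield module
    finally:
        # Restore the full list even if the body raised an exception.
        module.char_language = original
```

Usage would look like `with excluding(language_detector, "English"): ...`, with detection calls placed inside the `with` block.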
The following is a list of datasets used for each language:
Language | Datasets |
---|---|
Arabic | UN Corpora |
English | UN Corpora |
Farsi | BBC News Persian |
French | UN Corpora |
German | Deutsche Welle |
Kurmanci (Kurdish) | Rudaw |
Mandarin | UN Corpora |
Russian | UN Corpora |
Sorani (Kurdish) | Rudaw |
Spanish | UN Corpora |
Turkish | BBC News Türkçe |
If you'd like to contribute a new language, please consult CONTRIBUTING.md.
Contact the package author, Daniel J. Dufour, at daniel.j.dufour@gmail.com.
FAQs
Detect language of text
We found that language-detector demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.