langdetect-zh


Google's langdetect modified for Chinese texts

PyPI (pip), version 1.0.4, 1 maintainer

langdetect_zh

Installation

$ pip install langdetect_zh

Supported Python versions: 2.7 and 3.4+.

Languages

langdetect_zh supports two language variants out of the box, each identified by the ISO 639-1 code zh plus a region suffix:

zh-cn (Simplified Chinese), zh-tw (Traditional Chinese)

Basic usage

To get the most likely language code directly:

>>> from langdetect_zh import detect
>>> detect("这是一段中文文本")
'zh-cn'
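
The README does not say what detect does with empty or feature-less input. Upstream langdetect raises an exception in that case, and the sketch below assumes langdetect_zh may do the same. The helper name detect_or_default and its parameters are hypothetical and not part of the package; the detector is passed in as an argument so the helper can be tried without the package installed.

```python
def detect_or_default(text, detect_fn, default="unknown"):
    """Return detect_fn(text), or `default` if the text is blank or detection fails.

    detect_fn is any callable that maps a string to a language code,
    for example langdetect_zh.detect (assumed, not verified here).
    """
    # Skip the detector entirely for empty or whitespace-only input.
    if not text or not text.strip():
        return default
    try:
        return detect_fn(text)
    except Exception:
        # Assumption: the exact exception type raised by langdetect_zh is
        # not documented in this README, so catch broadly and fall back.
        return default
```

With the package installed, this would be called as detect_or_default(text, detect) after from langdetect_zh import detect.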

To find out the probabilities for the top languages:

>>> from langdetect_zh import detect_langs
>>> detect_langs("这是一段中文文本")
[zh-cn:0.999997316441747]
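
When the probabilities are close, it can be safer to treat the result as undecided. The following is a minimal sketch, assuming the objects returned by detect_langs expose lang and prob attributes as they do in upstream langdetect. The Candidate type is a stand-in for those objects, and best_language and min_prob are hypothetical names.

```python
from collections import namedtuple

# Stand-in for the result objects. Assumption: real results expose
# .lang and .prob, as upstream langdetect's Language objects do.
Candidate = namedtuple("Candidate", ["lang", "prob"])


def best_language(candidates, min_prob=0.9):
    """Return the most probable language code, or None if it is below min_prob."""
    if not candidates:
        return None
    top = max(candidates, key=lambda c: c.prob)
    return top.lang if top.prob >= min_prob else None


# A confident result passes the threshold; a near tie does not.
confident = best_language([Candidate("zh-cn", 0.99)])
undecided = best_language([Candidate("zh-cn", 0.57), Candidate("zh-tw", 0.43)])
```

With the package installed, best_language(detect_langs(text)) would apply the same rule to real results, provided the attribute assumption holds.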

NOTE

The language detection algorithm is non-deterministic: if you run it on text that is too short or too ambiguous, you may get different results each time you run it.

To enforce consistent results, run the following code before the first language detection:

from langdetect_zh import DetectorFactory
DetectorFactory.seed = 0
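
To illustrate why a fixed seed gives repeatable results, here is a toy randomized classifier. It is not the library's algorithm: noisy_vote, its parameters, and the scores are all hypothetical. It only shows the general idea that random perturbation can flip the answer on ambiguous input, and that seeding the random number generator makes each run identical.

```python
import random
from collections import Counter


def noisy_vote(scores, trials=7, noise=0.05, seed=None):
    """Pick a language by majority vote over randomly perturbed scores."""
    rng = random.Random(seed)  # seed=None draws entropy from the OS
    votes = Counter()
    for _ in range(trials):
        # Add Gaussian noise to every score, then vote for the highest one.
        perturbed = {lang: s + rng.gauss(0, noise) for lang, s in scores.items()}
        votes[max(perturbed, key=perturbed.get)] += 1
    return votes.most_common(1)[0][0]


ambiguous = {"zh-cn": 0.5, "zh-tw": 0.5}
# With the same seed, repeated runs always agree with each other.
seeded_runs = {noisy_vote(ambiguous, seed=0) for _ in range(5)}
```

Without a seed, repeated calls on the ambiguous scores may return either code, which mirrors the behaviour described in the note above.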

Original project

This package is a modified version of langdetect. The modification is to distinguish Simplified Chinese from Traditional Chinese when the input is purely Chinese text.
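
The README does not define "purely Chinese". One possible reading, sketched below for Python 3, is that every letter in the text falls in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), ignoring digits, punctuation, and whitespace. The function is_pure_chinese is hypothetical and is not taken from the package's source.

```python
def is_pure_chinese(text):
    """Return True if every letter in text is a basic CJK Unified Ideograph.

    Digits, punctuation, and whitespace are ignored. Text with no letters
    at all returns False. Extension blocks outside U+4E00..U+9FFF are not
    covered by this simplified check.
    """
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all("\u4e00" <= ch <= "\u9fff" for ch in letters)
```

Under this reading, mixed input such as "Hello 世界" is not purely Chinese, while "這是繁體中文，2024。" is.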

Keywords

language detection chinese
