biangbiang

Chinese NLP utilities

0.0.2
Version published: 4 years ago

Weekly downloads: 3

Maintainers: 1

Created: 4 years ago

Installation

For npm:

npm install biangbiang

For Yarn:

yarn add biangbiang

Getting started

With import:

import biangbiang from "biangbiang";

With require:

var biangbiang = require('biangbiang');

Methods

Dictionary

`define(word, dictionary)`

Get the pinyin and definition of a word, where `dictionary` is "simplified", "traditional", or "merged". The result also includes the word's frequency index (rank).

define('面条', 'simplified');

{
    simplified: '面条',
    traditional: '麵條',
    pinyin: 'mian4 tiao2',
    definition: 'noodles',
    index: 6029
}
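As a rough illustration of consuming this result, the sketch below formats an entry as a single display line. `formatEntry` is a hypothetical helper, not part of biangbiang, and the object literal simply copies the sample output above rather than calling the library.

```javascript
// Hypothetical helper (not part of biangbiang): turn a define() result
// into a one-line gloss.
function formatEntry(entry) {
	// Show the traditional form only when it differs from the simplified one.
	const forms =
		entry.simplified === entry.traditional
			? entry.simplified
			: `${entry.simplified} (${entry.traditional})`;
	return `${forms} [${entry.pinyin}]: ${entry.definition} (rank ${entry.index})`;
}

// Sample entry copied from the define() output shown above.
const noodles = {
	simplified: '面条',
	traditional: '麵條',
	pinyin: 'mian4 tiao2',
	definition: 'noodles',
	index: 6029,
};

console.log(formatEntry(noodles));
// 面条 (麵條) [mian4 tiao2]: noodles (rank 6029)
```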

`kind(character)`

Check whether a character is simplified or traditional. If it is, the other form is also returned. `type` is 1 for simplified, 2 for traditional, and 3 for both.

kind("面");

{ type: 1, other: '麵'}
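The numeric codes can be mapped to readable labels. This is a minimal sketch: `KIND_LABELS` and `describeKind` are hypothetical names, and the `'unknown'` fallback is an assumption, since the README does not say what `kind` returns for other input.

```javascript
// Hypothetical lookup table for the type codes described above.
const KIND_LABELS = { 1: 'simplified', 2: 'traditional', 3: 'both' };

// Hypothetical helper: describe a kind() result in words.
function describeKind(result) {
	// Assumption: any code outside 1-3 is reported as 'unknown'.
	const label = KIND_LABELS[result.type] ?? 'unknown';
	return result.other ? `${label} (other form: ${result.other})` : label;
}

// Sample result copied from the kind() output shown above.
console.log(describeKind({ type: 1, other: '麵' }));
// simplified (other form: 麵)
```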

`wordsContaining(character)`

Get a list of all dictionary words containing a character, sorted in order of decreasing frequency.

wordsContaining('面');

[
	{
		word: '面',
		index: 322,
	},
	{
		word: '里面',
		index: 706,
	},
	{
		word: '面对',
		index: 930,
	},
	{
		word: '外面',
		index: 1234,
	},
	{
		word: '后面',
		index: 1270,
	},
  ...
]
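Because a lower `index` means a more frequent word, the list is easy to trim to common vocabulary. The sketch below assumes the `{ word, index }` shape shown above. `commonWords` is a hypothetical helper, and the array holds only the five entries listed in the sample.

```javascript
// Hypothetical helper: keep only words ranked at or above a frequency cutoff.
function commonWords(words, maxIndex) {
	return words.filter((w) => w.index <= maxIndex).map((w) => w.word);
}

// First five entries copied from the wordsContaining() output shown above.
const sampleWords = [
	{ word: '面', index: 322 },
	{ word: '里面', index: 706 },
	{ word: '面对', index: 930 },
	{ word: '外面', index: 1234 },
	{ word: '后面', index: 1270 },
];

console.log(commonWords(sampleWords, 1000));
// [ '面', '里面', '面对' ]
```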

Frequency

`characterFrequency(character)`

Get frequency statistics for a character.

characterFrequency('面');

{
	symbol: '面',
	index: 211,
	frequency: 1631866,
	percentage: 0.0006532897206780486,
	cumulativePercentage: 0.7101332080329651,
}
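The sample values suggest that `percentage` and `cumulativePercentage` are fractions between 0 and 1 rather than percent values, although the README does not say so explicitly. Under that assumption, a small hypothetical helper such as `asPercent` can make them easier to read:

```javascript
// Hypothetical helper. Assumes the input is a fraction in [0, 1].
function asPercent(fraction, digits = 3) {
	return `${(fraction * 100).toFixed(digits)}%`;
}

// Sample result copied from the characterFrequency() output shown above.
const mian = {
	symbol: '面',
	index: 211,
	frequency: 1631866,
	percentage: 0.0006532897206780486,
	cumulativePercentage: 0.7101332080329651,
};

console.log(`${mian.symbol}: rank ${mian.index}, share ${asPercent(mian.percentage)}`);
console.log(`cumulative share up to this rank: ${asPercent(mian.cumulativePercentage)}`);
```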

`wordFrequency(word)`

Get frequency statistics for a word.

wordFrequency('面条');

{
	symbol: '面条',
	index: 6029,
	frequency: 66879,
	percentage: 0.000015823013308250793,
	cumulativePercentage: 0.8864603725508198,
}

`multiFrequency(sentence)`

Get frequency statistics for a body of text.

multiFrequency('我喜欢吃面条。');

{
	byCharacter: [
		{
			symbol: '我',
			index: 1,
			frequency: 107133693,
			percentage: 0.042889146765223256,
			cumulativePercentage: 0.12608816399204145,
		},
		{
			symbol: '喜',
			index: 479,
			frequency: 681772,
			percentage: 0.0002729357921827617,
			cumulativePercentage: 0.8216732504061582,
		},
		{
			symbol: '欢',
			index: 1490,
			frequency: 140530,
			percentage: 0.000056258788679270345,
			cumulativePercentage: 0.9496496712024702,
		},
		{
			symbol: '吃',
			index: 42,
			frequency: 9348265,
			percentage: 0.0037424184526636244,
			cumulativePercentage: 0.46991986609112824,
		},
		{
			symbol: '面',
			index: 211,
			frequency: 1631866,
			percentage: 0.0006532897206780486,
			cumulativePercentage: 0.7101332080329651,
		},
		{
			symbol: '条',
			index: 169,
			frequency: 2102653,
			percentage: 0.0008417612665824651,
			cumulativePercentage: 0.6785621013285376,
		},
		{
			symbol: '。',
			index: -1,
			frequency: -1,
			percentage: -1,
			cumulativePercentage: -1,
		},
	],
	indices: [1, 479, 1490, 42, 211, 169],
	percentages: [
		0.042889146765223256,
		0.0002729357921827617,
		0.000056258788679270345,
		0.0037424184526636244,
		0.0006532897206780486,
		0.0008417612665824651,
	],
	cumulativePercentages: [
		0.12608816399204145,
		0.8216732504061582,
		0.9496496712024702,
		0.46991986609112824,
		0.7101332080329651,
		0.6785621013285376,
	],
}
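In the sample above, the punctuation mark is reported with `-1` in every numeric field and is left out of the summary arrays. A sketch of using `byCharacter` to find the least frequent character while skipping such entries follows. `rarestSymbol` is a hypothetical helper, and the sample keeps only the `symbol` and `index` fields from the output above.

```javascript
// Hypothetical helper: return the entry with the highest rank number
// (the least frequent symbol), ignoring entries with index -1.
function rarestSymbol(byCharacter) {
	const known = byCharacter.filter((c) => c.index !== -1);
	if (known.length === 0) return null;
	return known.reduce((rarest, c) => (c.index > rarest.index ? c : rarest));
}

// Symbols and indices copied from the multiFrequency() output shown above.
const sampleByCharacter = [
	{ symbol: '我', index: 1 },
	{ symbol: '喜', index: 479 },
	{ symbol: '欢', index: 1490 },
	{ symbol: '吃', index: 42 },
	{ symbol: '面', index: 211 },
	{ symbol: '条', index: 169 },
	{ symbol: '。', index: -1 },
];

console.log(rarestSymbol(sampleByCharacter).symbol);
// 欢
```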

Components

`decompose(character, depth)`

Decompose a character into its components up to a specified depth. If depth is undefined, then the full component tree is returned.

decompose('面');

{
	丆: {
		'㇐': '㇐',
		'㇓': '㇓',
	},
	囬: {
		'55103': {
			'10001': {
				'10001': '㇑',
			},
			二: {
				二: '㇐',
			},
		},
		囗: {
			'⺆': {
				'㇑': '㇑',
				'㇆': '㇆',
			},
			'㇐': '㇐',
		},
	},
}
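The result is a nested object whose leaves are strings, so it can be walked recursively. The sketch below flattens the tree into its leaf values. `collectLeaves` is a hypothetical helper, and the tree literal copies the sample output above with every key quoted. Note that JavaScript visits integer-like keys such as '55103' before other keys, so the traversal order may differ from the written order.

```javascript
// Hypothetical helper: gather every string leaf of a decompose() tree.
function collectLeaves(tree) {
	const leaves = [];
	for (const value of Object.values(tree)) {
		if (typeof value === 'string') {
			leaves.push(value);
		} else {
			leaves.push(...collectLeaves(value));
		}
	}
	return leaves;
}

// Tree copied from the decompose('面') output shown above.
const mianTree = {
	'丆': { '㇐': '㇐', '㇓': '㇓' },
	'囬': {
		'55103': {
			'10001': { '10001': '㇑' },
			'二': { '二': '㇐' },
		},
		'囗': {
			'⺆': { '㇑': '㇑', '㇆': '㇆' },
			'㇐': '㇐',
		},
	},
};

console.log(collectLeaves(mianTree).join(' '));
```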

`charactersWithComponent(component)`

Get a list of characters containing a component, sorted in order of decreasing frequency.

charactersWithComponent('囗');

[
	{ character: '回', index: 139 },
	{ character: '图', index: 166 },
	{ character: '口', index: 307 },
	{ character: '因', index: 381 },
	{ character: '西', index: 382 },
	{ character: '团', index: 388 },
	{ character: '困', index: 413 },
	{ character: '国', index: 544 },
	{ character: '围', index: 644 },
	{ character: '圈', index: 717 },
  ...
]

How it works

JSON files containing character/word/component information are generated by /src/prepare.js from raw files contained in /data/raw, with outputs saved to /data/processed.

The preparation script can also be run with npm run prepare or yarn prepare.

Sources

Dictionary entries are entirely from CEDICT
Frequency statistics are from BCC_LEX_Zh
Character composition entries are from CJK-decomp

This project was inspired by HanziJS and offers much of the same functionality.

Package last updated on 14 Aug 2020
