vibrato

PyPI (pip)
Version: 0.2.0
Maintainers: 1

🐍 python-vibrato 🎤

Vibrato is a fast implementation of tokenization (or morphological analysis) based on the Viterbi algorithm. This is a Python wrapper for Vibrato.

Installation

Install pre-built package from PyPI

Run the following command:

$ pip install vibrato

Build from source

Building from source requires the Rust compiler, so install it beforehand by following its documentation. python-vibrato uses pyproject.toml, so you also need pip version 19 or later.

$ pip install --upgrade pip

After setting up the environment, you can install python-vibrato as follows:

$ pip install git+https://github.com/daac-tools/python-vibrato

Example Usage

python-vibrato does not include model files. Before tokenizing, follow the Vibrato documentation to download a distributed model or train your own.

Check the version number as shown below to use compatible models:

>>> import vibrato
>>> vibrato.VIBRATO_VERSION
'0.5.0'
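
The check above can also be done in code. The following is only a sketch: it assumes that a model is usable when its Vibrato version shares the same major and minor numbers as the library, which is a guess for illustration and not a rule stated by Vibrato. The names release_line, looks_compatible, and MODEL_VERSION are hypothetical.

```python
def release_line(version):
    """Return (major, minor) from a dotted version string such as '0.5.0'."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def looks_compatible(library_version, model_version):
    # ASSUMPTION: matching major.minor means the model can be loaded.
    # Consult the Vibrato documentation for the actual compatibility policy.
    return release_line(library_version) == release_line(model_version)


# In practice, pass vibrato.VIBRATO_VERSION as the first argument.
LIBRARY_VERSION = "0.5.0"
# Hypothetical: the Vibrato version the downloaded model was built for.
MODEL_VERSION = "0.5.0"

print(looks_compatible(LIBRARY_VERSION, MODEL_VERSION))
```

If the check fails, the safer course is to download a model that matches the reported version instead of trying to load the mismatched one.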

Examples:

>>> import vibrato

>>> with open('tests/data/system.dic', 'rb') as fp:
...     tokenizer = vibrato.Vibrato(fp.read())

>>> tokens = tokenizer.tokenize('社長は火星猫だ')

>>> len(tokens)
5

>>> tokens[0]
Token { surface: "社長", feature: "名詞,普通名詞,一般,*" }

>>> tokens[0].surface()
'社長'

>>> tokens[0].feature()
'名詞,普通名詞,一般,*'

>>> tokens[0].start()
0

>>> tokens[0].end()
2

Note for distributed models

The distributed models are compressed in zstd format. The API does not decompress them for you, so decompress the data yourself before passing it to vibrato.Vibrato:

>>> import vibrato
>>> import zstandard  # zstandard package in PyPI

>>> dctx = zstandard.ZstdDecompressor()
>>> with open('tests/data/system.dic.zst', 'rb') as fp:
...     with dctx.stream_reader(fp) as dict_reader:
...         tokenizer = vibrato.Vibrato(dict_reader.read())
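
If the same code has to accept both compressed and already-decompressed dictionaries, one option is to look at the first four bytes of the file, since a Zstandard frame starts with the magic bytes 28 B5 2F FD. This is a minimal sketch; is_zstd and read_dictionary_bytes are hypothetical helper names, not part of python-vibrato, and the zstandard package is imported only when it is actually needed.

```python
import io

# Magic number that opens every Zstandard frame (0xFD2FB528, little-endian).
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def is_zstd(data):
    """Return True if the bytes start with the Zstandard frame magic number."""
    return data[:4] == ZSTD_MAGIC


def read_dictionary_bytes(path):
    """Read a dictionary file, decompressing it first if it is zstd-compressed."""
    with open(path, "rb") as fp:
        data = fp.read()
    if is_zstd(data):
        import zstandard  # zstandard package in PyPI, imported lazily

        dctx = zstandard.ZstdDecompressor()
        # Same streaming approach as the example above.
        with dctx.stream_reader(io.BytesIO(data)) as reader:
            data = reader.read()
    return data
```

The returned bytes can then be passed to the constructor, for example vibrato.Vibrato(read_dictionary_bytes('tests/data/system.dic.zst')).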

License

Licensed under either of

Apache License, Version 2.0
MIT license

at your option.
