Socket
Book a DemoInstallSign in
Socket

nnsplit

Package Overview
Dependencies
Maintainers
1
Versions
21
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

nnsplit

A tool to split text using a neural network. For sentence boundary detection, compound splitting and more.

0.5.8
latest
Source
npmnpm
Version published
Weekly downloads
25
2400%
Maintainers
1
Weekly downloads
 
Created
Source

NNSplit

PyPI Crates.io npm CI License

A tool to split text using a neural network. The main application is sentence boundary detection, but e. g. compound splitting for German is also supported.

Features

  • Robust: Not reliant on proper punctuation, spelling and case. See the metrics.
  • Small: NNSplit uses a byte-level LSTM, so weights are small (< 4MB) and models can be trained for every unicode encodable language.
  • Portable: NNSplit is written in Rust with bindings for Rust, Python, and Javascript (Browser and Node.js). See how to get started in the usage section.
  • Fast: Up to 2x faster than Spacy sentencization, see the benchmark.
  • Multilingual: NNSplit currently has models for 9 different languages (German, English, French, Norwegian, Swedish, Simplified Chinese, Turkish, Russian and Ukrainian). Try them in the demo.

Documentation has moved to the NNSplit website: https://bminixhofer.github.io/nnsplit.

License

NNSplit is licensed under the MIT license.

FAQs

Package last updated on 23 Jul 2021

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

About

Packages

Stay in touch

Get open source security insights delivered straight into your inbox.

  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc

U.S. Patent No. 12,346,443 & 12,314,394. Other pending.