Socket
Socket
Sign inDemoInstall

nnsplit

Package Overview
Dependencies
2
Maintainers
1
Versions
21
Alerts
File Explorer

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

    nnsplit

A tool to split text using a neural network. For sentence boundary detection, compound splitting and more.


Version published
Weekly downloads
22
Maintainers
1
Created
Weekly downloads
 

Readme

Source

NNSplit

PyPI Crates.io npm CI License

A tool to split text using a neural network. The main application is sentence boundary detection, but e. g. compound splitting for German is also supported.

Features

  • Robust: Not reliant on proper punctuation, spelling and case. See the metrics.
  • Small: NNSplit uses a byte-level LSTM, so weights are small (< 4MB) and models can be trained for every unicode encodable language.
  • Portable: NNSplit is written in Rust with bindings for Rust, Python, and Javascript (Browser and Node.js). See how to get started in the usage section.
  • Fast: Up to 2x faster than Spacy sentencization, see the benchmark.
  • Multilingual: NNSplit currently has models for 9 different languages (German, English, French, Norwegian, Swedish, Simplified Chinese, Turkish, Russian and Ukrainian). Try them in the demo.

Documentation has moved to the NNSplit website: https://bminixhofer.github.io/nnsplit.

License

NNSplit is licensed under the MIT license.

FAQs

Last updated on 23 Jul 2021

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc