sbd

Split text into sentences

  • 0.0.2
Sentence Boundary Detection (SBD)

Simple sentence detection (i.e., it works about 95% of the time):

  • Splits text on periods, question marks, and exclamation marks.
    • Skips abbreviations
    • Skips numbers and currency amounts
    • Skips URLs, email addresses, and phone numbers
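
To make the rules above concrete, here is a minimal sketch of a rule-based splitter. It is not sbd's actual implementation: the function name `splitSentences` and the short `ABBREVIATIONS` list are assumptions made up for this illustration. It treats a period, question mark, or exclamation mark as a boundary only when whitespace or the end of the text follows, and it ignores periods that end a known abbreviation.

```javascript
// Hypothetical, deliberately tiny abbreviation list (illustration only).
const ABBREVIATIONS = new Set(['mr', 'mrs', 'dr', 'prof', 'etc']);

// Hypothetical helper, not part of sbd: splits text into sentences.
function splitSentences(text) {
  const sentences = [];
  let start = 0;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch !== '.' && ch !== '?' && ch !== '!') continue;

    // A boundary must be followed by whitespace or the end of the text.
    // This keeps "10.00", "example.com", and "I.C.T" in one piece.
    const next = text[i + 1];
    if (next !== undefined && !/\s/.test(next)) continue;

    if (ch === '.') {
      // Check the word that this period terminates.
      const word = text.slice(start, i).split(/\s+/).pop().toLowerCase();
      if (ABBREVIATIONS.has(word)) continue;
    }

    sentences.push(text.slice(start, i + 1).trim());
    start = i + 1;
  }

  // Keep any trailing text that has no closing punctuation.
  const rest = text.slice(start).trim();
  if (rest) sentences.push(rest);
  return sentences;
}
```

A real splitter needs a much larger abbreviation list and more careful handling of quotes, ellipses, and similar cases; the sketch only shows the general shape of the approach.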

Future work

Currently, sbd fails to recognize sentences that end in an abbreviation, for example "The president lives in Washington, D.C." I do not see a viable fix other than using a real classifier with proper training.
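
The following sketch illustrates why this case is hard for fixed rules. It is not sbd's code: `splitOnTokens` and `KNOWN_ABBREVIATIONS` are hypothetical names used only here. The same period after "D.C." would have to be treated differently from the one after "Mr.", and a simple on/off rule gets one of the two wrong.

```javascript
// Hypothetical abbreviation list (illustration only).
const KNOWN_ABBREVIATIONS = new Set(['mr.', 'd.c.']);

// Splits after every whitespace-delimited token ending in ".", "?" or "!".
// When skipAbbreviations is true, known abbreviations never end a sentence.
function splitOnTokens(text, skipAbbreviations) {
  const sentences = [];
  let current = [];

  for (const token of text.split(/\s+/).filter(Boolean)) {
    current.push(token);
    const endsSentence = /[.?!]$/.test(token);
    const isAbbreviation = KNOWN_ABBREVIATIONS.has(token.toLowerCase());
    if (endsSentence && !(skipAbbreviations && isAbbreviation)) {
      sentences.push(current.join(' '));
      current = [];
    }
  }

  if (current.length > 0) sentences.push(current.join(' '));
  return sentences;
}

const text = 'Mr. Money lives in Washington, D.C. He works there.';

// Skipping abbreviations keeps "Mr. Money" together but misses the real
// boundary after "D.C.", so the whole text comes back as one sentence.
const withSkipping = splitOnTokens(text, true);

// Not skipping finds the boundary after "D.C." but also splits after "Mr."
const withoutSkipping = splitOnTokens(text, false);

console.log(withSkipping);
console.log(withoutSkipping);
```

Deciding between the two readings requires context, such as what follows the abbreviation, which is the kind of signal a trained classifier can weigh.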

Installation

Use npm:

$ npm install sbd

How to

var tokenizer = require('sbd');

// A double-quoted string cannot span lines, so the line break is written as "\n".
var text = "In I.C.T we have multiple challenges!\n" +
           "This is a text of three sentences. Skip Mr. Money €10.00 right.";

var sentences = tokenizer.sentences(text);

// [
//  'In I.C.T we have multiple challenges!',
//  'This is a text of three sentences.',
//  'Skip Mr. Money €10.00 right.'
// ]

Package last updated on 22 May 2014
