sentence-splitter
Split {Japanese, English} text into sentences.
Install with npm:

npm install sentence-splitter

To use the command-line tool, install it globally and pipe text to it:

$ npm install -g sentence-splitter
$ echo "This is a pen.But This is not pen" | sentence-splitter
This is a pen.
But This is not pen
split(text, [options]): Node[]
import {split, Syntax} from "sentence-splitter";
let sentences = split("text.\n\ntext");
console.log(JSON.stringify(sentences, null, 4));
/*
[
{
"type": "Sentence",
"raw": "text.",
"value": "text.",
"loc": {
"start": {
"line": 1,
"column": 0
},
"end": {
"line": 1,
"column": 5
}
},
"range": [
0,
5
]
},
{
"type": "WhiteSpace",
"raw": "\n",
"value": "\n",
"loc": {
"start": {
"line": 1,
"column": 5
},
"end": {
"line": 2,
"column": 0
}
},
"range": [
5,
6
]
},
{
"type": "WhiteSpace",
"raw": "\n",
"value": "\n",
"loc": {
"start": {
"line": 2,
"column": 0
},
"end": {
"line": 3,
"column": 0
}
},
"range": [
6,
7
]
},
{
"type": "Sentence",
"raw": "text",
"value": "text",
"loc": {
"start": {
"line": 3,
"column": 0
},
"end": {
"line": 3,
"column": 4
}
},
"range": [
7,
11
]
}
]
*/
// with splitting char options
// (a separate variable name avoids redeclaring `sentences` from above)
let sentencesByChar = split("text¶text", {
    charRegExp: /¶/
});
sentencesByChar.length; // 2
Node location:

- `line`: starts at 1
- `column`: starts at 0

See more detail in "Why do `line` of location in JavaScript AST(ESTree) start with 1 and not 0?".
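The relationship between `range` offsets and `loc` positions can be sketched as follows. This is only an illustration of the 1-based `line` / 0-based `column` convention, not the library's own code; `offsetToPosition` and `sampleText` are hypothetical names introduced here.

```javascript
// Hypothetical helper (not part of sentence-splitter): convert a 0-based
// character offset into an ESTree-style position, where `line` is 1-based
// and `column` is 0-based.
function offsetToPosition(text, offset) {
    let line = 1;
    let column = 0;
    for (let i = 0; i < offset && i < text.length; i++) {
        if (text[i] === "\n") {
            line += 1;  // a line break moves to the next line...
            column = 0; // ...and resets the column to 0
        } else {
            column += 1;
        }
    }
    return { line, column };
}

const sampleText = "text.\n\ntext";
console.log(offsetToPosition(sampleText, 0));  // { line: 1, column: 0 }
console.log(offsetToPosition(sampleText, 5));  // { line: 1, column: 5 }
console.log(offsetToPosition(sampleText, 7));  // { line: 3, column: 0 }
console.log(offsetToPosition(sampleText, 11)); // { line: 3, column: 4 }
```

These positions agree with the `loc` and `range` values in the `split("text.\n\ntext")` output shown earlier.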
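Options:

- `charRegExp`
    - default: `/[\.。\?\!?!]/`
- `newLineCharacters`
    - default: `"\n"`
    - to treat Markdown text, set `newLineCharacters: "\n\n"` in this option

Node types:

- `Sentence`: a Sentence node contains the sentence text, including its punctuation.
- `WhiteSpace`: a WhiteSpace node holds a line break (`\n`).

Get the values of these `Syntax` constants from the module: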
import {Syntax} from "sentence-splitter";
console.log(Syntax.Sentence);// "Sentence"
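A common use of the node types is to keep only the sentences and drop the line breaks. The sketch below assumes nodes in the shape shown in the output above. To stay self-contained, it uses a local stand-in for `Syntax` and a hand-written `sampleNodes` array instead of calling the library. `sentenceTexts` is a hypothetical helper name.

```javascript
// Local stand-in for the `Syntax` constants exported by sentence-splitter,
// limited to the two node types described in this README.
const Syntax = { Sentence: "Sentence", WhiteSpace: "WhiteSpace" };

// Sample nodes in the shape shown for split("text.\n\ntext"),
// abbreviated to the fields used here.
const sampleNodes = [
    { type: "Sentence", raw: "text.", range: [0, 5] },
    { type: "WhiteSpace", raw: "\n", range: [5, 6] },
    { type: "WhiteSpace", raw: "\n", range: [6, 7] },
    { type: "Sentence", raw: "text", range: [7, 11] }
];

// Hypothetical helper: return the raw text of every Sentence node.
function sentenceTexts(nodeList) {
    return nodeList
        .filter((node) => node.type === Syntax.Sentence)
        .map((node) => node.raw);
}

console.log(JSON.stringify(sentenceTexts(sampleNodes))); // ["text.","text"]
```

With the real package, the array returned by `split` would take the place of `sampleNodes`, and the imported `Syntax` the place of the stand-in.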
### Treat Markdown break line

tl;dr: set `newLineCharacters: "\n\n"` in the options.

```js
let sentences = split(text, {
    newLineCharacters: "\n\n" // `\n\n` as a separator
});
```
sentence-splitter splits text into `Sentence` and `WhiteSpace` nodes.

By default, it splits the following text into 3 `Sentence` nodes and 3 `WhiteSpace` nodes.
Some Markdown parsers, by contrast, treat 1 Sentence + 1 WhiteSpace + 1 Sentence as a single sentence.

TextA
TextB

TextC
Output:
[
{
"type": "Sentence",
"raw": "TextA",
},
{
"type": "WhiteSpace",
"raw": "\n",
},
{
"type": "Sentence",
"raw": "TextB",
},
{
"type": "WhiteSpace",
"raw": "\n",
},
{
"type": "WhiteSpace",
"raw": "\n",
},
{
"type": "Sentence",
"raw": "TextC",
}
]
If you want to treat `\n\n` as the separator between sentences, use the `newLineCharacters` option.
let text = `TextA
TextB

TextC`;
let sentences = split(text, {
    newLineCharacters: "\n\n" // `\n\n` as a separator
});
console.log(JSON.stringify(sentences, null, 4));
Output:
[
{
"type": "Sentence",
"raw": "TextA\nTextB",
},
{
"type": "WhiteSpace",
"raw": "\n",
},
{
"type": "WhiteSpace",
"raw": "\n",
},
{
"type": "Sentence",
"raw": "TextC",
}
]
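The difference between the two outputs can also be imitated by post-processing the default result. The following is a minimal sketch, assuming nodes in the abbreviated shape used above (only `type` and `raw`). `mergeSoftBreaks` and `defaultNodes` are hypothetical names, and this is not how the library implements `newLineCharacters`.

```javascript
// Hypothetical post-processing helper, not part of sentence-splitter:
// fold Sentence + single WhiteSpace + Sentence into one Sentence node,
// and keep runs of two or more WhiteSpace nodes as separators.
function mergeSoftBreaks(nodes) {
    const merged = [];
    let i = 0;
    while (i < nodes.length) {
        const node = nodes[i];
        const previous = merged[merged.length - 1];
        const next = nodes[i + 1];
        const isSingleBreak =
            node.type === "WhiteSpace" &&
            previous !== undefined &&
            previous.type === "Sentence" &&
            next !== undefined &&
            next.type === "Sentence";
        if (isSingleBreak) {
            // Fold the break and the following sentence into the previous one.
            merged[merged.length - 1] = {
                type: "Sentence",
                raw: previous.raw + node.raw + next.raw
            };
            i += 2;
        } else {
            merged.push({ type: node.type, raw: node.raw });
            i += 1;
        }
    }
    return merged;
}

// Default output for the TextA / TextB / TextC example, abbreviated.
const defaultNodes = [
    { type: "Sentence", raw: "TextA" },
    { type: "WhiteSpace", raw: "\n" },
    { type: "Sentence", raw: "TextB" },
    { type: "WhiteSpace", raw: "\n" },
    { type: "WhiteSpace", raw: "\n" },
    { type: "Sentence", raw: "TextC" }
];

console.log(JSON.stringify(mergeSoftBreaks(defaultNodes).map((node) => node.raw)));
// ["TextA\nTextB","\n","\n","TextC"]
```

In practice, passing `newLineCharacters: "\n\n"` to `split` is simpler. The sketch only shows which nodes end up joined.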
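Run the tests:

npm test

To contribute, create a feature branch, commit your changes, and push the branch:

git checkout -b my-new-feature
git commit -am 'Add some feature'
git push origin my-new-feature

License: MIT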