parse-english
English language parser for retext producing nlcst nodes.
Install
This package is ESM only: Node 12+ is needed to use it and it must be imported instead of required.
npm:
npm install parse-english
Use
import {inspect} from 'unist-util-inspect'
import {ParseEnglish} from 'parse-english'
const tree = new ParseEnglish().parse(
'Mr. Henry Brown: A hapless but friendly City of London worker.'
)
console.log(inspect(tree))
Yields:
RootNode[1] (1:1-1:63, 0-62)
└─ ParagraphNode[1] (1:1-1:63, 0-62)
   └─ SentenceNode[23] (1:1-1:63, 0-62)
      ├─ WordNode[2] (1:1-1:4, 0-3)
      │  ├─ TextNode: "Mr" (1:1-1:3, 0-2)
      │  └─ PunctuationNode: "." (1:3-1:4, 2-3)
      ├─ WhiteSpaceNode: " " (1:4-1:5, 3-4)
      ├─ WordNode[1] (1:5-1:10, 4-9)
      │  └─ TextNode: "Henry" (1:5-1:10, 4-9)
      ├─ WhiteSpaceNode: " " (1:10-1:11, 9-10)
      ├─ WordNode[1] (1:11-1:16, 10-15)
      │  └─ TextNode: "Brown" (1:11-1:16, 10-15)
      ├─ PunctuationNode: ":" (1:16-1:17, 15-16)
      ├─ WhiteSpaceNode: " " (1:17-1:18, 16-17)
      ├─ WordNode[1] (1:18-1:19, 17-18)
      │  └─ TextNode: "A" (1:18-1:19, 17-18)
      ├─ WhiteSpaceNode: " " (1:19-1:20, 18-19)
      ├─ WordNode[1] (1:20-1:27, 19-26)
      │  └─ TextNode: "hapless" (1:20-1:27, 19-26)
      ├─ WhiteSpaceNode: " " (1:27-1:28, 26-27)
      ├─ WordNode[1] (1:28-1:31, 27-30)
      │  └─ TextNode: "but" (1:28-1:31, 27-30)
      ├─ WhiteSpaceNode: " " (1:31-1:32, 30-31)
      ├─ WordNode[1] (1:32-1:40, 31-39)
      │  └─ TextNode: "friendly" (1:32-1:40, 31-39)
      ├─ WhiteSpaceNode: " " (1:40-1:41, 39-40)
      ├─ WordNode[1] (1:41-1:45, 40-44)
      │  └─ TextNode: "City" (1:41-1:45, 40-44)
      ├─ WhiteSpaceNode: " " (1:45-1:46, 44-45)
      ├─ WordNode[1] (1:46-1:48, 45-47)
      │  └─ TextNode: "of" (1:46-1:48, 45-47)
      ├─ WhiteSpaceNode: " " (1:48-1:49, 47-48)
      ├─ WordNode[1] (1:49-1:55, 48-54)
      │  └─ TextNode: "London" (1:49-1:55, 48-54)
      ├─ WhiteSpaceNode: " " (1:55-1:56, 54-55)
      ├─ WordNode[1] (1:56-1:62, 55-61)
      │  └─ TextNode: "worker" (1:56-1:62, 55-61)
      └─ PunctuationNode: "." (1:62-1:63, 61-62)
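
The tree above can be consumed with a plain recursive walk. The following is a minimal sketch, not part of this package: it uses a hand-written fragment in the same shape as the output (node positions omitted), and the helper names toText and collectWords are hypothetical.

```javascript
// Hand-written fragment shaped like the tree above (positions omitted).
const sentence = {
  type: 'SentenceNode',
  children: [
    {
      type: 'WordNode',
      children: [
        {type: 'TextNode', value: 'Mr'},
        {type: 'PunctuationNode', value: '.'}
      ]
    },
    {type: 'WhiteSpaceNode', value: ' '},
    {type: 'WordNode', children: [{type: 'TextNode', value: 'Henry'}]},
    {type: 'WhiteSpaceNode', value: ' '},
    {type: 'WordNode', children: [{type: 'TextNode', value: 'Brown'}]}
  ]
}

// Hypothetical helper: join the values of all leaf nodes under `node`.
function toText(node) {
  if ('value' in node) return node.value
  return node.children.map(toText).join('')
}

// Hypothetical helper: collect the text of every WordNode under `node`.
function collectWords(node) {
  if (node.type === 'WordNode') return [toText(node)]
  if (!node.children) return []
  return node.children.flatMap(collectWords)
}

console.log(collectWords(sentence)) // [ 'Mr.', 'Henry', 'Brown' ]
```

Note that the period after "Mr" sits inside the WordNode, so the abbreviation stays one word and does not end the sentence.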
API
This package exports the following identifier: ParseEnglish.
There is no default export.
parse-english has the same API as parse-latin.
Algorithm
All of parse-latin is included, and the following support for the
English natural language:
- Unit abbreviations (tsp., tbsp., oz., ft., and more)
- Time references (sec., min., tues., thu., feb., and more)
- Business abbreviations (Inc. and Ltd.)
- Social titles (Mr., Mmes., Sr., and more)
- Rank and academic titles (Dr., Rep., Gen., Prof., Pres., and more)
- Geographical abbreviations (Ave., Blvd., Ft., Hwy., and more)
- American state abbreviations (Ala., Minn., La., Tex., and more)
- Canadian province abbreviations (Alta., Qué., Yuk., and more)
- English county abbreviations (Beds., Leics., Shrops., and more)
- Common elision (omission of letters) (’n’, ’o, ’em, ’twas, ’80s, and more)
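
To illustrate why abbreviation lists like these matter, here is a simplified sketch of sentence splitting with and without one. It is not the algorithm this package uses: the abbreviation set is a tiny hypothetical sample, the function names are made up for illustration, and whitespace between rejoined chunks is normalized to a single space.

```javascript
// Hypothetical, tiny sample list; the package itself covers far more.
const abbreviations = new Set(['mr', 'dr', 'inc', 'ltd', 'ave'])

// Naive approach: every period followed by whitespace ends a sentence.
function splitNaive(text) {
  return text.split(/(?<=\.)\s+/)
}

// True when `text` ends in a known abbreviation followed by a period.
function endsWithAbbreviation(text) {
  const match = /(\S+)\.$/.exec(text)
  return match !== null && abbreviations.has(match[1].toLowerCase())
}

// Abbreviation-aware approach: keep accumulating chunks while the text so
// far ends in a known abbreviation, instead of breaking there.
function splitWithAbbreviations(text) {
  const sentences = []
  let current = ''
  for (const chunk of splitNaive(text)) {
    current = current ? current + ' ' + chunk : chunk
    if (!endsWithAbbreviation(current)) {
      sentences.push(current)
      current = ''
    }
  }
  if (current) sentences.push(current)
  return sentences
}

const sample = 'Mr. Brown works at Acme Inc. in London. He is friendly.'
console.log(splitNaive(sample).length) // 4
console.log(splitWithAbbreviations(sample))
// [ 'Mr. Brown works at Acme Inc. in London.', 'He is friendly.' ]
```

The naive splitter breaks after "Mr." and "Inc.", while the abbreviation-aware one keeps both inside the first sentence. A sentence that genuinely ends in an abbreviation remains ambiguous under this sketch.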
License
MIT © Titus Wormer