New Case Study:See how Anthropic automated 95% of dependency reviews with Socket.Learn More →

parse-english

Package Overview

Dependencies

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

parse-english

English natural language parser

7.0.0
latest
Source
npm

Version published: 2 years ago

Weekly downloads: 88K; decreased by-3.91%

Maintainers: 1

Weekly downloads

Created: 11 years ago

Source

parse-english

Natural language parser, for the English language, that produces nlcst.

What is this?

This package exposes a parser that takes English natural language and produces a syntax tree.

When should I use this?

If you want to handle English natural language as syntax trees manually, use this.

Alternatively, you can use the retext plugin retext-english, which wraps this project to also parse natural language at a higher-level (easier) abstraction.

For Dutch or most Latin-script languages, you can instead use parse-dutch or parse-latin.

Install

This package is ESM only. In Node.js (version 16+), install with npm:

npm install parse-english

In Deno with esm.sh:

import {ParseEnglish} from 'https://esm.sh/parse-english@7'

In browsers with esm.sh:

<script type="module">
  import {ParseEnglish} from 'https://esm.sh/parse-english@7?bundle'
</script>

Use

import {ParseEnglish} from 'parse-english'
import {inspect} from 'unist-util-inspect'

const tree = new ParseEnglish().parse(
  'Mr. Henry Brown: A hapless but friendly City of London worker.'
)

console.log(inspect(tree))

Yields:

RootNode[1] (1:1-1:63, 0-62)
└─0 ParagraphNode[1] (1:1-1:63, 0-62)
    └─0 SentenceNode[23] (1:1-1:63, 0-62)
        ├─0  WordNode[2] (1:1-1:4, 0-3)
        │    ├─0 TextNode "Mr" (1:1-1:3, 0-2)
        │    └─1 PunctuationNode "." (1:3-1:4, 2-3)
        ├─1  WhiteSpaceNode " " (1:4-1:5, 3-4)
        ├─2  WordNode[1] (1:5-1:10, 4-9)
        │    └─0 TextNode "Henry" (1:5-1:10, 4-9)
        ├─3  WhiteSpaceNode " " (1:10-1:11, 9-10)
        ├─4  WordNode[1] (1:11-1:16, 10-15)
        │    └─0 TextNode "Brown" (1:11-1:16, 10-15)
        ├─5  PunctuationNode ":" (1:16-1:17, 15-16)
        ├─6  WhiteSpaceNode " " (1:17-1:18, 16-17)
        ├─7  WordNode[1] (1:18-1:19, 17-18)
        │    └─0 TextNode "A" (1:18-1:19, 17-18)
        ├─8  WhiteSpaceNode " " (1:19-1:20, 18-19)
        ├─9  WordNode[1] (1:20-1:27, 19-26)
        │    └─0 TextNode "hapless" (1:20-1:27, 19-26)
        ├─10 WhiteSpaceNode " " (1:27-1:28, 26-27)
        ├─11 WordNode[1] (1:28-1:31, 27-30)
        │    └─0 TextNode "but" (1:28-1:31, 27-30)
        ├─12 WhiteSpaceNode " " (1:31-1:32, 30-31)
        ├─13 WordNode[1] (1:32-1:40, 31-39)
        │    └─0 TextNode "friendly" (1:32-1:40, 31-39)
        ├─14 WhiteSpaceNode " " (1:40-1:41, 39-40)
        ├─15 WordNode[1] (1:41-1:45, 40-44)
        │    └─0 TextNode "City" (1:41-1:45, 40-44)
        ├─16 WhiteSpaceNode " " (1:45-1:46, 44-45)
        ├─17 WordNode[1] (1:46-1:48, 45-47)
        │    └─0 TextNode "of" (1:46-1:48, 45-47)
        ├─18 WhiteSpaceNode " " (1:48-1:49, 47-48)
        ├─19 WordNode[1] (1:49-1:55, 48-54)
        │    └─0 TextNode "London" (1:49-1:55, 48-54)
        ├─20 WhiteSpaceNode " " (1:55-1:56, 54-55)
        ├─21 WordNode[1] (1:56-1:62, 55-61)
        │    └─0 TextNode "worker" (1:56-1:62, 55-61)
        └─22 PunctuationNode "." (1:62-1:63, 61-62)

API

This package exports the identifier ParseEnglish. There is no default export.

`ParseEnglish()`

Create a new parser.

ParseEnglish extends ParseLatin. See parse-latin for API docs.

Algorithm

All of parse-latin is included, and the following support for the English natural language:

unit abbreviations (tsp., tbsp., oz., ft., and more)
time references (sec., min., tues., thu., feb., and more)
business Abbreviations (Inc. and Ltd.)
social titles (Mr., Mmes., Sr., and more)
rank and academic titles (Dr., Rep., Gen., Prof., Pres., and more)
geographical abbreviations (Ave., Blvd., Ft., Hwy., and more)
American state abbreviations (Ala., Minn., La., Tex., and more)
Canadian province abbreviations (Alta., Qué., Yuk., and more)
English county abbreviations (Beds., Leics., Shrops., and more)
common elision (omission of letters) (’n’, ’o, ’em, ’twas, ’80s, and more)

Types

This package is fully typed with TypeScript. It exports no additional types.

Compatibility

Projects maintained by me are compatible with maintained versions of Node.js.

When I cut a new major release, I drop support for unmaintained versions of Node. This means I try to keep the current release line, parse-english@^7, compatible with Node.js 16.

Security

This package is safe.

parse-latin — Latin-script natural language parser
parse-dutch — Dutch natural language parser

Contribute

Yes please! See How to Contribute to Open Source.

License

Keywords

FAQs

What is parse-english?

Is parse-english popular?

Is parse-english well maintained?

Package last updated on 17 Jul 2023

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

parse-english

parse-english

Contents

What is this?

When should I use this?

Install

Use

API

`ParseEnglish()`

Algorithm

Types

Compatibility

Security

Contribute

License

Keywords

Related posts

parse-english

parse-english

Contents

What is this?

When should I use this?

Install

Use

API

ParseEnglish()

Algorithm

Types

Compatibility

Security

Related

Contribute

License

Keywords

Related posts

Bybit Hack Puts Crypto Losses at $1.6B, Surpassing All of Last Year in Just Two Months

OpenSSF Launches Open Source Project Security Baseline to Strengthen Software Supply Chain

`ParseEnglish()`