parse-latin

Latin-script (natural language) parser

Version: 7.0.0 (latest)
Source: npm

Version published: 2 years ago

Weekly downloads: 478K; decreased by 11.38%

Maintainers: 1

Created: 11 years ago

What is parse-latin?

The parse-latin npm package is a JavaScript library used to parse Latin-script natural language into a syntax tree. It is particularly useful for text processing tasks such as tokenization, sentence splitting, and word segmentation.

What are parse-latin's main functionalities?

Tokenization

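Parsing splits a text into individual tokens (words, white space, punctuation, and symbols). The tokens are the children of each SentenceNode in the returned tree. The code sample parses a simple sentence and prints its tokens.

// parse-latin is ESM only: use `import`, not `require`.
import {ParseLatin} from 'parse-latin'

// `parse` returns a RootNode; tokens are the children of each SentenceNode.
const tree = new ParseLatin().parse('This is a sentence.')
const sentence = tree.children[0].children[0]
console.log(sentence.children)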

Sentence Splitting

Parsing also splits each paragraph into sentences. The code sample parses a paragraph and picks out its SentenceNode children.

import {ParseLatin} from 'parse-latin'

const tree = new ParseLatin().parse('This is a sentence. This is another sentence.')
// A ParagraphNode holds SentenceNodes separated by WhiteSpaceNodes.
const paragraph = tree.children[0]
const sentences = paragraph.children.filter((node) => node.type === 'SentenceNode')
console.log(sentences.length)

Word Segmentation

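Within a sentence, words are represented as WordNode children. The code sample parses a sentence and extracts the text of each word.

import {ParseLatin} from 'parse-latin'

const tree = new ParseLatin().parse('This is a sentence.')
const sentence = tree.children[0].children[0]
// Each WordNode wraps one or more children that carry the text in `value`.
const words = sentence.children.filter((node) => node.type === 'WordNode')
console.log(words.map((word) => word.children.map((child) => child.value).join('')))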

parse-latin

Natural language parser, for Latin-script languages, that produces nlcst.

What is this?

This package exposes a parser that takes Latin-script natural language and produces a syntax tree.

When should I use this?

If you want to handle natural language as syntax trees manually, use this.

Alternatively, you can use the retext plugin retext-latin, which wraps this project to also parse natural language at a higher-level (easier) abstraction.

Whether the text is Old English (“þā gewearþ þǣm hlāforde and þǣm hȳrigmannum wiþ ānum penninge”), Icelandic (“Hvað er að frétta”), or French (“Où sont les toilettes?”), this project does a good job of tokenizing it.

For English and Dutch, you can instead use parse-english and parse-dutch.

You can somewhat use this for Latin-like scripts, such as Cyrillic (“привет”), Georgian (“გამარჯობა”), Armenian (“Բարեւ”), and such.

Install

This package is ESM only. In Node.js (version 16+), install with npm:

npm install parse-latin

In Deno with esm.sh:

import {ParseLatin} from 'https://esm.sh/parse-latin@7'

In browsers with esm.sh:

<script type="module">
  import {ParseLatin} from 'https://esm.sh/parse-latin@7?bundle'
</script>

Use

import {ParseLatin} from 'parse-latin'
import {inspect} from 'unist-util-inspect'

const tree = new ParseLatin().parse('A simple sentence.')

console.log(inspect(tree))

Yields:

RootNode[1] (1:1-1:19, 0-18)
└─0 ParagraphNode[1] (1:1-1:19, 0-18)
    └─0 SentenceNode[6] (1:1-1:19, 0-18)
        ├─0 WordNode[1] (1:1-1:2, 0-1)
        │   └─0 TextNode "A" (1:1-1:2, 0-1)
        ├─1 WhiteSpaceNode " " (1:2-1:3, 1-2)
        ├─2 WordNode[1] (1:3-1:9, 2-8)
        │   └─0 TextNode "simple" (1:3-1:9, 2-8)
        ├─3 WhiteSpaceNode " " (1:9-1:10, 8-9)
        ├─4 WordNode[1] (1:10-1:18, 9-17)
        │   └─0 TextNode "sentence" (1:10-1:18, 9-17)
        └─5 PunctuationNode "." (1:18-1:19, 17-18)
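
To handle such a tree manually, you typically walk it recursively. The following is a minimal sketch, assuming a hand-written copy of the tree above (positions omitted) so that it runs without installing anything. The helpers `textOf` and `collect` are hypothetical names for illustration and are not part of parse-latin.

```javascript
// A hand-written copy of the tree shown above, without positional info.
const tree = {
  type: 'RootNode',
  children: [
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            {type: 'WordNode', children: [{type: 'TextNode', value: 'A'}]},
            {type: 'WhiteSpaceNode', value: ' '},
            {type: 'WordNode', children: [{type: 'TextNode', value: 'simple'}]},
            {type: 'WhiteSpaceNode', value: ' '},
            {type: 'WordNode', children: [{type: 'TextNode', value: 'sentence'}]},
            {type: 'PunctuationNode', value: '.'}
          ]
        }
      ]
    }
  ]
}

// Hypothetical helper: join the `value` of every leaf below `node`.
function textOf(node) {
  return 'children' in node ? node.children.map(textOf).join('') : node.value
}

// Hypothetical helper: collect the text of every node with the given type.
function collect(node, type, found = []) {
  if (node.type === type) {
    found.push(textOf(node))
  } else if ('children' in node) {
    for (const child of node.children) collect(child, type, found)
  }
  return found
}

const words = collect(tree, 'WordNode')
const sentences = collect(tree, 'SentenceNode')
console.log(words)
console.log(sentences)
```

In real code the `tree` literal would be replaced by the result of `new ParseLatin().parse(…)`; the walking logic stays the same.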

API

This package exports the identifier ParseLatin. There is no default export.

`ParseLatin()`

Create a new parser.

`ParseLatin#parse(value)`

Turn natural language into a syntax tree.

Parameters

value (string, optional) — value to parse

Returns

Tree (RootNode).

Algorithm

👉 Note: The easiest way to see how parse-latin parses is to use the online parser demo, which shows the syntax tree corresponding to the typed text.

parse-latin splits text into white space, punctuation, symbol, and word tokens:

“word” is one or more unicode letters or numbers
“white space” is one or more unicode white space characters
“punctuation” is one or more unicode punctuation characters
“symbol” is one or more of anything else
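
The four token classes above can be approximated with Unicode property escapes. This is only a sketch of the description in this section: the function name `roughTokens` and the exact patterns are assumptions, and the expressions parse-latin uses internally may differ in detail.

```javascript
// One alternative per token class; every character matches exactly one.
// Assumption: these patterns follow the prose description only.
const tokenPattern =
  /(?<word>[\p{L}\p{N}]+)|(?<whiteSpace>\p{White_Space}+)|(?<punctuation>\p{P}+)|(?<symbol>[^\p{L}\p{N}\p{White_Space}\p{P}]+)/gu

// Hypothetical helper: split `value` into flat {type, value} tokens.
function roughTokens(value) {
  const tokens = []
  for (const match of value.matchAll(tokenPattern)) {
    // The named group that participated in the match is the token class.
    const [type] = Object.entries(match.groups).find(
      ([, text]) => text !== undefined
    )
    tokens.push({type, value: match[0]})
  }
  return tokens
}

const tokens = roughTokens('Costs $5, ok?')
console.log(tokens.map((token) => token.type + ' ' + JSON.stringify(token.value)))
```

Note that `$` falls into the symbol class because it is neither a letter, a number, white space, nor punctuation in Unicode terms.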

Then, it manipulates and merges those tokens into a syntax tree, adding sentences and paragraphs where needed.

some punctuation marks are part of the word they occur in, such as non-profit, she’s, G.I., 11:00, N/A, &c, nineteenth- and…
some periods do not mark a sentence end, such as 1., e.g., id.
although periods, question marks, and exclamation marks (sometimes) end a sentence, that end might not occur directly after the mark, such as .), ."
…and many more exceptions
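
To make two of these exceptions concrete, here is a deliberately simplified sketch of sentence splitting. It is not the algorithm parse-latin uses: the function name `roughSentences`, the tiny abbreviation list, and the set of closing marks are all assumptions chosen for illustration.

```javascript
// Assumption: a tiny hand-picked list, not the one parse-latin uses.
const abbreviations = new Set(['e.g', 'i.e', 'id', 'etc'])

// Hypothetical helper: split `value` into sentence strings.
function roughSentences(value) {
  const sentences = []
  // A terminal mark, optional closing brackets or quotes, then white space.
  const boundary = /[.?!]+[)\]"'”’]*\s+/gu
  let start = 0

  for (const match of value.matchAll(boundary)) {
    const before = value.slice(start, match.index)
    const lastWord = before.split(/\s+/u).pop().toLowerCase()
    const isPeriod = match[0].startsWith('.')

    // Periods after digits ("1.") or known abbreviations ("e.g.") do not
    // end a sentence in this sketch.
    if (isPeriod && (/^\d+$/u.test(lastWord) || abbreviations.has(lastWord))) {
      continue
    }

    // The sentence ends after the closing marks, not directly after the dot.
    const end = match.index + match[0].trimEnd().length
    sentences.push(value.slice(start, end))
    start = match.index + match[0].length
  }

  // Whatever remains is the final sentence.
  if (start < value.length) sentences.push(value.slice(start))
  return sentences
}

const result = roughSentences(
  'Read e.g. the manual. 1. Install it. 2. Run it (twice.) Done!'
)
console.log(result)
```

A fixed list like this breaks down quickly on real text, which is why the parser handles many more exceptions than the ones sketched here.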

Types

This package is fully typed with TypeScript. It exports no additional types.

Compatibility

Projects maintained by me are compatible with maintained versions of Node.js.

When I cut a new major release, I drop support for unmaintained versions of Node. This means I try to keep the current release line, parse-latin@^7, compatible with Node.js 16.

Security

This package is safe.

Related

parse-english — English (natural language) parser
parse-dutch — Dutch (natural language) parser

Contribute

Yes please! See How to Contribute to Open Source.


Package last updated on 17 Jul 2023
