Product
Introducing Java Support in Socket
We're excited to announce that Socket now supports the Java programming language.
parse-latin
The parse-latin npm package is a JavaScript library used to parse Latin-script natural language into a syntax tree. It is particularly useful for text processing tasks such as tokenization, sentence splitting, and word segmentation.
Tokenization
This feature allows you to tokenize a given text into individual tokens (words, punctuation, etc.). The code sample demonstrates how to tokenize a simple sentence.
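// parse-latin is ESM only and has no default export, so use a named import.
import {ParseLatin} from 'parse-latin';
const parser = new ParseLatin();
// parse() returns a RootNode; the tokens are the children of the first sentence.
const tree = parser.parse('This is a sentence.');
const tokens = tree.children[0].children[0].children;
console.log(tokens);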
Sentence Splitting
This feature enables you to split a paragraph into individual sentences. The code sample shows how to split a paragraph into separate sentences.
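import {ParseLatin} from 'parse-latin';
const parser = new ParseLatin();
const tree = parser.parse('This is a sentence. This is another sentence.');
// The first paragraph holds sentence nodes separated by white space nodes.
const sentences = tree.children[0].children.filter((node) => node.type === 'SentenceNode');
console.log(sentences);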
Word Segmentation
This feature allows you to segment a sentence into individual words. The code sample demonstrates how to segment a sentence into words.
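import {ParseLatin} from 'parse-latin';
const parser = new ParseLatin();
const sentence = parser.parse('This is a sentence.').children[0].children[0];
// Keep only the word nodes, dropping white space and punctuation.
const words = sentence.children.filter((node) => node.type === 'WordNode');
console.log(words);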
Compromise is a natural language processing library for JavaScript that provides a wide range of text processing functionalities, including tokenization, part-of-speech tagging, and named entity recognition. Compared to parse-latin, Compromise offers more advanced NLP features and is more versatile.
Natural is a general natural language processing library for JavaScript. It includes functionalities such as tokenization, stemming, classification, and phonetics. Natural is more feature-rich compared to parse-latin and is suitable for a wide range of NLP tasks.
Natural language parser, for Latin-script languages, that produces nlcst.
This package exposes a parser that takes Latin-script natural language and produces a syntax tree.
If you want to handle natural language as syntax trees manually, use this.
Alternatively, you can use the retext plugin retext-latin, which wraps this project to also parse natural language at a higher-level (easier) abstraction.
Whether the text is Old-English (“þā gewearþ þǣm hlāforde and þǣm hȳrigmannum wiþ ānum penninge”), Icelandic (“Hvað er að frétta”), or French (“Où sont les toilettes?”), this project does a good job at tokenizing it.
For English and Dutch, you can instead use parse-english and parse-dutch.
You can somewhat use this for Latin-like scripts, such as Cyrillic (“привет”), Georgian (“გამარჯობა”), Armenian (“Բարեւ”), and such.
This package is ESM only. In Node.js (version 16+), install with npm:
npm install parse-latin
In Deno with esm.sh:
import {ParseLatin} from 'https://esm.sh/parse-latin@7'
In browsers with esm.sh:
<script type="module">
import {ParseLatin} from 'https://esm.sh/parse-latin@7?bundle'
</script>
import {ParseLatin} from 'parse-latin'
import {inspect} from 'unist-util-inspect'
const tree = new ParseLatin().parse('A simple sentence.')
console.log(inspect(tree))
Yields:
RootNode[1] (1:1-1:19, 0-18)
└─0 ParagraphNode[1] (1:1-1:19, 0-18)
    └─0 SentenceNode[6] (1:1-1:19, 0-18)
        ├─0 WordNode[1] (1:1-1:2, 0-1)
        │   └─0 TextNode "A" (1:1-1:2, 0-1)
        ├─1 WhiteSpaceNode " " (1:2-1:3, 1-2)
        ├─2 WordNode[1] (1:3-1:9, 2-8)
        │   └─0 TextNode "simple" (1:3-1:9, 2-8)
        ├─3 WhiteSpaceNode " " (1:9-1:10, 8-9)
        ├─4 WordNode[1] (1:10-1:18, 9-17)
        │   └─0 TextNode "sentence" (1:10-1:18, 9-17)
        └─5 PunctuationNode "." (1:18-1:19, 17-18)
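Once you have a tree like the one above, pulling information out of it is a matter of walking its nodes. The following is a minimal sketch, not part of parse-latin: the tree is a hand-written literal that mirrors the output for 'A simple sentence.' (positions omitted), and the helpers textOf and collect are hypothetical names introduced for illustration.

```javascript
// Hand-written tree mirroring the output above, positions omitted.
// In real use this object would come from new ParseLatin().parse(...).
const tree = {
  type: 'RootNode',
  children: [
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            {type: 'WordNode', children: [{type: 'TextNode', value: 'A'}]},
            {type: 'WhiteSpaceNode', value: ' '},
            {type: 'WordNode', children: [{type: 'TextNode', value: 'simple'}]},
            {type: 'WhiteSpaceNode', value: ' '},
            {type: 'WordNode', children: [{type: 'TextNode', value: 'sentence'}]},
            {type: 'PunctuationNode', value: '.'}
          ]
        }
      ]
    }
  ]
}

// Hypothetical helper: concatenate the text of a node and its descendants.
function textOf(node) {
  if (typeof node.value === 'string') return node.value
  return (node.children || []).map(textOf).join('')
}

// Hypothetical helper: collect every node of a given type, depth first.
function collect(node, type, found = []) {
  if (node.type === type) found.push(node)
  for (const child of node.children || []) collect(child, type, found)
  return found
}

const words = collect(tree, 'WordNode').map(textOf)
console.log(words) // [ 'A', 'simple', 'sentence' ]
console.log(textOf(tree)) // A simple sentence.
```

For anything beyond a quick script, the unist ecosystem ships maintained utilities for this kind of traversal, so hand-rolled helpers like these are mainly useful for understanding the tree shape.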
This package exports the identifier ParseLatin.
There is no default export.
ParseLatin()
Create a new parser.
ParseLatin#parse(value)
Turn natural language into a syntax tree.
value (string, optional): value to parse
Returns the tree (RootNode).
👉 Note: The easiest way to see how parse-latin parses is by using the online parser demo, which shows the syntax tree corresponding to the typed text.
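parse-latin splits text into white space, punctuation, symbol, and word tokens.
Then, it manipulates and merges those tokens into a syntax tree, adding sentences and paragraphs where needed.
Some punctuation marks are part of the word they occur in, such as non-profit, she’s, G.I., 11:00, N/A, &c, nineteenth- and…
Some periods do not mark a sentence end, such as 1., e.g., id.
A sentence end might not occur directly after the terminal mark, such as .) and ."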
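The first pass described above, splitting text into white space, punctuation, symbol, and word tokens, might be sketched as follows. This is a simplified illustration and not parse-latin's actual implementation: the character classes are assumptions (letters and numbers for words, \s for white space, Unicode punctuation, and everything else as symbol), and the names tokenPattern and classify are hypothetical. The second pass, which merges tokens into words, sentences, and paragraphs, is not attempted here.

```javascript
// Simplified sketch, NOT parse-latin's real tokenizer.
// One named group per token class; exactly one group matches at each position.
const tokenPattern =
  /(?<word>[\p{L}\p{N}]+)|(?<whiteSpace>\s+)|(?<punctuation>\p{P}+)|(?<symbol>[^\p{L}\p{N}\s\p{P}]+)/gu

// Hypothetical helper: split text into flat {type, value} tokens.
function classify(text) {
  const tokens = []
  for (const match of text.matchAll(tokenPattern)) {
    // Find the one named group that captured something.
    const [type] = Object.entries(match.groups).find(
      ([, value]) => value !== undefined
    )
    tokens.push({type, value: match[0]})
  }
  return tokens
}

const tokens = classify('It costs $5, OK?')
console.log(tokens.map((token) => token.type + ':' + JSON.stringify(token.value)))
```

Because the four classes cover every character, joining the token values gives back the input unchanged. A flat classifier like this would split non-profit or e.g. into several tokens, which is why the merging pass matters.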
This package is fully typed with TypeScript. It exports no additional types.
Projects maintained by me are compatible with maintained versions of Node.js.
When I cut a new major release, I drop support for unmaintained versions of
Node.
This means I try to keep the current release line, parse-latin@^7, compatible with Node.js 16.
This package is safe.
parse-english: English (natural language) parser
parse-dutch: Dutch (natural language) parser
Yes please! See How to Contribute to Open Source.
FAQs
Latin-script (natural language) parser
The npm package parse-latin receives a total of 351,540 weekly downloads. As such, parse-latin is classified as popular.
We found that parse-latin has an unhealthy version release cadence and level of project activity because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.