What is retext?
Retext is a natural language processing (NLP) framework for analyzing and manipulating text. It provides a variety of plugins to perform tasks such as spell checking, sentiment analysis, and readability scoring.
What are retext's main functionalities?
Spell Checking
This feature allows you to check the spelling of text. The code sample demonstrates how to use the `retext-spell` plugin with an English dictionary to identify spelling errors in a given text.
const retext = require('retext');
const retextSpell = require('retext-spell');
const dictionary = require('dictionary-en');
const report = require('vfile-reporter');
retext()
  .use(retextSpell, dictionary)
  .process('speling errror', function (err, file) {
    console.error(report(err || file));
  });
Sentiment Analysis
This feature allows you to analyze the sentiment of text. The code sample demonstrates how to use the `retext-sentiment` plugin to determine the sentiment polarity and valence of a given text.
const retext = require('retext');
const retextSentiment = require('retext-sentiment');
const report = require('vfile-reporter');
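retext()
  .use(retextSentiment)
  .use(function () {
    // retext-sentiment annotates nodes in the syntax tree rather than file.data,
    // so the tree is inspected here from a small inline plugin.
    return function (tree) {
      console.log(tree.data); // expected to hold `polarity` (a number) and `valence` (a string)
    };
  })
  .process('I love programming!', function (err, file) {
    console.error(report(err || file));
  });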
Readability Scoring
This feature flags text that may be hard to read. The code sample demonstrates how to use the `retext-readability` plugin with a target age of 18; sentences judged too hard for that audience are reported as warnings on the file.
const retext = require('retext');
const retextReadability = require('retext-readability');
const report = require('vfile-reporter');
retext()
  .use(retextReadability, { age: 18 })
  .process('The cat sat on the mat.', function (err, file) {
    // Warnings, if any, are printed by the reporter and are also available on file.messages.
    console.error(report(err || file));
  });
Other packages similar to retext
natural
Natural is a general natural language processing (NLP) library for Node.js. It provides functionalities such as tokenization, stemming, classification, and phonetics. Compared to retext, Natural offers a broader range of NLP tools but may not be as specialized in text analysis and manipulation.
compromise
Compromise is a lightweight NLP library for Node.js that focuses on fast and easy text processing. It provides features like part-of-speech tagging, named entity recognition, and text normalization. While it offers some similar functionalities to retext, it is designed to be more user-friendly and performant for common NLP tasks.
Hey all! First, thanks a lot for watching, starring, and forking retext!
Secondly, I wanted to invite you all to leave any feedback or issues you might have, to help me make retext even cooler :smile:.
retext is an extensible natural language system that, by default, uses parse-latin to transform natural language into a TextOM object model. It provides a pluggable system for analysing and manipulating natural language in JavaScript, in both Node.js and the browser. Tests provide 100% coverage.
Rather than being a do-all library for Natural Language Processing (such as NLTK or OpenNLP), retext aims to be useful for more practical use cases (such as censoring profane words or decoding emoticons, but the possibilities are endless) instead of more academic goals (research purposes).
retext is inherently modular—it uses plugins (similar to rework for CSS) instead of providing everything out of the box (such as Natural). This makes retext a viable tool for use on the web.
Installation
npm:
$ npm install retext
Component:
$ component install wooorm/retext
Bower:
$ bower install retext
Usage
The following example uses retext-emoji (to show emoji) and retext-smartypants (for smart punctuation).
var Retext = require('retext'),
  emoji = require('retext-emoji'),
  smartypants = require('retext-smartypants');

var retext = new Retext()
  .use(emoji, {
    'convert' : 'encode'
  })
  .use(smartypants);

retext.parse(
  'The three wise monkeys [. . .] sometimes called the ' +
  'three mystic apes--are a pictorial maxim. Together ' +
  'they embody the proverbial principle to ("see no evil, ' +
  'hear no evil, speak no evil"). The three monkeys are ' +
  'Mizaru (:see_no_evil:), covering his eyes, who sees no ' +
  'evil; Kikazaru (:hear_no_evil:), covering his ears, ' +
  'who hears no evil; and Iwazaru (:speak_no_evil:), ' +
  'covering his mouth, who speaks no evil.',
  function (err, tree) {
    if (err) {
      throw err;
    }
    console.log(tree.toString());
  }
);
API
Retext(parser?)
var Retext = require('retext'),
  ParseEnglish = require('parse-english');
var retext = new Retext(new ParseEnglish());
// 'Some English text.' is a placeholder for the source to parse.
retext.parse('Some English text.', function (err, tree) {});
Return a new Retext instance with the given parser (defaults to an instance of parse-latin).
Retext#use(function(Retext, Object), options?)
Takes a plugin—a humble function to transform the object model.
Can return a function (function(Node, Object, next)) which is given the document as created by Retext#parse() before it's given to the user.
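To make the shape of a plugin concrete, here is a minimal sketch. It assumes that the Object passed to the plugin is the options given to use(), and that next accepts an optional error; neither is confirmed above. The plugin name `shout`, its `suffix` option, and the stand-in tree are hypothetical, and the last lines only imitate what Retext#use() and Retext#parse() would do, so the snippet runs without retext installed.

```javascript
// Hypothetical plugin: `shout` and its `suffix` option are made up for illustration.
// Assumption: the second argument is the options object passed to use().
function shout(retext, options) {
  var suffix = (options && options.suffix) || '!';

  // The returned function has the shape function(Node, Object, next).
  // Assumption: next() signals completion; next(err) would signal failure.
  return function (tree, settings, next) {
    tree.text = tree.text.toUpperCase() + suffix;
    next();
  };
}

// Stand-in for a parsed document; a real tree would come from Retext#parse().
var fakeTree = {
  text: 'hello',
  toString: function () {
    return this.text;
  }
};

// Imitates what retext would do with the plugin; not the actual implementation.
var transform = shout(null, { suffix: '?' });
var result;

transform(fakeTree, {}, function (err) {
  if (err) {
    throw err;
  }
  result = fakeTree.toString();
});

console.log(result); // HELLO?
```

With the real library, the same function would be registered through use() and the stand-in tree replaced by the document that parse() produces.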
Retext#parse(value, options?, function(Error, Node))
Parses the given source and, when done, passes either an error (the first argument) or the document as modified by the used plugins (the second argument) to the callback.
Plugins
Desired Plugins
Hey! Want to create one of the following, or any other plugin, for retext but not sure where to start? I suggest reading retext-visit’s source code first to see how it’s built (it’s probably the most straightforward to learn from), and going from there.
If you still have any questions, let me know: go ahead and send me feedback or raise an issue.
- retext-date — Detect time and date in text;
- retext-emoticon — Like retext-emoji, but for general emoticons;
- retext-frequent-words — Like retext-keywords, but based on frequency and stop-words instead of a POS-tagger;
- retext-hyphen — Insert soft-hyphens where needed; this might have to be implemented with some sort of node which doesn't stringify;
- retext-location — Track the position of nodes (line, column);
- retext-no-pants — Opposite of retext-smartypants;
- retext-no-break — Inserts non-breaking spaces between things like “100 km”;
- retext-profanity — Censor profane words;
- retext-punctuation-pair — Detect which opening or initial punctuation mark belongs to which closing or final punctuation mark (and vice versa);
- retext-summary — Summarise text;
- retext-sync — Detect changes in a textarea (or contenteditable?), sync the diffs over to a retext tree, let plugins modify the content, and sync the diffs back to the textarea;
- retext-typography — Applies typographic enhancements, like (or using?) retext-smartypants and retext-hyphen;
- retraverse — Like Estraverse.
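As a possible starting point for retext-frequent-words, the core idea (count words, ignoring stop-words) can be sketched without retext at all. Everything below is hypothetical: the `frequentWords` helper, its tiny stop-word list, and the regular expression used to split words are illustrative assumptions, not part of retext or any existing plugin. A real plugin would walk word nodes in the tree instead of matching on a string.

```javascript
// Hypothetical helper, not part of retext or any retext plugin.
// Assumption: a tiny English stop-word list; a real plugin would use a fuller one.
var STOP_WORDS = new Set(['the', 'a', 'an', 'and', 'of', 'to', 'in', 'is', 'on']);

function frequentWords(text, limit) {
  var counts = new Map();
  // Naive tokenisation: runs of ASCII letters and apostrophes.
  var words = text.toLowerCase().match(/[a-z']+/g) || [];

  words.forEach(function (word) {
    if (!STOP_WORDS.has(word)) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  });

  // Most frequent first, limited to `limit` entries.
  return Array.from(counts.entries())
    .sort(function (a, b) {
      return b[1] - a[1];
    })
    .slice(0, limit)
    .map(function (entry) {
      return { word: entry[0], count: entry[1] };
    });
}

var top = frequentWords('The cat and the dog chased the other cat.', 2);
console.log(top[0]); // { word: 'cat', count: 2 }
```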
Parsers
Benchmark
On a MacBook Air, it parses about 2 big articles, 22 sections, or 202 paragraphs per second.
retext.parse(source);
202 op/s » A paragraph (5 sentences, 100 words)
22 op/s » A section (10 paragraphs, 50 sentences, 1,000 words)
2 op/s » An article (100 paragraphs, 500 sentences, 10,000 words)
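Converted to words per second, the three figures land close to one another, which suggests parsing time grows roughly linearly with input size. A quick check of that arithmetic, using only the numbers listed above (the variable names are mine):

```javascript
// Figures copied from the benchmark above.
var benchmarks = [
  { name: 'paragraph', opsPerSecond: 202, words: 100 },
  { name: 'section', opsPerSecond: 22, words: 1000 },
  { name: 'article', opsPerSecond: 2, words: 10000 }
];

// Words parsed per second = operations per second * words per operation.
var throughput = benchmarks.map(function (benchmark) {
  return {
    name: benchmark.name,
    wordsPerSecond: benchmark.opsPerSecond * benchmark.words
  };
});

throughput.forEach(function (entry) {
  console.log(entry.name + ': ' + entry.wordsPerSecond + ' words/s');
});
// paragraph: 20200 words/s
// section: 22000 words/s
// article: 20000 words/s
```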
Related
License
MIT © Titus Wormer