
Product
Announcing Socket Fix 2.0
Socket Fix 2.0 brings targeted CVE remediation, smarter upgrade planning, and broader ecosystem support to help developers get to zero alerts.
compromise
Advanced tools
Compromise is a lightweight natural language processing (NLP) library for Node.js and the browser. It is designed to handle a variety of text processing tasks, such as part-of-speech tagging, named entity recognition, and text analysis, with a focus on ease of use and performance.
Part-of-Speech Tagging
This feature allows you to tag parts of speech in a given text. The code sample demonstrates how to use Compromise to tag parts of speech in a sentence, returning an array of tags for each word.
const nlp = require('compromise');
let doc = nlp('The quick brown fox jumps over the lazy dog.');
let tags = doc.out('tags');
console.log(tags);
Named Entity Recognition
Compromise can identify named entities such as people, places, and organizations. The code sample shows how to extract the names of people from a sentence.
const nlp = require('compromise');
let doc = nlp('Barack Obama was born in Hawaii.');
let people = doc.people().out('text');
console.log(people);
Text Analysis
This feature provides basic sentiment analysis capabilities. The code sample demonstrates how to analyze the sentiment of a sentence, returning a score that indicates the overall sentiment.
const nlp = require('compromise');
let doc = nlp('I really love the new design of your website.');
let sentiment = doc.sentiment();
console.log(sentiment);
Natural is another NLP library for Node.js that provides a wide range of functionalities, including tokenization, stemming, classification, and phonetics. Compared to Compromise, Natural offers more comprehensive NLP features but may require more setup and configuration.
Franc is a library for language detection. It is highly efficient and can detect over 175 languages. While it specializes in language detection, it does not offer the broader NLP capabilities of Compromise, which focuses on text processing and analysis.
npm install compromise
↬ᔐᖜ↬ and how hard it is to actually parse and use?
import nlp from 'compromise'
let doc = nlp('she sells seashells by the seashore.')
doc.verbs().toPastTense()
doc.text()
// 'she sold seashells by the seashore.'
if (doc.has('simon says #Verb')) {
return true
}
let doc = nlp(entireNovel)
doc.match('the #Adjective of times').text()
// "the blurst of times?"
and get data:
import plg from 'compromise-speech'
nlp.extend(plg)
let doc = nlp('Milwaukee has certainly had its share of visitors..')
doc.compute('syllables')
doc.places().json()
/*
[{
"text": "Milwaukee",
"terms": [{
"normal": "milwaukee",
"syllables": ["mil", "wau", "kee"]
}]
}]
*/
avoid the problems of brittle parsers:
let doc = nlp("we're not gonna take it..")
doc.has('gonna') // true
doc.has('going to') // true (implicit)
// transform
doc.contractions().expand()
doc.text()
// 'we are not going to take it..'
and whip stuff around like it's data:
let doc = nlp('ninety five thousand and fifty two')
doc.numbers().add(20)
doc.text()
// 'ninety five thousand and seventy two'
-because it actually is-
let doc = nlp('the purple dinosaur')
doc.nouns().toPlural()
doc.text()
// 'the purple dinosaurs'
Use it on the client-side:
<script src="https://unpkg.com/compromise"></script>
<script>
var doc = nlp('two bottles of beer')
doc.numbers().minus(1)
document.body.innerHTML = doc.text()
// 'one bottle of beer'
</script>
or likewise:
import nlp from 'compromise'
var doc = nlp('London is calling')
doc.verbs().toNegative()
// 'London is not calling'
compromise is ~250kb (minified):
it's pretty fast. It can run on keypress:
it works mainly by conjugating all forms of a basic word list.
The final lexicon is ~14,000 words:
you can read more about how it works, here. it's weird.
okay -
compromise/one
A tokenizer
of words, sentences, and punctuation.
import nlp from 'compromise/one'
let doc = nlp("Wayne's World, party time")
let data = doc.json()
/* [{
normal:"wayne's world party time",
terms:[{ text: "Wayne's", normal: "wayne" },
...
]
}]
*/
compromise/one splits your text up, wraps it in a handy API,
/one is quick - most sentences take a 10th of a millisecond.
It can do ~1mb of text a second - or 10 wikipedia pages.
Infinite jest takes 3s.
compromise/two
A part-of-speech
tagger, and grammar-interpreter.
import nlp from 'compromise/two'
let doc = nlp("Wayne's World, party time")
let str = doc.match('#Possessive #Noun').text()
// "Wayne's World"
this is more useful than people sometimes realize.
Light grammar helps you write cleaner templates, and get closer to the information.
compromise has 83 tags, arranged in a handsome graph.
#FirstName → #Person → #ProperNoun → #Noun
you can see the grammar of each word by running doc.debug()
you can see the reasoning for each tag with nlp.verbose('tagger')
.
if you prefer Penn tags, you can derive them with:
let doc = nlp('welcome thrillho')
doc.compute('penn')
doc.json()
compromise/three
Phrase
and sentence tooling.
import nlp from 'compromise/three'
let doc = nlp("Wayne's World, party time")
let str = doc.people().normalize().text()
// "wayne"
compromise/three is a set of tooling to zoom into and operate on parts of a text.
.numbers()
grabs all the numbers in a document, for example - and extends it with new methods, like .subtract()
.
When you have a phrase, or group of words, you can see additional metadata about it with .json()
let doc = nlp('four out of five dentists')
console.log(doc.fractions().json())
/*[{
text: 'four out of five',
terms: [ [Object], [Object], [Object], [Object] ],
fraction: { numerator: 4, denominator: 5, decimal: 0.8 }
}
]*/
let doc = nlp('$4.09CAD')
doc.money().json()
/*[{
text: '$4.09CAD',
terms: [ [Object] ],
number: { prefix: '$', num: 4.09, suffix: 'cad'}
}
]*/
(match methods use the match-syntax.)
(these methods are on the main nlp
object)
nlp.tokenize(str) - parse text without running POS-tagging
nlp.lazy(str, match) - scan through a text with minimal analysis
nlp.plugin({}) - mix in a compromise-plugin
nlp.parseMatch(str) - pre-parse any match statements into json
nlp.world() - grab or change library internals
nlp.model() - grab all current linguistic data
nlp.methods() - grab or change internal methods
nlp.hooks() - see which compute methods run automatically
nlp.verbose(mode) - log our decision-making for debugging
nlp.version - current semver version of the library
nlp.addWords(obj, isFrozen?) - add new words to the lexicon
nlp.addTags(obj) - add new tags to the tagSet
nlp.typeahead(arr) - add words to the auto-fill dictionary
nlp.buildTrie(arr) - compile a list of words into a fast lookup form
nlp.buildNet(arr) - compile a list of matches into a fast match form
'football captain' → 'football captains'
'turnovers' → 'turnover'
'will go' → 'went'
'walked' → 'walks'
'walked' → 'will walk'
'walks' → 'walk'
'walks' → 'walking'
'drive' → 'had driven'
'went' → 'did not go'
"didn't study" → 'studied'
5
five
fifth
or 5th
five
or 5
'$2.50'
he walks
-> he walked
he walked
-> he walks
he walks
-> he will walk
he walks
-> he walk
he walks
-> he didn't walk
?
!
?
or !
'quick'
'wash-out'
'(939) 555-0113'
'#nlp'
'hi@compromise.cool'
:)
💋
'@nlp_compromise'
'compromise.cool'
'he'
'but'
'of'
'Mrs.'
people()
+ places()
+ organizations()
'quickly'
'FBI'
"Spencer's"
This library comes with a considerate, common-sense baseline for english grammar.
You're free to change, or lay-waste to any settings - which is the fun part actually.
the easiest part is just to suggest tags for any given words:
let myWords = {
kermit: 'FirstName',
fozzie: 'FirstName',
}
let doc = nlp(muppetText, myWords)
or make heavier changes with a compromise-plugin.
import nlp from 'compromise'
nlp.extend({
// add new tags
tags: {
Character: {
isA: 'Person',
notA: 'Adjective',
},
},
// add or change words in the lexicon
words: {
kermit: 'Character',
gonzo: 'Character',
},
// change inflections
irregulars: {
get: {
pastTense: 'gotten',
gerund: 'gettin',
},
},
// add new methods to compromise
api: View => {
View.prototype.kermitVoice = function () {
this.sentences().prepend('well,')
this.match('i [(am|was)]').prepend('um,')
return this
}
},
})
These are some helpful extensions:
npm install compromise-dates
June 8th
or 03/03/18
2 weeks
or 5mins
4:30pm
or half past five
npm install compromise-stats
.tfidf({}) - rank words by frequency and uniqueness
.ngrams({}) - list all repeating sub-phrases, by word-count
.unigrams() - n-grams with one word
.bigrams() - n-grams with two words
.trigrams() - n-grams with three words
.startgrams() - n-grams including the first term of a phrase
.endgrams() - n-grams including the last term of a phrase
.edgegrams() - n-grams including the first or last term of a phrase
npm install compromise-syllables
npm install compromise-wikipedia
we're committed to typescript/deno support, both in main and in the official-plugins:
import nlp from 'compromise'
import stats from 'compromise-stats'
const nlpEx = nlp.extend(stats)
nlpEx('This is type safe!').ngrams({ min: 1 })
slash-support:
We currently split slashes up as different words, like we do for hyphens. so things like this don't work:
nlp('the koala eats/shoots/leaves').has('koala leaves') //false
inter-sentence match:
By default, sentences are the top-level abstraction.
Inter-sentence, or multi-sentence matches aren't supported without a plugin:
nlp("that's it. Back to Winnipeg!").has('it back')//false
nested match syntax:
the danger beauty of regex is that you can recurse indefinitely.
Our match syntax is much weaker. Things like this are not (yet) possible:
doc.match('(modern (major|minor))? general')
complex matches must be achieved with successive .match() statements.
dependency parsing: Proper sentence transformation requires understanding the syntax tree of a sentence, which we don't currently do. We should! Help wanted with this.
en-pos - very clever javascript pos-tagger by Alex Corvi
naturalNode - fancier statistical nlp in javascript
winkJS - POS-tagger, tokenizer, machine-learning in javascript
dariusk/pos-js - fastTag fork in javascript
compendium-js - POS and sentiment analysis in javascript
nodeBox linguistics - conjugation, inflection in javascript
reText - very impressive text utilities in javascript
superScript - conversation engine in js
jsPos - javascript build of the time-tested Brill-tagger
spaCy - speedy, multilingual tagger in C/python
Prose - quick tagger in Go by Joseph Kato
TextBlob - python tagger
MIT
FAQs
modest natural language processing
We found that compromise demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 3 open source maintainers collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Product
Socket Fix 2.0 brings targeted CVE remediation, smarter upgrade planning, and broader ecosystem support to help developers get to zero alerts.
Security News
Socket CEO Feross Aboukhadijeh joins Risky Business Weekly to unpack recent npm phishing attacks, their limited impact, and the risks if attackers get smarter.
Product
Socket’s new Tier 1 Reachability filters out up to 80% of irrelevant CVEs, so security teams can focus on the vulnerabilities that matter.