compromise

Dependencies

Maintainers

Versions

169

Alerts

File Explorer

Advanced tools

License

Install Socket

Detect and block malicious and high-risk dependencies

Install

compromise

modest natural language processing

13.11.4-rc7
Source
npm

Version published: 3 years ago

Weekly downloads: 52K; decreased by-5.79%

Maintainers: 3

Weekly downloads

Created: 11 years ago

Source

compromise

modest natural language processing

npm install compromise

_{by
Spencer Kelly and

many contributors}

do you find it strange, how we struggle to parse text?

↬_ᔐᖜ↬-

error-prone

tricky

_{how easy text is to make, then how difficult it is to use?}

_{how it becomes}

_{basically a dead-end}
_{for our information?}

compromise tries its best to turn text into data.

it makes limited and sensible decisions.
_{it is not as smart as you'd think.}

import nlp from 'compromise'

let doc = nlp('she sells seashells by the seashore.')
doc.verbs().toPastTense()
doc.text()
// 'she sold seashells by the seashore.'

the idea is to be not fancy at all:

if (doc.has('simon says #Verb')) {
  return true
}

pull-out parts of a text:

let doc = nlp(entireNovel)
doc.match('the #Adjective of times').text()
// "the blurst of times?"

match docs

compute metadata, and grab it:

import plg from 'compromise-speech'
nlp.extend(plg)

let doc = nlp('Milwaukee has certainly had its share of visitors..')
doc.compute('syllables')
doc.places().json()
/*
[{
  "text": "Milwaukee",
  "terms": [{ 
    "normal": "milwaukee",
    "syllables": ["mil", "waukee"]
  }]
}]
*/

quickly flip between parsed and unparsed forms:

let doc = nlp('soft and yielding like a nerf ball')
doc.out({ '#Adjective': (m)=>`<i>${m.text()}</i>` })
// '<i>soft</i> and <i>yielding</i> like a nerf ball'

output docs

avoid idiomatic problems, and brittle parsers:

let doc = nlp("we're not gonna take it..")

doc.has('gonna') // true
doc.has('going to') // true (implicit)

// transform
doc.contractions().expand()
dox.text()
// 'we are not going to take it..'

contraction docs

whip stuff around like it's data:

let doc = nlp('ninety five thousand and fifty two')
doc.numbers().add(20)
doc.text()
// 'ninety five thousand and seventy two'

number docs

_{because it actually is:}

let doc = nlp('the purple dinosaur')
doc.nouns().toPlural()
doc.text()
// 'the purple dinosaurs'

noun docs

Use it on the client-side:

<script src="https://unpkg.com/compromise"></script>
<script>
  var doc = nlp('two bottles of beer')
  doc.numbers().minus(1)
  document.body.innerHTML = doc.text()
  // 'one bottle of beer'
</script>

or likewise:

import nlp from 'compromise'

var doc = nlp('London is calling')
doc.verbs().toNegative()
// 'London is not calling'

compromise is ~200kb (minified):

it's pretty fast. It can run on keypress:

it works mainly by conjugating all forms of a basic word list.

The final lexicon is ~14,000 words:

you can read more about how it works, here. it's weird.

_okay,

`compromise/one`

A tokenizer of words, sentences, and punctuation.

import nlp from 'compromise/one'

let doc = nlp("Wayne's World, party time")
let data = doc.json()
/* [{ 
    normal:"wayne's world party time",
    terms:[{ text: "Wayne's", normal: "wayne" }, 
      ...
      ] 
  }]
*/

one splits your text up, wraps it in a handy API,

_{and does nothing else -}

Output

.text() - return the document as text
.json() - return the document as data
.debug() - pretty-print the interpreted document

Utils

.all() - return the whole original document ('zoom out')
.found [getter] - is this document empty?
.tagger() - (re-)run the part-of-speech tagger on this document
.wordCount() - count the # of terms in the document
.length [getter] - count the # of characters in the document (string length)
.clone() - deep-copy the document, so that no references remain
.cache({}) - freeze the current state of the document, for speed-purposes
.uncache() - un-freezes the current state of the document, so it may be transformed

Accessors

.first(n) - use only the first result(s)
.last(n) - use only the last result(s)
.slice(n,n) - grab a subset of the results
.eq(n) - use only the nth result
.terms() - split-up results by each individual term
.firstTerms() - get the first word in each match
.lastTerms() - get the end word in each match
.sentences() - get the whole sentence for each match
.termList() - return a flat list of all Term objects in match
.groups('') - grab any named capture-groups from a match

Match

(match methods use the match-syntax.)

.match('') - return a new Doc, with this one as a parent
.not('') - return all results except for this
.matchOne('') - return only the first match
.if('') - return each current phrase, only if it contains this match ('only')
.ifNo('') - Filter-out any current phrases that have this match ('notIf')
.has('') - Return a boolean if this match exists
.lookBehind('') - search through earlier terms, in the sentence
.lookAhead('') - search through following terms, in the sentence
.before('') - return all terms before a match, in each phrase
.after('') - return all terms after a match, in each phrase
.lookup([]) - quick find for an array of string matches

Case

.toLowerCase() - turn every letter of every term to lower-cse
.toUpperCase() - turn every letter of every term to upper case
.toTitleCase() - upper-case the first letter of each term
.toCamelCase() - remove whitespace and title-case each term

Whitespace

.pre('') - add this punctuation or whitespace before each match
.post('') - add this punctuation or whitespace after each match
.trim() - remove start and end whitespace
.hyphenate() - connect words with hyphen, and remove whitespace
.dehyphenate() - remove hyphens between words, and set whitespace
.toQuotations() - add quotation marks around these matches
.toParentheses() - add brackets around these matches

Loops

.map(fn) - run each phrase through a function, and create a new document
.forEach(fn) - run a function on each phrase, as an individual document
.filter(fn) - return only the phrases that return true
.find(fn) - return a document with only the first phrase that matches
.some(fn) - return true or false if there is one matching phrase
.random(fn) - sample a subset of the results

Insert

.replace(match, replace) - search and replace match with new content
.replaceWith(replace) - substitute-in new text
.delete() - fully remove these terms from the document
.append(str) - add these new terms to the end (insertAfter)
.prepend(str) - add these new terms to the front (insertBefore)
.concat() - add these new things to the end

Transform

.sort('method') - re-arrange the order of the matches (in place)
.reverse() - reverse the order of the matches, but not the words
.normalize({}) - clean-up the text in various ways
.unique() - remove any duplicate matches
.split('') - return a Document with three parts for every match ('splitOn')
.splitBefore('') - partition a phrase before each matching segment
.splitAfter('') - partition a phrase after each matching segment
.segment({}) - split a document into labeled sections
.join('') - make all phrases into one phrase

one is fast - most sentences take a 10th of a millisecond.

It can do ~1mb of text a second - or 10 wikipedia pages.

Infinite jest is takes 3s.

You can also paralellize, or stream text to it with compromise-speed.

`compromise/two`

A part-of-speech tagger, and grammar-interpreter.

import nlp from 'compromise/two'

let doc = nlp("Wayne's World, party time")
let str = doc.match('#Possessive #Noun').text()
// "Wayne's World"

two automatically calculates the very basic grammar of each word.

_{this is more useful than people sometimes realize.}

Really light grammar helps you write cleaner templates, and get closer to the information.

Part-of-speech tagging is profoundly-difficult task to get 100% on. It is also a profoundly easy task to get 85% on.

Contractions

.contractions() - things like "didn't"
.contractions().expand() - things like "didn't"

you can see the grammar of each word by running doc.debug(), and the reasoning for each tag with nlp.verbose('tagger').

compromise has 83 tags, arranged in a handsome graph.

#FirstName → #Person → #ProperNoun → #Noun

if you prefer Penn tags, you can derive them with:

let doc = nlp('welcome thrillho')
doc.compute('penn')
doc.json()

`compromise/three`

Phrase and sentence tooling.

import nlp from 'compromise/three'

let doc = nlp("Wayne's World, party time")
let str = doc.people().normalize().text()
// "wayne"

three is a set of tooling to zoom into and operate on parts of a text.

.numbers() grabs all the numbers in a document, for example - and extends it with new methods, like .subtract().

Nouns

.nouns() - return any subsequent terms tagged as a Noun
- .nouns().json() - overloaded output with noun metadata
- .nouns().adjectives() - get any adjectives describing this noun
- .nouns().toPlural() - 'football captain' → 'football captains'
- .nouns().toSingular() - 'turnovers' → 'turnover'
- .nouns().isPlural() - return only plural nouns
- .nouns().isSingular() - return only singular nouns
- .nouns().hasPlural() - return only nouns that can be inflected as plural
- .nouns().toPossessive() - add a 's to the end, in a safe manner.

Verbs

.verbs() - return any subsequent terms tagged as a Verb
- .verbs().json() - overloaded output with verb metadata
- .verbs().conjugate() - return all forms of these verbs
- .verbs().toPastTense() - 'will go' → 'went'
- .verbs().toPresentTense() - 'walked' → 'walks'
- .verbs().toFutureTense() - 'walked' → 'will walk'
- .verbs().toInfinitive() - 'walks' → 'walk'
- .verbs().toGerund() - 'walks' → 'walking'
- .verbs().toParticiple() - 'drive' → 'driven' - otherwise simple-past ('walked')
- .verbs().toNegative() - 'went' → 'did not go'
- .verbs().toPositive() - "didn't study" → 'studied'
- .verbs().isNegative() - return verbs with 'not'
- .verbs().isPositive() - only verbs without 'not'
- .verbs().isPlural() - return plural verbs like 'we walk'
- .verbs().isSingular() - return singular verbs like 'spencer walks'
- .verbs().adverbs() - return the adverbs describing this verb.
- .verbs().isImperative() - only instruction verbs like 'eat it!'

Numbers

.numbers() - grab all written and numeric values
- .numbers().get() - retrieve the parsed number(s)
- .numbers().json() - overloaded output with number metadata
- .numbers().units() - grab 'kilos' from 25 kilos'
- .numbers().fractions() - things like 1/3rd
- .numbers().toText() - convert number to five or fifth
- .numbers().toNumber() - convert number to 5 or 5th
- .numbers().toOrdinal() - convert number to fifth or 5th
- .numbers().toCardinal() - convert number to five or 5
- .numbers().set(n) - set number to n
- .numbers().add(n) - increase number by n
- .numbers().subtract(n) - decrease number by n
- .numbers().increment() - increase number by 1
- .numbers().decrement() - decrease number by 1
- .numbers().isEqual(n) - return numbers with this value
- .numbers().greaterThan(min) - return numbers bigger than n
- .numbers().lessThan(max) - return numbers smaller than n
- .numbers().between(min, max) - return numbers between min and max
- .numbers().isOrdinal() - return only ordinal numbers
- .numbers().isCardinal() - return only cardinal numbers
- .numbers().toLocaleString() - add commas, or nicer formatting for numbers
.money() - things like '$2.50'
- .money().get() - retrieve the parsed amount(s) of money
- .money().json() - currency + number info
- .money().currency() - which currency the money is in
.fractions() - like '2/3rds' or 'one out of five'
- .fractions().get() - simple numerator, denomenator data
- .fractions().json() - json method overloaded with fractions data
- .fractions().toDecimal() - '2/3' -> '0.66'
- .fractions().normalize() - 'four out of 10' -> '4/10'
- .fractions().toText() - '4/10' -> 'four tenths'
- .fractions().toPercentage() - '4/10' -> '40%'
.percentages() - like '2.5%'
- .fractions().get() - return the percentage number / 100
- .fractions().json() - json overloaded with percentage information
- .fractions().toFraction() - '80%' -> '8/10'

Sentences

.sentences() - return a sentence class with additional methods
- .sentences().json() - overloaded output with sentence metadata
- .sentences().subjects() - return the main noun of each sentence
- .sentences().toPastTense() - he walks -> he walked
- .sentences().toPresentTense() - he walked -> he walks
- .sentences().toFutureTense() -- he walks -> he will walk
- .sentences().toNegative() - - he walks -> he didn't walk
- .sentences().toPositive() - he doesn't walk -> he walks
- .sentences().isPassive() - return only sentences with a passive-voice
- .sentences().isQuestion() - return questions with a ?
- .sentences().isExclamation() - return sentences with a !
- .sentences().isStatement() - return sentences without ? or !
- .sentences().prepend() - smarter prepend that repairs whitespace + titlecasing
- .sentences().append() - smarter append that repairs sentence punctuation
- .sentences().toExclamation() - end sentence with a !
- .sentences().toQuestion() - end sentence with a ?
- .sentences().toStatement() - end sentence with a .

Misc selections

.clauses() - split-up sentences into multi-term phrases
.hyphenated() - all terms connected with a hyphen or dash like 'wash-out'
.phoneNumbers() - things like '(939) 555-0113'
.hashTags() - things like '#nlp'
.emails() - things like 'hi@compromise.cool'
.emoticons() - things like :)
.emojis() - things like 💋
.atMentions() - things like '@nlp_compromise'
.urls() - things like 'compromise.cool'
.adverbs() - things like 'quickly'
.pronouns() - things like 'he'
.conjunctions() - things like 'but'
.prepositions() - things like 'of'
.abbreviations() - things like 'Mrs.'
.people() - names like 'John F. Kennedy'
.places() - like 'Paris, France'
.organizations() - like 'Google, Inc'
.topics() - people() + places() + organizations()
.parentheses() - return anything inside (parentheses)
.possessives() - things like "Spencer's"
.quotations() - return any terms inside quotation marks
.acronyms() - things like 'FBI'
.lists() - things like 'eats, shoots, and leaves'
- .lists().items() - return the partitioned things in the list
- .lists().add() - put a new item in the list

.extend():

compromise comes with a considerate, common-sense baseline for english grammar. You're free to change, or lay-waste to any settings - which is the fun part actually.

the easiest part is just to suggest tags for any given words:

let myWords = {
  kermit: 'FirstName',
  fozzie: 'FirstName',
}
let doc = nlp(muppetText, myWords)

or make heavier changes with a compromise-plugin.

import nlp from 'compromise'
nlp.extend({
  // add new tags
  tags: {
    Character: {
      isA: 'Person',
      notA: 'Adjective',
    },
  },
  // add or change words in the lexicon
  words: {
    kermit: 'Character',
    gonzo: 'Character',
  },
  // add new methods to compromise
  api: (View) => {
    View.prototype.kermitVoice = function () {
      this.sentences().prepend('well,')
      this.match('i [(am|was)]').prepend('um,')
      return this
    }
  }
})

.extend() docs

Docs:

gentle introduction:

Documentation:

Concepts	API	Plugins
Accuracy	Accessors	Adjectives
Caching	Constructor-methods	Dates
Case	Contractions	Export
Filesize	Insert	Hash
Internals	Json	Html
Justification	Lists	Keypress
Lexicon	Loops	Ngrams
Match-syntax	Match	Numbers
Performance	Nouns	Paragraphs
Plugins	Output	Scan
Projects	Selections	Sentences
Tagger	Sorting	Syllables
Tags	Split	Pronounce
Tokenization	Text	Strict
Named-Entities	Utils	Penn-tags
Whitespace	Verbs	Typeahead
World data	Normalization
Fuzzy-matching	Typescript

Talks:

Language as an Interface - by Spencer Kelly
Coding Chat Bots - by KahWee Teng
On Typing and data - by Spencer Kelly

Articles:

Geocoding Social Conversations with NLP and JavaScript - by Microsoft
Microservice Recipe - by Eventn
Adventure Game Sentence Parsing with Compromise
Building Text-Based Games - by Matt Eland
Fun with javascript in BigQuery - by Felipe Hoffa
Natural Language Processing... in the Browser? - by Charles Landau

Some fun Applications:

Automated Bechdel Test - by The Guardian
Story generation framework - by Jose Phrocca
Tumbler blog of lists - horse-ebooks-like lists - by Michael Paulukonis
Video Editing from Transcription - by New Theory
Browser extension Fact-checking - by Alexander Kidd
Siri shortcut - by Michael Byrns
Amazon skill - by Tajddin Maghni
Tasking Slack-bot - by Kevin Suh [see more]

API:

Constructor

(these methods are on the nlp object)

.tokenize() - parse text without running POS-tagging
.plugin() - mix in a compromise-plugin
.verbose() - log our decision-making for debugging
.version - current semver version of the library
.parseMatch() - pre-parse any match statements for faster lookups
.world() - grab all current linguistic data

Plugins:

These are some helpful extensions:

Adjectives

npm install compromise-adjectives

.adjectives() - like quick
- .adjectives().json() - overloaded output with adjective metadata
- .adjectives().conjugate() - return all conjugated forms of this adjective
- .adjectives().toSuperlative() - convert quick to quickest
- .adjectives().toComparative() - convert quick to quicker
- .adjectives().toAdverb() - convert quick to quickly
- .adjectives().toVerb() - convert quick to quicken
- .adjectives().toNoun() - convert quick to quickness

Dates

npm install compromise-dates

.dates() - find dates like June 8th or 03/03/18
- .dates().get() - simple start/end json result
- .dates().json() - overloaded output with date metadata
- .dates().format('') - convert the dates to specific formats
- .dates().toShortForm() - convert 'Wednesday' to 'Wed', etc
- .dates().toLongForm() - convert 'Feb' to 'February', etc
.durations() - 2 weeks or 5mins
- .durations().get() - return simple json for duration
- .durations().json() - overloaded output with duration metadata
.times() - 4:30pm or half past five
- .durations().get() - return simple json for times
- .times().json() - overloaded output with time metadata

Export

npm install compromise-export

.export() - store a parsed document for later use
nlp.load() - re-generate a Doc object from .export() results

Html

npm install compromise-html

.html({}) - generate sanitized html from the document

Hash

npm install compromise-hash

.hash() - generate an md5 hash from the document+tags
.isEqual(doc) - compare the hash of two documents for semantic-equality

Keypress

npm install compromise-keypress

nlp.keypress('') - generate an md5 hash from the document+tags
nlp.clear('') - clean-up any cached sentences from memory

Ngrams

npm install compromise-plugin-stats

.ngrams({}) - list all repeating sub-phrases, by word-count
.unigrams() - n-grams with one word
.bigrams() - n-grams with two words
.trigrams() - n-grams with three words
.startgrams() - n-grams including the first term of a phrase
.endgrams() - n-grams including the last term of a phrase
.edgegrams() - n-grams including the first or last term of a phrase

Paragraphs

npm install compromise-paragraphs this plugin creates a wrapper around the default sentence objects.

.paragraphs() - return groups of sentences
- .paragraphs().json() - output metadata for each paragraph
- .paragraphs().sentences() - go back to a regular Doc object
- .paragraphs().terms() - return all individual terms
- .paragraphs().eq() - get the nth paragraph
- .paragraphs().first() - get the first n paragraphs
- .paragraphs().last() - get the last n paragraphs
- .paragraphs().match() -
- .paragraphs().not() -
- .paragraphs().if() -
- .paragraphs().ifNo() -
- .paragraphs().has() -
- .paragraphs().forEach() -
- .paragraphs().map() -
- .paragraphs().filter() -

Syllables

npm install compromise-syllables

.syllables() - split each term by its typical pronounciation

Penn-tags

npm install compromise-penn-tags

.pennTags() - return POS tags from the Penn Tagset

Typescript

we're committed to typescript/deno support, both in main and in the official-plugins:

import nlp from 'compromise'
import stats from 'compromise-stats'

const nlpEx = nlp.extend(stats)

nlpEx('This is type safe!').ngrams({ min: 1 })

typescript docs

Limitations:

slash-support: We currently split slashes up as different words, like we do for hyphens. so things like this don't work: nlp('the koala eats/shoots/leaves').has('koala leaves') //false
inter-sentence match: By default, sentences are the top-level abstraction. Inter-sentence, or multi-sentence matches aren't supported without a plugin: nlp("that's it. Back to Winnipeg!").has('it back')//false
nested match syntax: the ~~danger~~ beauty of regex is that you can recurse indefinitely. Our match syntax is much weaker. Things like this are not (yet) possible: doc.match('(modern (major|minor))? general') complex matches must be achieved with successive .match() statements.
dependency parsing: Proper sentence transformation requires understanding the syntax tree of a sentence, which we don't currently do. We should! Help wanted with this.

FAQ

☂️ Isn't javascript too...

here

💃 Can it run on my arduino-watch?

quick start

🌎 Compromise in other Languages?

German

French

✨ Partial builds?

compromise-tokenize

(spencer's cool)

(spencer's house)

Keywords

FAQs

What is compromise?

Is compromise popular?

Is compromise well maintained?

Package last updated on 22 Mar 2022

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

compromise

compromise/one

Output

Utils

Accessors

Match

Tag

Case

Whitespace

Loops

Insert

Transform

compromise/two

Contractions

compromise/three

Nouns

Verbs

Numbers

Sentences

Misc selections

.extend():

Docs:

gentle introduction:

Documentation:

Talks:

Articles:

Some fun Applications:

API:

Constructor

Plugins:

Adjectives

Dates

Export

Html

Hash

Keypress

Ngrams

Paragraphs

Syllables

Penn-tags

Typescript

Limitations:

FAQ

See Also:

Keywords

Related posts

Introducing Java Support in Socket

Typosquatting on PyPI: Malicious Package Mimics Popular 'browser-cookie3' Library to Steal Sensitive Data

`compromise/one`

`compromise/two`

`compromise/three`