Socket
Socket
Sign inDemoInstall

compromise

Package Overview
Dependencies
Maintainers
3
Versions
169
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

compromise

modest natural language processing


Version published
Weekly downloads
52K
decreased by-5.79%
Maintainers
3
Weekly downloads
 
Created
Source
compromise
modest natural language processing
npm install compromise
by Spencer Kelly and many contributors
isn't it weird how we can write text, but not parse it?
    ᔐᖜ↬-
it's like we've agreed that
text is a dead-end.
and the knowledge in it
should not really be used.
compromise tries its best to parse text.
it is small, quick, and often good-enough.
it is not as smart as you'd think.

.match():

let doc = nlp(entireNovel)
doc.match('the #Adjective of times').text()
// "the blurst of times?"
if (!doc.has('simon says #Verb')) {
  return 'do nothing'
}

.verbs():

let doc = nlp('she sells seashells by the seashore.')
doc.verbs().toPastTense()
doc.text()
// 'she sold seashells by the seashore.'

.nouns():

let doc = nlp('the purple dinosaur')
doc.nouns().toPlural()
doc.text()
// 'the purple dinosaurs'

.numbers():

let doc = nlp('ninety five thousand and fifty two')
doc.numbers().add(2)
doc.text()
// 'ninety five thousand and fifty four'

.contractions():

let doc = nlp("we're not gonna take it, no we ain't gonna take it.")

// match an implicit term
doc.has('going') // true

// transform
doc.contractions().expand()
dox.text()
// 'we are not going to take it, no we are not going to take it.'

Use it on the client-side:

<script src="https://unpkg.com/compromise"></script>
<script>
  var doc = nlp('two bottles of beer')
  doc.numbers().minus(1)
  document.body.innerHTML = doc.text()
  // 'one bottle of beer'
</script>

or the back:

import nlp from 'compromise'

var doc = nlp('London is calling')
doc.verbs().toNegative()
// 'London is not calling'

full api:

compromise/one

A tokenizer of words, sentences, and punctuation.

import nlp from 'compromise/one'

let doc = nlp("Wayne's World, party time")
let data = doc.json()
// [{ terms:[{ text:"Wayne's", normal:"wayne"}, ...] }]
Output
  • .text() - return the document as text
  • .json({}) - return the document as data
  • .debug() - pretty-print the interpreted document
Utils
  • .all() - return the whole original document ('zoom out')
  • .found [getter] - is this document empty?
  • .tagger() - (re-)run the part-of-speech tagger on this document
  • .wordCount() - count the # of terms in the document
  • .length [getter] - count the # of characters in the document (string length)
  • .clone() - deep-copy the document, so that no references remain
  • .cache({}) - freeze the current state of the document, for speed-purposes
  • .uncache() - un-freezes the current state of the document, so it may be transformed
Accessors
Match

(all match methods use the match-syntax.)

  • .match('') - return a new Doc, with this one as a parent
  • .not('') - return all results except for this
  • .matchOne('') - return only the first match
  • .if('') - return each current phrase, only if it contains this match ('only')
  • .ifNo('') - Filter-out any current phrases that have this match ('notIf')
  • .has('') - Return a boolean if this match exists
  • .lookBehind('') - search through earlier terms, in the sentence
  • .lookAhead('') - search through following terms, in the sentence
  • .before('') - return all terms before a match, in each phrase
  • .after('') - return all terms after a match, in each phrase
  • .lookup([]) - quick find for an array of string matches
Tag
  • .tag('') - Give all terms the given tag
  • .tagSafe('') - Only apply tag to terms if it is consistent with current tags
  • .unTag('') - Remove this term from the given terms
  • .canBe('') - return only the terms that can be this tag
Case
Whitespace
  • .pre('') - add this punctuation or whitespace before each match
  • .post('') - add this punctuation or whitespace after each match
  • .trim() - remove start and end whitespace
  • .hyphenate() - connect words with hyphen, and remove whitespace
  • .dehyphenate() - remove hyphens between words, and set whitespace
  • .toQuotations() - add quotation marks around these matches
  • .toParentheses() - add brackets around these matches
Loops
  • .map(fn) - run each phrase through a function, and create a new document
  • .forEach(fn) - run a function on each phrase, as an individual document
  • .filter(fn) - return only the phrases that return true
  • .find(fn) - return a document with only the first phrase that matches
  • .some(fn) - return true or false if there is one matching phrase
  • .random(fn) - sample a subset of the results
Insert
Transform

compromise/two

A part-of-speech tagger, and grammar-interpreter.

import nlp from 'compromise/two'

let doc = nlp("Wayne's World, party time")
let str = doc.match('#Possessive #Noun').text()
// "Wayne's World"
Contractions

compromise/three

Phrase and sentence tooling.

import nlp from 'compromise/three'

let doc = nlp("Wayne's World, party time")
let str = doc.people().normalize().text()
// "wayne"
Selections
Nouns
Verbs
Numbers
Sentences

compromise is 180kb (minified):

it's pretty fast. It can run on keypress:

it works mainly by conjugating all forms of a basic word list.

The final lexicon is ~14,000 words:

you can read more about how it works, here. it's weird.

.extend():

decide how words get interpreted:

let myWords = {
  kermit: 'FirstName',
  fozzie: 'FirstName',
}
let doc = nlp(muppetText, myWords)

or make heavier changes with a compromise-plugin.

const nlp = require('compromise')

nlp.extend((Doc, world) => {
  // add new tags
  world.addTags({
    Character: {
      isA: 'Person',
      notA: 'Adjective',
    },
  })

  // add or change words in the lexicon
  world.addWords({
    kermit: 'Character',
    gonzo: 'Character',
  })

  // add methods to run after the tagger
  world.postProcess(doc => {
    doc.match('light the lights').tag('#Verb . #Plural')
  })

  // add a whole new method
  Doc.prototype.kermitVoice = function () {
    this.sentences().prepend('well,')
    this.match('i [(am|was)]').prepend('um,')
    return this
  }
})

Docs:

gentle introduction:
Documentation:

| Concepts | API | Plugins | | ------------------ | :---------------------------------: | ---------------------------: | --- | | Accuracy | Accessors | Adjectives | | Caching | Constructor-methods | Dates | | Case | Contractions | Export | | Filesize | Insert | Hash | | Internals | Json | Html | | Justification | Lists | Keypress | | Lexicon | Loops | Ngrams | | Match-syntax | Match | Numbers | | Performance | Nouns | Paragraphs | | Plugins | Output | Scan | | Projects | Selections | Sentences | | Tagger | Sorting | Syllables | | Tags | Split | Pronounce | | | Tokenization | Text | Strict | | Named-Entities | Utils | Penn-tags | | Whitespace | Verbs | Typeahead | | World data | Normalization | | | Fuzzy-matching | Typescript | |

Talks:
Articles:
Some fun Applications:

API:

Constructor

(these methods are on the nlp object)

Plugins:

These are some helpful extensions:

Adjectives

npm install compromise-adjectives

Dates

npm install compromise-dates

Export

npm install compromise-export

  • .export() - store a parsed document for later use
  • nlp.load() - re-generate a Doc object from .export() results
Html

npm install compromise-html

  • .html({}) - generate sanitized html from the document
Hash

npm install compromise-hash

  • .hash() - generate an md5 hash from the document+tags
  • .isEqual(doc) - compare the hash of two documents for semantic-equality
Keypress

npm install compromise-keypress

Ngrams

npm install compromise-plugin-stats

Paragraphs

npm install compromise-paragraphs this plugin creates a wrapper around the default sentence objects.

Strict-match

npm install compromise-strict

Syllables

npm install compromise-syllables

  • .syllables() - split each term by its typical pronounciation
Penn-tags

npm install compromise-penn-tags


Typescript

we're committed to typescript/deno support, both in main and in the official-plugins:

import nlp from 'compromise'
import ngrams from 'compromise-ngrams'
import numbers from 'compromise-numbers'

const nlpEx = nlp.extend(ngrams).extend(numbers)

nlpEx('This is type safe!').ngrams({ min: 1 })
nlpEx('This is type safe!').numbers()
Limitations:
  • slash-support: We currently split slashes up as different words, like we do for hyphens. so things like this don't work: nlp('the koala eats/shoots/leaves').has('koala leaves') //false

  • inter-sentence match: By default, sentences are the top-level abstraction. Inter-sentence, or multi-sentence matches aren't supported without a plugin: nlp("that's it. Back to Winnipeg!").has('it back')//false

  • nested match syntax: the danger beauty of regex is that you can recurse indefinitely. Our match syntax is much weaker. Things like this are not (yet) possible: doc.match('(modern (major|minor))? general') complex matches must be achieved with successive .match() statements.

  • dependency parsing: Proper sentence transformation requires understanding the syntax tree of a sentence, which we don't currently do. We should! Help wanted with this.

FAQ
See Also:

MIT

Keywords

FAQs

Package last updated on 02 Mar 2022

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc