compromise-stats


plugin for nlp-compromise


Weekly downloads: 951 (increased by 6.49%)
Maintainers: 1
Install size: 357 kB
 

Changelog

10.1.0

  • fix return format of .isPlural(), so it acts like a match filter
  • less-greedy date tagging & ambiguous month fixes

v10

  • cleanup & rename some .value() methods
  • change lumping behaviour of lexicon terms with multiple words
  • keep more former tags after a term replace method
  • new .random() method
  • new .lessThan(), .greaterThan(), .equalTo() methods
  • new prefix/suffix/infix matches with _ffix syntax
  • tag() supports a sequence of tags for a sequence of terms
  • .match 'range' queries now use a real match - #Adverb{2,4}
  • new .before() and .after() match methods
  • removes .lexicon() method for many-lexicons concept
  • changes params of .replaceWith() method to a 'keyTags' boolean
  • improved .debug() and logging on client-side

Readme

nlp statistics plugin for compromise
npm install compromise-stats
TFIDF

tf-idf is a type of word-analysis that can discover the most characteristic, or unique, words in a text. It combines the uniqueness of each word with its frequency in the document. This plugin comes pre-built with a standard English model, so you can fingerprint an arbitrary text with .tfidf()
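To make the idea concrete, here is a minimal sketch of the textbook formula (term frequency times log of inverse document frequency) in plain JavaScript. It does not use the plugin, and the helper names (termFrequency, inverseDocumentFrequency, tfidfSketch) and the tiny corpus are hypothetical. The plugin's own scaling and model format are not described in this readme, so the numbers here will not match its output.

```javascript
// Hypothetical helpers for illustration; this is not the plugin's internal code.

// Count how often each word occurs in a list of words.
function termFrequency(words) {
  const counts = new Map()
  for (const w of words) counts.set(w, (counts.get(w) || 0) + 1)
  return counts
}

// Build an inverse-document-frequency table from a reference corpus
// (an array of documents, each an array of words).
function inverseDocumentFrequency(corpus) {
  const docCounts = new Map()
  for (const doc of corpus) {
    // a word counts once per document, however often it repeats there
    for (const w of new Set(doc)) docCounts.set(w, (docCounts.get(w) || 0) + 1)
  }
  const idf = new Map()
  for (const [w, n] of docCounts) idf.set(w, Math.log(corpus.length / n))
  return idf
}

// Score each distinct word as tf * idf, highest score first.
// Assumption: a word missing from the model is weighted as if it
// appeared in exactly one corpus document.
function tfidfSketch(words, idf, corpusSize) {
  const unseen = Math.log(corpusSize)
  const scores = []
  for (const [w, tf] of termFrequency(words)) {
    const weight = idf.has(w) ? idf.get(w) : unseen
    scores.push([w, tf * weight])
  }
  return scores.sort((a, b) => b[1] - a[1])
}

const corpus = [
  ['the', 'cat', 'sat'],
  ['the', 'dog', 'sat'],
  ['the', 'bird', 'flew'],
]
const idf = inverseDocumentFrequency(corpus)
const result = tfidfSketch(['the', 'cat', 'cat'], idf, corpus.length)
// 'cat' ranks first; 'the' occurs in every corpus document, so it scores 0
```

A word that appears in every reference document carries no weight, while a rare word that repeats in the text rises to the top, which is the "fingerprint" behaviour described above.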

  • .tfidf(opts, model?) -

alternatively, you can build your own model, from a compromise document:

  • .buildIDF() -
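let model = nlp(shakespeareWords).buildIDF()
let doc = nlp('thou art so sus.')
// pass the custom model as the second argument, per the .tfidf(opts, model?) signature
doc.tfidf({}, model)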
// [ [ 'sus', 5.78 ], [ 'thou', 2.3 ], [ 'art', 1.75 ], [ 'so', 0.44 ] ]

if you want to combine tfidf with other analysis, you can add numbers to individual terms, like this:

let doc = nlp('no, my son is also named Bort')
doc.compute('tfidf')
let json = doc.json()
json[0].terms[6]
// {"text":"Bort", "tags":[], "tfidf":5.78, ... }
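Once the scores sit on individual terms, they can be ranked or filtered like any other field. The sketch below works on a hand-written mock of the json shape shown above, so it runs without the plugin. The topTerms helper is hypothetical, and every score except the 5.78 for 'Bort' is made up for illustration.

```javascript
// Mock of the shape returned by doc.json() after doc.compute('tfidf').
// Only the 'Bort' score comes from the example above; the rest are invented.
const json = [
  {
    terms: [
      { text: 'no', tags: [], tfidf: 0.6 },
      { text: 'my', tags: [], tfidf: 0.4 },
      { text: 'son', tags: [], tfidf: 2.1 },
      { text: 'is', tags: [], tfidf: 0.1 },
      { text: 'also', tags: [], tfidf: 0.9 },
      { text: 'named', tags: [], tfidf: 1.7 },
      { text: 'Bort', tags: [], tfidf: 5.78 },
    ],
  },
]

// Hypothetical helper: return the text of the n highest-scoring terms.
function topTerms(sentences, n) {
  return sentences
    .flatMap((sentence) => sentence.terms)
    .filter((term) => typeof term.tfidf === 'number')
    .sort((a, b) => b.tfidf - a.tfidf)
    .slice(0, n)
    .map((term) => term.text)
}

const top = topTerms(json, 2)
```

With real output, the same helper could be combined with a tag check on each term before sorting.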

TF-IDF values are scaled, but have an unbounded maximum. The result for 'foo foo foo foo' would increase with every repetition.

Ngrams

all methods support the same option params:

let doc = nlp('one two three. one two foo.')
doc.ngrams({ size: 2 }) // only two-word grams
/*[
  { size: 2, count: 2, normal: 'one two' },
  { size: 2, count: 1, normal: 'two three' },
  { size: 2, count: 1, normal: 'two foo' }
]
*/

or all gram-sizes under/over a limit:

let doc = nlp('one two three. one two foo.')
let res = doc.ngrams({ min: 3 }) // or max:2
/*[
  { size: 3, count: 1, normal: 'one two three' },
  { size: 3, count: 1, normal: 'one two foo' }
]
*/
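For readers who want to see what size, min and max select, here is a rough sketch of n-gram counting in plain JavaScript. It is not the plugin's implementation: ngramSketch is a hypothetical name, the sentence splitting is a naive regex, and the default max of 4 is an assumption. Grams are kept within a sentence, which is consistent with the example output above.

```javascript
// Hypothetical sketch of n-gram counting; not the plugin's implementation.
// Assumption: when no upper limit is given, grams of up to 4 words are counted.
function ngramSketch(text, { size, min = 1, max = 4 } = {}) {
  const lo = size || min
  const hi = size || max
  const counts = new Map()
  // naive sentence split, so grams never cross a sentence boundary
  const sentences = text
    .toLowerCase()
    .split(/[.!?]+/)
    .map((s) => s.trim())
    .filter(Boolean)
  for (const sentence of sentences) {
    const words = sentence.split(/\s+/)
    for (let n = lo; n <= hi; n++) {
      // slide a window of n words across the sentence
      for (let i = 0; i + n <= words.length; i++) {
        const gram = words.slice(i, i + n).join(' ')
        counts.set(gram, (counts.get(gram) || 0) + 1)
      }
    }
  }
  // most frequent first; ties keep the order in which they were first seen
  return [...counts.entries()]
    .map(([normal, count]) => ({ size: normal.split(' ').length, count, normal }))
    .sort((a, b) => b.count - a.count)
}

const grams = ngramSketch('one two three. one two foo.', { size: 2 })
const longGrams = ngramSketch('one two three. one two foo.', { min: 3 })
```

Here grams holds the three two-word grams with 'one two' counted twice, and longGrams holds the two three-word grams.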

MIT


Last updated on 01 Jun 2022
