Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More →

compromise-stats

Package Overview

Dependencies

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

compromise-stats

plugin for nlp-compromise

0.1.0
latest
Source
npm

Version published: 3 years ago

Maintainers: 1

Created: 3 years ago

Source

nlp statistics plugin for compromise

npm install compromise-stats

TFIDF

tf-idf is a type of word-analysis that can discover the most-characteristic, or unique words in a text. It combines uniqueness of words, and their frequency in the document. This plugin comes pre-built with a standard english model, so you can fingerprint an arbitrary text with .tfidif()

.tfidf(opts, model?) -

alternatively, you can build your own model, from a compromise document:

.buildIDF() -

let model=nlp(shakespeareWords)
let doc = nlp('thou art so sus.')
doc.tfidf()
// [ [ 'sus', 5.78 ], [ 'thou', 2.3 ], [ 'art', 1.75 ], [ 'so', 0.44 ] ]

if you want to combine tfidf with other analysis, you can add numbers to individual terms, like this:

let doc = nlp('no, my son is also named Bort')
doc.compute('tfidf')
let json = doc.json()
json[0].terms[6]
// {"text":"Bort", "tags":[], "tfidf":5.78, ... }

TF-IDF values are scaled, but have an unbounded maximum. The result for 'foo foo foo foo' would increase every with repitition.

Ngrams

.ngrams({}) - list all repeating sub-phrases, by word-count
.unigrams() - n-grams with one word
.bigrams() - n-grams with two words
.trigrams() - n-grams with three words
.startgrams() - n-grams including the first term of a phrase
.endgrams() - n-grams including the last term of a phrase
.edgegrams() - n-grams including the first or last term of a phrase

all methods support the same option params:

let doc = nlp('one two three. one two foo.')
doc.ngrams({ size: 2 }) // only two-word grams
/*[
  { size: 2, count: 2, normal: 'one two' },
  { size: 2, count: 1, normal: 'two three' },
  { size: 2, count: 1, normal: 'two foo' }
]
*/

or all gram-sizes under/over a limit:

let doc = nlp('one two three. one two foo.')
let res = doc.ngrams({ min: 3 }) // or max:2
/*[
  { size: 3, count: 1, normal: 'one two three' },
  { size: 3, count: 1, normal: 'one two foo' }
]
*/

MIT

FAQs

What is compromise-stats?

Is compromise-stats well maintained?

Package last updated on 01 Jun 2022

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

compromise-stats

TFIDF

Ngrams

Related posts

PyPI on Ultralytics Supply Chain Attack: Poor CI/CD Practices to Blame, No Security Flaws in PyPI Exploited

Data Theft Repackaged: A Case Study in Malicious Wrapper Packages on npm