freql 0.1.0 (bundler, RubyGems), 1 maintainer

Freql

aka. ((word) Frequency Lang/Lib)

A library for handling word/token frequencies.

Features

  • convert cb and fpmw to zipf and other units.
  • basic lookup for word frequencies in various languages.
  • token counting tool
  • tools for building word/token frequency datasets from custom sources

Let's educate you about word frequency units real quick.

| name | description | range | examples |
|---|---|---|---|
| fq | Frequency represented as a proportion between 0 and 1: occurrence count divided by total words/tokens. | 0 to 1 | 0.053 (the), 0.00000001 (trella) |
| fpmw | Frequency per million words. | 1 million to 0 | 53703 (the), 0.01 (trella) |
| fpbw | Frequency per billion words. | 1 billion to 0 | n/a |
| word rank | Frequency rank relative to all the other words within your corpus. | 1+n | #1 (the) |
| zipf scale | The log10 of frequency per billion words. Named after the American linguist George Kingsley Zipf. | 9.0 to 0.0 (or less, technically) | 7.73 (the), 1.01 (trella) |
| cb | A word frequency on a logarithmic centibel scale. Basically zipf optimized for storage. | 0 to -900 (or less) | -127 (the), -799 (trella) |

| name | Advantages | Disadvantages |
|---|---|---|
| fq | Simple... | Lots and lots of decimals. |
| fpmw | It's straightforward to calculate and understand. | It's not easy for humans to compare. For some words it's less than 1. |
| fpbw | Words aren't going to be less than one. | Nobody uses it. |
| zipf scale | Easy for humans to compare. | Requires decimals for accuracy. |
| cb | We can safely represent it as a positive integer without sacrificing significant accuracy. | Less human readable than zipf. |
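
The linear and zipf units in the tables above are all derived from the same proportion, so converting between them is a multiplication or a log10. Here is a minimal sketch of those relationships. The `FreqUnits` module and its method names are made up for illustration and are not freql's API.

```ruby
# Hypothetical helper module for illustration; not part of freql's API.
module FreqUnits
  module_function

  # fq is a proportion between 0 and 1: occurrences / total tokens.
  def fq_to_fpmw(fq)
    fq * 1_000_000
  end

  def fq_to_fpbw(fq)
    fq * 1_000_000_000
  end

  # zipf is log10 of the frequency per billion words.
  def fq_to_zipf(fq)
    Math.log10(fq) + 9
  end

  # A million is 10**6 and a billion is 10**9, hence the offset of 3.
  def fpmw_to_zipf(fpmw)
    Math.log10(fpmw) + 3
  end

  def zipf_to_fq(zipf)
    10.0**(zipf - 9)
  end
end

# "the" at 53703 per million words lands at roughly 7.73 on the zipf scale.
puts FreqUnits.fpmw_to_zipf(53_703).round(2) # 7.73
```

The zipf value is only defined for frequencies above zero, so a real implementation would need to decide what to do with words that never occur.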

Where does cb come from?

cb is the word frequency unit used by our initial dataset, which was pulled from the wordfreq program: https://github.com/rspeer/wordfreq

0 cB represents a word that occurs with probability 1, so it is the only word in the data (this of course doesn't happen). -200 cB represents a word that occurs once per 100 tokens, -300 cB represents a word that occurs once per 1000 tokens, and so on.
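
Read that way, a centibel value is 100 times the log10 of the word's probability. The sketch below reproduces the examples from the quoted description. The method names `cb_to_fq` and `fq_to_cb` are assumptions made for this example, not functions from freql or wordfreq.

```ruby
# Hypothetical helpers for illustration; not freql's or wordfreq's API.

# Convert centibels to a probability: cb = 100 * log10(fq).
def cb_to_fq(cb)
  10.0**(cb / 100.0)
end

# Convert a probability to the nearest whole centibel.
def fq_to_cb(fq)
  (100 * Math.log10(fq)).round
end

[0, -200, -300].each do |cb|
  fq = cb_to_fq(cb)
  puts "#{cb} cB -> about once per #{(1 / fq).round} tokens"
end
```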

It's very similar to zipf, but with a different scale and zero point. In practice it's always less than 0, so rare values can't cross 0, and the numbers are larger, so you don't need decimals for reasonable accuracy. You can easily save them as positive integers.
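
Since zipf is log10 of the proportion plus 9 and cb is 100 times that same log10, the two scales differ by a factor of 100 and an offset of 9. The sketch below shows the conversion and the sign flip for storage. All method names here are hypothetical and chosen only for this example.

```ruby
# Hypothetical helpers for illustration; not freql's documented API.

# zipf = log10(fq) + 9 and cb = 100 * log10(fq), so zipf = cb / 100 + 9.
def cb_to_zipf(cb)
  cb / 100.0 + 9
end

def zipf_to_cb(zipf)
  ((zipf - 9) * 100).round
end

# Flip the sign so the stored value is a non-negative integer.
def cb_to_stored(cb)
  -cb
end

def stored_to_cb(stored)
  -stored
end

# Worst-case relative error from rounding to a whole centibel (half a cB).
def max_rounding_error
  10.0**(0.5 / 100) - 1
end

puts cb_to_zipf(-127).round(2)                    # 7.73 (the)
puts cb_to_zipf(-799).round(2)                    # 1.01 (trella)
puts cb_to_stored(-799)                           # 799
puts format("%.1f%%", max_rounding_error * 100)   # 1.2%
```

Rounding to a whole centibel shifts the underlying frequency by at most about 1.2%, which is why the integer form loses so little.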

In the wordfreq program they 'bin' the data to reduce the file size further:

array[ bin[ "words", ...], ... ]

The index of each bin is the positive cb frequency value of the words in it. You end up with a lot of leading empty bins, but after that it gets really efficient.
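
As a rough illustration of that layout, the sketch below builds such an array of bins in memory and looks a word back up. It only mirrors the structure described here and is not wordfreq's actual file format or reader. `build_bins` and `lookup_cb` are hypothetical names.

```ruby
# Hypothetical sketch of the binned layout; not wordfreq's or freql's code.

# Takes a hash of word => cb (zero or negative integers) and returns an
# array where the index is the positive cb value and each entry is a bin
# of words sharing that value.
def build_bins(cb_by_word)
  bins = []
  cb_by_word.each do |word, cb|
    index = -cb
    bins[index] ||= []
    bins[index] << word
  end
  # Gaps left by Ruby's array auto-extension are nil; make them empty bins.
  bins.map { |bin| bin || [] }
end

# Returns the cb value for a word, or nil if the word is not in any bin.
def lookup_cb(bins, word)
  index = bins.index { |bin| bin.include?(word) }
  index && -index
end

bins = build_bins({ "the" => -127, "trella" => -799 })
puts bins.length                        # 800
puts bins.take_while(&:empty?).length   # 127 leading empty bins
puts lookup_cb(bins, "trella")          # -799
```

The linear scan in `lookup_cb` is only for demonstration. A real lookup would more likely load the bins into a hash keyed by word.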

Installation

Install the gem and add it to the application's Gemfile by executing:

$ bundle add freql

If bundler is not being used to manage dependencies, install the gem by executing:

$ gem install freql

Usage

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/freql.

License

The gem is available as open source under the terms of the MIT License.

Credits

Package last updated on 06 Jun 2023
