floret

floret Python bindings

  • 0.10.5
  • PyPI

floret: fastText + Bloom embeddings for compact, full-coverage vectors with spaCy

floret is an extended version of fastText that can produce word representations for any word from a compact vector table. It combines:

  • fastText's subwords to provide embeddings for any word
  • Bloom embeddings ("hashing trick") for a compact vector table
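
The "hashing trick" above can be illustrated with a small standalone sketch. It is not floret's implementation: the SHA-256-based hash, the helper names bloom_rows and bloom_vector, and the choice to sum the selected rows are all assumptions made for illustration. It only shows how a key can map to several rows of a small table so that no per-word row is needed.

```python
import hashlib
import random
from typing import List


def bloom_rows(key: str, bucket: int, hash_count: int) -> List[int]:
    """Map a key to hash_count row indices in a table with `bucket` rows.

    Assumption: a seeded SHA-256 stands in for whatever hash floret uses.
    """
    rows = []
    for seed in range(hash_count):
        digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
        rows.append(int.from_bytes(digest[:8], "little") % bucket)
    return rows


def bloom_vector(key: str, table: List[List[float]], hash_count: int) -> List[float]:
    """Combine the rows selected for a key (summed here, as an assumption)."""
    vec = [0.0] * len(table[0])
    for row in bloom_rows(key, len(table), hash_count):
        for i, value in enumerate(table[row]):
            vec[i] += value
    return vec


# Toy table: 1000 rows of 4 dimensions, filled with reproducible random values.
rng = random.Random(0)
toy_table = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(1000)]

# Any string gets a vector, including one never seen before.
vector = bloom_vector("broccoli", toy_table, hash_count=2)
```

Because two keys rarely collide on all of their rows at once, using more than one hash per key lets the table shrink while keeping most keys distinguishable.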

Installation

pip install floret

Usage

Train floret vectors with the following options:

  • mode: "floret", storing both words and subwords in the same compact hash table
  • hashCount: store each entry in 1-4 rows in the hash table (recommended: 2)
  • bucket: the number of rows in the hash table; in combination with hashCount>1, it can be greatly reduced (recommended: 25000-100000, down from the fastText default of 2000000)
  • minn: min length of char ngram (default: 3)
  • maxn: max length of char ngram (default: 6)

import floret

# train vectors
model = floret.train_unsupervised(
    "data.txt",
    model="cbow",
    mode="floret",
    hashCount=2,
    bucket=50000,
    minn=3,
    maxn=6,
)

# query vector
model.get_word_vector("broccoli")

# save full model
model.save_model("vectors.bin")

# export standard word-only vector table
model.save_vectors("vectors.vec")

# export floret vector table
model.save_floret_vectors("vectors.floret")

Note: with the default setting mode="fasttext", floret trains original fastText vectors.
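
To make the minn and maxn options concrete, here is a minimal sketch of character n-gram extraction. It assumes the fastText convention of wrapping each word in "<" and ">" boundary markers; the helper name char_ngrams is hypothetical, and floret's own extraction may differ in details such as ordering.

```python
from typing import List


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Return all character n-grams of length minn..maxn for a word.

    Assumption: the word is wrapped in "<" and ">" before slicing, following
    the fastText boundary-marker convention.
    """
    wrapped = f"<{word}>"
    ngrams = []
    for n in range(minn, maxn + 1):
        # Slide a window of width n across the wrapped word.
        for start in range(len(wrapped) - n + 1):
            ngrams.append(wrapped[start:start + n])
    return ngrams


print(char_ngrams("cat", minn=3, maxn=4))
# → ['<ca', 'cat', 'at>', '<cat', 'cat>']
```

Widening the minn-maxn range yields more subwords per word, which means more keys competing for rows in the hash table.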

Use floret vectors in spaCy

Import floret vectors into spaCy v3.2+:

spacy init vectors LANG vectors.floret spacy_vectors_model --mode floret
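
The directory written by the command above can then be loaded like any other spaCy pipeline. The following is a minimal sketch, assuming spaCy v3.2+ is installed and spacy_vectors_model exists in the working directory; the helper name describe_vectors is hypothetical.

```python
def describe_vectors(model_dir: str = "spacy_vectors_model", word: str = "broccoli") -> dict:
    """Load the imported pipeline and report basic facts about its vectors."""
    # Imported lazily so the helper can be defined without spaCy installed.
    import spacy

    nlp = spacy.load(model_dir)
    vectors = nlp.vocab.vectors
    # Look up a vector for the word through the vocabulary.
    vector = nlp.vocab[word].vector
    return {
        "mode": vectors.mode,          # expected to be "floret" for this pipeline
        "shape": vectors.shape,        # (rows, dimensions) of the vector table
        "vector_length": len(vector),
    }
```

Calling describe_vectors() after running the import is a quick way to confirm that the table was loaded in floret mode.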

Notes

floret contains all features of the original fasttext module. See the fasttext docs for more information.

The fasttext and floret binary formats saved with model.save_model("model.bin") are not compatible.
