floret

floret Python bindings

Version 0.10.5 (PyPI)

floret: fastText + Bloom embeddings for compact, full-coverage vectors with spaCy

floret is an extended version of fastText that can produce word representations for any word from a compact vector table. It combines:

fastText's subwords to provide embeddings for any word
Bloom embeddings ("hashing trick") for a compact vector table
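The following minimal pure-Python sketch shows how the two ideas combine in a Bloom-style lookup. It is not floret's implementation. Several details are assumptions made for illustration: the CRC32-with-seed hash, the use of the bare word as the whole-word key, and averaging all retrieved rows. The `<`/`>` boundary markers and the minn/maxn n-gram range follow fastText's conventions. Every key (the word and each of its n-grams) is mapped to hashCount rows of one small table, so even an unseen word gets a vector.

```python
import random
import zlib
from typing import List


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """All character n-grams of length minn..maxn of the word wrapped in '<' and '>'."""
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for n in range(minn, maxn + 1)
        for i in range(len(wrapped) - n + 1)
    ]


def bucket_rows(key: str, hash_count: int, bucket: int) -> List[int]:
    """Map one key to hash_count rows (CRC32 with a seed prefix is a stand-in hash)."""
    return [zlib.crc32(f"{seed}:{key}".encode("utf-8")) % bucket for seed in range(hash_count)]


def lookup(word: str, table: List[List[float]], hash_count: int = 2,
           minn: int = 3, maxn: int = 6) -> List[float]:
    """Average the rows hit by the whole word and each of its n-grams."""
    bucket, dim = len(table), len(table[0])
    keys = [word] + char_ngrams(word, minn, maxn)  # assumption: bare word as word key
    total = [0.0] * dim
    count = 0
    for key in keys:
        for row in bucket_rows(key, hash_count, bucket):
            for d in range(dim):
                total[d] += table[row][d]
            count += 1
    return [x / count for x in total]


# Usage: a tiny random table still yields a vector for any word, seen or not.
random.seed(0)
table = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(1000)]
vec = lookup("broccolini", table)
```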

Installation

pip install floret

Usage

Train floret vectors with the following options:

mode: "floret", storing both words and subwords in the same compact hash table
hashCount: store each entry in 1-4 rows in the hash table (recommended: 2)
bucket: in combination with hashCount>1, the size of the hash table can be greatly reduced (recommended: 25000-100000, down from the fastText default of 2000000)
minn: min length of char ngram (default: 3)
maxn: max length of char ngram (default: 6)
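The arithmetic below puts rough numbers on why a smaller bucket saves so much space and why hashCount>1 helps. It assumes float32 storage and fastText's default dimension of 100 (the example below does not set dim). It counts only the hash table, not the per-word rows that fastText mode also keeps. The collision estimate further assumes independent, uniformly distributed hashes, which is an idealization rather than a property floret guarantees.

```python
def table_bytes(rows: int, dim: int = 100, bytes_per_float: int = 4) -> int:
    # Memory for a dense float32 table: one row of `dim` floats per bucket.
    return rows * dim * bytes_per_float


def all_rows_collide(bucket: int, hash_count: int) -> float:
    # Chance that two given keys hit the same row under every one of the
    # hash_count hashes, assuming independent, uniform hashes.
    return (1.0 / bucket) ** hash_count


fasttext_default = table_bytes(2_000_000)  # 800,000,000 bytes (about 800 MB)
floret_compact = table_bytes(50_000)       # 20,000,000 bytes (about 20 MB)
print(fasttext_default // floret_compact)  # 40

print(all_rows_collide(50_000, 1))  # about 2e-05
print(all_rows_collide(50_000, 2))  # about 4e-10
```

With hashCount=2, two keys that share one row usually still differ in the other, so their combined vectors stay distinguishable even in a small table.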

import floret

# train vectors
model = floret.train_unsupervised(
    "data.txt",
    model="cbow",
    mode="floret",
    hashCount=2,
    bucket=50000,
    minn=3,
    maxn=6,
)

# query vector
model.get_word_vector("broccoli")

# save full model
model.save_model("vectors.bin")

# export standard word-only vector table
model.save_vectors("vectors.vec")

# export floret vector table
model.save_floret_vectors("vectors.floret")
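The word-only vectors.vec export uses fastText's plain-text layout. A header line gives the number of words and the vector dimension, and each following line holds a word and its values. The reader below is a minimal sketch that assumes this layout. The file only covers words from the training vocabulary; it has no subword rows to fall back on, unlike vectors.floret.

```python
import io
from typing import Dict, List, TextIO


def read_vec(stream: TextIO) -> Dict[str, List[float]]:
    # Header: "<number of words> <dimension>"
    count, dim = (int(x) for x in stream.readline().split())
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        parts = line.split()  # whitespace split tolerates trailing spaces
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if len(vectors) != count:
        raise ValueError(f"expected {count} words, read {len(vectors)}")
    return vectors


# In-memory example; for a real export use open("vectors.vec", encoding="utf-8").
sample = io.StringIO("2 3\nbroccoli 0.1 0.2 0.3 \ncarrot 1 2 3 \n")
vecs = read_vec(sample)
print(vecs["carrot"])  # [1.0, 2.0, 3.0]
```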

Note: with the default setting mode="fasttext", floret trains original fastText vectors.

Use floret vectors in spaCy

Import floret vectors into spaCy v3.2+:

spacy init vectors LANG vectors.floret spacy_vectors_model --mode floret

Notes

floret contains all features of the original fasttext module. See the fasttext docs for more information.

The fasttext and floret binary formats saved with model.save_model("model.bin") are not compatible.
