⚡️ What is FastEmbed?
FastEmbed is a lightweight, fast Python library built for embedding generation. We support popular text models. Please open a GitHub issue if you want us to add a new model.
The default text embedding (TextEmbedding) model is Flag Embedding, listed on the MTEB leaderboard. It supports "query" and "passage" prefixes for the input text. See the Retrieval Embedding Generation example and the guide on how to use FastEmbed with Qdrant.
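The query/passage distinction can be sketched as follows. This is a minimal illustration, assuming the model object exposes `query_embed` and `passage_embed` methods that each return an iterable of vectors; check your installed FastEmbed version before relying on those names. The helper `embed_for_retrieval` and the `StubModel` class are hypothetical and only show the call pattern without downloading a model.

```python
from typing import Any, Iterable, List, Tuple


def embed_for_retrieval(
    model: Any, query: str, passages: Iterable[str]
) -> Tuple[Any, List[Any]]:
    """Embed one query and many passages with their respective methods.

    Hypothetical helper: assumes `model` has `query_embed` and
    `passage_embed`, each returning an iterable of vectors.
    """
    # Assumption: query_embed yields one vector per query string
    query_vector = next(iter(model.query_embed(query)))
    # Assumption: passage_embed yields one vector per passage
    passage_vectors = list(model.passage_embed(passages))
    return query_vector, passage_vectors


class StubModel:
    """Stand-in model (not part of FastEmbed) that returns toy 2-d vectors."""

    def query_embed(self, query):
        queries = [query] if isinstance(query, str) else query
        for q in queries:
            yield [float(len(q)), 1.0]

    def passage_embed(self, texts):
        for t in texts:
            yield [float(len(t)), 0.0]
```

With a real model, the same helper would be called as `embed_for_retrieval(TextEmbedding(), query, documents)`, provided the assumed methods are available.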
📈 Why FastEmbed?
- Light: FastEmbed is a lightweight library with few external dependencies. We don't require a GPU and don't download GBs of PyTorch dependencies, and instead use the ONNX Runtime. This makes it a great candidate for serverless runtimes like AWS Lambda.
- Fast: FastEmbed is designed for speed. We use the ONNX Runtime, which is faster than PyTorch. We also use data parallelism for encoding large datasets.
- Accurate: FastEmbed is better than OpenAI Ada-002. We also support an ever-expanding set of models, including a few multilingual models.
🚀 Installation
To install the FastEmbed library, pip works best. You can install it with or without GPU support:
pip install fastembed
pip install fastembed-gpu
📖 Quickstart
from typing import List

from fastembed import TextEmbedding

# Example list of documents
documents: List[str] = [
    "This is built to be faster and lighter than other embedding libraries e.g. Transformers, Sentence-Transformers, etc.",
    "fastembed is supported by and maintained by Qdrant.",
]

# Instantiating the model triggers the model download and initialization
embedding_model = TextEmbedding()
print("The model BAAI/bge-small-en-v1.5 is ready to use.")

# embed() returns a generator
embeddings_generator = embedding_model.embed(documents)
# Convert the generator to a list to keep every embedding in memory
embeddings_list = list(embedding_model.embed(documents))
len(embeddings_list[0])  # vector of 384 dimensions
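Once the embeddings are in a list, a common next step is comparing two of them. The function below is a minimal, dependency-free sketch of cosine similarity; `cosine_similarity` is a hypothetical helper written for illustration, not a FastEmbed function, and returning 0.0 for zero vectors is a convention chosen here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)
```

For the two documents above, this could be called as `cosine_similarity(embeddings_list[0], embeddings_list[1])`.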
FastEmbed supports a variety of models for different tasks and modalities.
The list of all available models can be found in the supported models documentation.
🎒 Dense text embeddings
from fastembed import TextEmbedding
model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
embeddings = list(model.embed(documents))
🔱 Sparse text embeddings
from fastembed import SparseTextEmbedding
model = SparseTextEmbedding(model_name="prithivida/Splade_PP_en_v1")
embeddings = list(model.embed(documents))
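Sparse embeddings store only the non-zero dimensions, so comparing two of them means matching indices rather than multiplying full vectors. The sketch below assumes each sparse embedding can be read as two parallel sequences, indices and values (FastEmbed's sparse embedding objects are expected to expose these, but treat the attribute names as an assumption). `sparse_dot` is a hypothetical helper, not a library function.

```python
from typing import Dict, Sequence


def sparse_dot(
    indices_a: Sequence[int],
    values_a: Sequence[float],
    indices_b: Sequence[int],
    values_b: Sequence[float],
) -> float:
    """Dot product of two sparse vectors given as parallel index/value sequences."""
    # Map each non-zero dimension of the first vector to its weight
    lookup: Dict[int, float] = dict(zip(indices_a, values_a))
    # Only dimensions present in both vectors contribute to the sum
    return float(sum(lookup.get(i, 0.0) * v for i, v in zip(indices_b, values_b)))
```

Under that assumption, two embeddings `a` and `b` would be compared with `sparse_dot(a.indices, a.values, b.indices, b.values)`.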
🦥 Late interaction models (aka ColBERT)
from fastembed import LateInteractionTextEmbedding
model = LateInteractionTextEmbedding(model_name="colbert-ir/colbertv2.0")
embeddings = list(model.embed(documents))
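Late interaction models produce one vector per token instead of one per document, so relevance is typically scored with a ColBERT-style MaxSim sum: for every query token, take its best match among the document tokens and add those maxima up. The sketch below assumes each embedding is a sequence of token vectors; `maxsim_score` is a hypothetical helper written for illustration, not part of FastEmbed.

```python
from typing import Sequence

Vector = Sequence[float]


def maxsim_score(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Sum, over query tokens, of the best dot product against any document token."""
    if len(doc_tokens) == 0:
        return 0.0
    total = 0.0
    for q in query_tokens:
        # Best-matching document token for this query token
        total += max(sum(x * y for x, y in zip(q, d)) for d in doc_tokens)
    return total
```

A higher score means the document's tokens cover the query's tokens more closely.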
🖼️ Image embeddings
from fastembed import ImageEmbedding
images = [
    "./path/to/image1.jpg",
    "./path/to/image2.jpg",
]
model = ImageEmbedding(model_name="Qdrant/clip-ViT-B-32-vision")
embeddings = list(model.embed(images))
🔄 Rerankers
from typing import List
from fastembed.rerank.cross_encoder import TextCrossEncoder
query = "Who is maintaining Qdrant?"
documents: List[str] = [
    "This is built to be faster and lighter than other embedding libraries e.g. Transformers, Sentence-Transformers, etc.",
    "fastembed is supported by and maintained by Qdrant.",
]
encoder = TextCrossEncoder(model_name="Xenova/ms-marco-MiniLM-L-6-v2")
scores = list(encoder.rerank(query, documents))
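The scores are only useful once they are paired back with the documents. The sketch below assumes the scores come back in the same order as the input documents and that a higher score means more relevant; `rank_by_score` is a hypothetical helper, not part of FastEmbed.

```python
from typing import List, Sequence, Tuple


def rank_by_score(
    documents: Sequence[str], scores: Sequence[float]
) -> List[Tuple[str, float]]:
    """Pair each document with its score and sort from most to least relevant."""
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    return sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
```

With the variables above, `rank_by_score(documents, scores)[0]` would give the top-ranked document and its score.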
⚡️ FastEmbed on a GPU
FastEmbed supports running on GPU devices.
It requires installation of the fastembed-gpu package.
pip install fastembed-gpu
See our GPU example for detailed instructions, CUDA 12.x support, and troubleshooting of common issues.
from fastembed import TextEmbedding
embedding_model = TextEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    providers=["CUDAExecutionProvider"]
)
print("The model BAAI/bge-small-en-v1.5 is ready to use on a GPU.")
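If the same code has to run on machines with and without a GPU, one option is to build the providers list from whatever ONNX Runtime reports as available. The sketch below is a plain-Python helper; `pick_providers` is a hypothetical name, and the policy of preferring CUDA and keeping CPU as a fallback is a choice made for this example.

```python
from typing import List, Sequence


def pick_providers(available: Sequence[str]) -> List[str]:
    """Prefer CUDA when it is reported as available, keeping CPU as a fallback."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [name for name in preferred if name in available]
    # Fall back to CPU if neither preferred provider was reported
    return chosen or ["CPUExecutionProvider"]
```

The `available` argument would normally come from `onnxruntime.get_available_providers()`, and the result can be passed as `providers=` when constructing the model.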
Usage with Qdrant
Installation with Qdrant Client in Python:
pip install qdrant-client[fastembed]
or
pip install qdrant-client[fastembed-gpu]
You might have to use quotes on zsh: pip install 'qdrant-client[fastembed]'
from qdrant_client import QdrantClient
client = QdrantClient("localhost", port=6333)
docs = ["Qdrant has Langchain integrations", "Qdrant also has Llama Index integrations"]
metadata = [
    {"source": "Langchain-docs"},
    {"source": "Llama-index-docs"},
]
ids = [42, 2]
client.add(
    collection_name="demo_collection",
    documents=docs,
    metadata=metadata,
    ids=ids
)
search_result = client.query(
    collection_name="demo_collection",
    query_text="This is a query document"
)
print(search_result)