voyage-embedders-haystack
Haystack 2.x component to embed strings and Documents using VoyageAI Embedding models.
Custom components for Haystack for creating embeddings and reranking documents using the Voyage models.
Voyage’s embedding models are state-of-the-art in retrieval accuracy. These models outperform top-performing embedding models like intfloat/e5-mistral-7b-instruct and OpenAI/text-embedding-3-large on the MTEB Benchmark.
[v1.5.0 - 22/01/25]:
- The VoyageRanker component can be used to rerank documents using the Voyage Reranker models.
- Added support for the output_dimension and output_dtype parameters.
[v1.4.0 - 24/07/24]:
- Added support for the timeout and max_retries parameters.
[v1.3.0 - 18/03/24]:
- The embedders have moved to the import path haystack_integrations.components.embedders.voyage_embedders. Please replace all instances of
    from voyage_embedders.voyage_document_embedder import VoyageDocumentEmbedder
    from voyage_embedders.voyage_text_embedder import VoyageTextEmbedder
  with
    from haystack_integrations.components.embedders.voyage_embedders import VoyageDocumentEmbedder, VoyageTextEmbedder
- The embedders now use the Haystack Secret API for authentication. For more information, please see the Secret Management Documentation.
[v1.2.0 - 02/02/24]:
- VoyageDocumentEmbedder and VoyageTextEmbedder now accept the model parameter instead of model_name.
- The embedders now use the voyageai.Client.embed() method instead of the deprecated get_embedding and get_embeddings methods of the global namespace.
- The truncate parameter has been added.
- The total number of tokens used is now returned as "total_tokens" in the metadata.
[v1.1.0 - 13/12/23]: Added support for the input_type parameter in VoyageTextEmbedder and VoyageDocumentEmbedder.
[v1.0.0 - 21/11/23]: Added VoyageTextEmbedder and VoyageDocumentEmbedder to embed strings and documents.
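If you have many files still using the pre-v1.3.0 import paths, the replacement described in the v1.3.0 entry can be scripted. The sketch below is a hypothetical helper (migrate_imports is not part of this package) that rewrites the two old import lines to the new path and leaves everything else untouched; review the result before committing it.

```python
import re

# Pre-v1.3.0 module paths, mapped to the class each one exported.
OLD_IMPORTS = {
    "voyage_embedders.voyage_document_embedder": "VoyageDocumentEmbedder",
    "voyage_embedders.voyage_text_embedder": "VoyageTextEmbedder",
}
NEW_MODULE = "haystack_integrations.components.embedders.voyage_embedders"


def migrate_imports(source: str) -> str:
    """Hypothetical helper: rewrite old Voyage embedder imports to the v1.3.0+ path."""
    for old_module, class_name in OLD_IMPORTS.items():
        # Match the old import at the start of a line, keeping its indentation.
        pattern = rf"^([ \t]*)from {re.escape(old_module)} import {class_name}\b"
        replacement = rf"\1from {NEW_MODULE} import {class_name}"
        source = re.sub(pattern, replacement, source, flags=re.MULTILINE)
    return source


old_code = (
    "from voyage_embedders.voyage_document_embedder import VoyageDocumentEmbedder\n"
    "from voyage_embedders.voyage_text_embedder import VoyageTextEmbedder\n"
)
print(migrate_imports(old_code))
```

Running the helper a second time on already migrated code changes nothing, because the new path no longer matches the old patterns.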
pip install voyage-embedders-haystack
You can use Voyage embedding models with two components: VoyageTextEmbedder and VoyageDocumentEmbedder.
To create semantic embeddings for documents, use VoyageDocumentEmbedder in your indexing pipeline. To generate embeddings for queries, use VoyageTextEmbedder.
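The two embedders differ mainly in what they return, which is what the pipelines further down rely on: the text embedder produces a single "embedding" for a query string, while the document embedder returns "documents" with their embedding field filled in. The sketch below illustrates that contract with hypothetical stand-ins (Doc, fake_embed, SketchTextEmbedder and SketchDocumentEmbedder are not part of the package); fake_embed replaces the Voyage API call with deterministic 4-dimensional vectors, and the real components may build new Document objects and return extra metadata rather than mutating documents in place.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical minimal stand-in for haystack.dataclasses.Document."""
    content: str
    meta: Dict = field(default_factory=dict)
    embedding: Optional[List[float]] = None


def fake_embed(texts: List[str], input_type: str) -> List[List[float]]:
    """Hypothetical stand-in for a Voyage API call: deterministic 4-dim vectors."""
    vectors = []
    for text in texts:
        digest = hashlib.sha256(f"{input_type}:{text}".encode()).digest()
        vectors.append([b / 255 for b in digest[:4]])
    return vectors


class SketchTextEmbedder:
    """Embeds one query string and returns it under the "embedding" key."""

    def __init__(self, input_type: str = "query"):
        self.input_type = input_type

    def run(self, text: str) -> Dict[str, List[float]]:
        return {"embedding": fake_embed([text], self.input_type)[0]}


class SketchDocumentEmbedder:
    """Embeds each document's content and returns the documents with embeddings set."""

    def __init__(self, input_type: str = "document"):
        self.input_type = input_type

    def run(self, documents: List[Doc]) -> Dict[str, List[Doc]]:
        vectors = fake_embed([doc.content for doc in documents], self.input_type)
        for doc, vector in zip(documents, vectors):
            doc.embedding = vector  # attached in place in this sketch only
        return {"documents": documents}
```

This is why the indexing pipeline can connect the document embedder straight to a DocumentWriter, and why the query pipeline connects "TextEmbedder.embedding" to the retriever's query_embedding input.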
The Voyage Reranker models can be used with the VoyageRanker component.
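Conceptually, a reranker scores each (query, document) pair and reorders the documents by that score, usually keeping only the best few. The sketch below shows that reordering step with a hypothetical word-overlap score standing in for the Voyage Reranker model; overlap_score and rerank are illustrative names, not the VoyageRanker API.

```python
from typing import List, Tuple


def overlap_score(query: str, document: str) -> float:
    """Hypothetical relevance score: fraction of query words present in the document."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


def rerank(query: str, documents: List[str], top_k: int = 3) -> List[Tuple[str, float]]:
    """Score every document against the query and keep the top_k highest."""
    scored = [(doc, overlap_score(query, doc)) for doc in documents]
    # sort() is stable, so documents with equal scores keep their original order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


docs = [
    "The Joker is a 2019 film.",
    "Batman lives in Gotham.",
    "Joker movie release year: 2019",
]
for doc, score in rerank("joker movie release", docs, top_k=2):
    print(f"{score:.2f}  {doc}")
```

A model-based reranker replaces only the scoring function; the sort-and-truncate step stays the same.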
Once you've selected the right component for your use case, initialize it with the model name and your Voyage AI API key. Instead of passing the API key as an argument, you can also set the VOYAGE_API_KEY environment variable.
To get an API key, please see the Voyage AI website.
Information about the supported models can be found in the Voyage AI Documentation.
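The precedence described above (an explicit key wins, otherwise the VOYAGE_API_KEY environment variable is used) can be sketched as follows. resolve_voyage_api_key is a hypothetical helper for illustration only; in the package itself, since v1.3.0, the key is handled by Haystack's Secret API, so in real code you would pass something like api_key=Secret.from_token("...") or rely on Secret.from_env_var("VOYAGE_API_KEY") (assuming the constructor parameter is named api_key).

```python
import os
from typing import Optional


def resolve_voyage_api_key(api_key: Optional[str] = None, env_var: str = "VOYAGE_API_KEY") -> str:
    """Hypothetical helper: prefer an explicit key, fall back to the environment variable."""
    if api_key:
        return api_key
    value = os.environ.get(env_var)
    if not value:
        raise ValueError(f"No API key was passed and the {env_var} environment variable is not set.")
    return value
```

Failing loudly when neither source is available surfaces a missing key at startup instead of as an authentication error on the first API call.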
You can find all the examples in the examples folder.
Below is an example semantic search pipeline that uses the Simple Wikipedia Dataset from HuggingFace.
Load the dataset:
# Install HuggingFace Datasets using "pip install datasets"
from datasets import load_dataset
from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.writers import DocumentWriter
from haystack.dataclasses import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
# Import Voyage Embedders
from haystack_integrations.components.embedders.voyage_embedders import VoyageDocumentEmbedder, VoyageTextEmbedder
# Load first 100 rows of the Simple Wikipedia Dataset from HuggingFace
dataset = load_dataset("pszemraj/simple_wikipedia", split="validation[:100]")
docs = [
    Document(
        content=doc["text"],
        meta={
            "title": doc["title"],
            "url": doc["url"],
        },
    )
    for doc in dataset
]
Index the documents into the InMemoryDocumentStore using VoyageDocumentEmbedder and DocumentWriter:
doc_store = InMemoryDocumentStore(embedding_similarity_function="cosine")
retriever = InMemoryEmbeddingRetriever(document_store=doc_store)
doc_writer = DocumentWriter(document_store=doc_store)
doc_embedder = VoyageDocumentEmbedder(
    model="voyage-2",
    input_type="document",
)
# Indexing Pipeline
indexing_pipeline = Pipeline()
indexing_pipeline.add_component(instance=doc_embedder, name="DocEmbedder")
indexing_pipeline.add_component(instance=doc_writer, name="DocWriter")
indexing_pipeline.connect("DocEmbedder", "DocWriter")
indexing_pipeline.run({"DocEmbedder": {"documents": docs}})
print(f"Number of documents in Document Store: {len(doc_store.filter_documents())}")
print(f"First Document: {doc_store.filter_documents()[0]}")
print(f"Embedding of first Document: {doc_store.filter_documents()[0].embedding}")
Query the semantic search pipeline using InMemoryEmbeddingRetriever and VoyageTextEmbedder:
text_embedder = VoyageTextEmbedder(model="voyage-2", input_type="query")
# Query Pipeline
query_pipeline = Pipeline()
query_pipeline.add_component(instance=text_embedder, name="TextEmbedder")
query_pipeline.add_component(instance=retriever, name="Retriever")
query_pipeline.connect("TextEmbedder.embedding", "Retriever.query_embedding")
# Search
results = query_pipeline.run({"TextEmbedder": {"text": "Which year did the Joker movie release?"}})
# Print text from top result
top_result = results["Retriever"]["documents"][0].content
print("The top search result is:")
print(top_result)
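Because the document store above was created with embedding_similarity_function="cosine", retrieval amounts to comparing the query embedding with every stored embedding by cosine similarity and returning the highest-scoring documents. The sketch below shows that idea in plain Python; cosine_similarity and retrieve are illustrative names, and InMemoryDocumentStore's actual implementation (for example, how it scales or normalizes scores) may differ.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_embedding: Sequence[float],
    store: List[Tuple[str, Sequence[float]]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank (content, embedding) pairs by cosine similarity to the query and keep top_k."""
    scored = [(content, cosine_similarity(query_embedding, emb)) for content, emb in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


store = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
print(retrieve([1.0, 0.1], store, top_k=2))
```

Cosine similarity ignores vector length, so documents are ranked by the direction of their embeddings rather than their magnitude.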
Pull requests are welcome. For major changes, please open an issue first.
voyage-embedders-haystack is distributed under the terms of the Apache-2.0 license.