embs

embs is a powerful Python library for document retrieval, embedding, and ranking, making it easier to build Retrieval-Augmented Generation (RAG) systems, chatbots, and semantic search engines.
Why Choose embs?
🚀 Installation
Install via pip:
pip install embs
For Poetry users:
[tool.poetry.dependencies]
embs = "^0.1.8"
📖 Quick Start Guide
1️⃣ Searching Documents via DuckDuckGo (Recommended!)
Retrieve relevant web pages, convert them to Markdown, and rank them using embeddings.
🚀 Always use a splitter!
Splitting documents into smaller chunks before embedding improves ranking, reduces redundancy, and returns more focused snippets.
import asyncio
from functools import partial
from embs import Embs
split_config = {
    "headers_to_split_on": [("#", "h1"), ("##", "h2"), ("###", "h3")],
    "return_each_line": True,
    "strip_headers": True,
    "split_on_double_newline": True,
}
md_splitter = partial(Embs.markdown_splitter, config=split_config)
client = Embs()
async def run_search():
    results = await client.search_documents_async(
        query="Latest AI research",
        limit=3,
        blocklist=["youtube.com"],
        splitter=md_splitter,
    )
    for item in results:
        print(f"File: {item['filename']} | Score: {item['similarity']:.4f}")
        print(f"Snippet: {item['markdown'][:80]}...\n")
asyncio.run(run_search())
For synchronous usage:
results = client.search_documents(
    query="Latest AI research",
    limit=3,
    blocklist=["youtube.com"],
    splitter=md_splitter,
    model="snowflake-arctic-embed-l-v2.0",
)
for item in results:
    print(f"File: {item['filename']} | Score: {item['similarity']:.4f}")
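To make the blocklist parameter concrete, here is a minimal sketch of domain-based filtering. It is an illustration only: the helper is_blocked is hypothetical, and embs may match blocked domains differently (for example by substring rather than by host).

```python
from urllib.parse import urlparse


def is_blocked(url: str, blocklist: list[str]) -> bool:
    """Return True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blocklist)


# Hypothetical search hits, filtered before any page is fetched or embedded.
urls = [
    "https://www.youtube.com/watch?v=abc",
    "https://arxiv.org/abs/1234.5678",
    "https://notyoutube.com/page",
]
allowed = [u for u in urls if not is_blocked(u, ["youtube.com"])]
print(allowed)
```

Matching on the parsed host rather than on the raw URL string avoids dropping unrelated sites whose names merely contain a blocked domain.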
2️⃣ Multilingual Document Querying (Local & Online)
Retrieve and rank multilingual documents from local files or URLs.
async def run_query():
    docs = await client.query_documents_async(
        query="Explique la mécanique quantique",
        files=["/path/to/quantum_theory.pdf"],
        urls=["https://example.com/quantum.html"],
        splitter=md_splitter,
    )
    for d in docs:
        print(f"{d['filename']} => Score: {d['similarity']:.4f}")
        print(f"Snippet: {d['markdown'][:80]}...\n")
asyncio.run(run_query())
For synchronous usage:
docs = client.query_documents(
    query="Explique la mécanique quantique",
    files=["/path/to/quantum_theory.pdf"],
    splitter=md_splitter,
)
for d in docs:
    print(d["filename"], "=> Score:", d["similarity"])
💡 Well suited to multilingual retrieval! embs ranks and retrieves documents in English, French, Spanish, German, and other supported languages.
⚡ Caching for Performance
Enable in-memory or disk caching to speed up repeated queries.
cache_conf = {
    "enabled": True,
    "type": "memory",
    "prefix": "myapp",
    "dir": "cache_folder",
    "max_mem_items": 128,
    "max_ttl_seconds": 86400,
}
client = Embs(cache_config=cache_conf)
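For intuition about what settings like max_mem_items and max_ttl_seconds usually control, the sketch below implements a small in-memory cache with a size cap and per-entry expiry. The TTLCache class is hypothetical and is not how embs implements its cache; the least-recently-used eviction policy is also an assumption.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Tiny in-memory cache with a size cap and per-entry expiry (illustrative only)."""

    def __init__(self, max_items=128, ttl_seconds=86400, clock=time.monotonic):
        self.max_items = max_items
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._data = OrderedDict()  # key -> (stored_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl_seconds:
            del self._data[key]  # entry is older than the TTL
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value):
        self._data[key] = (self.clock(), value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_items:
            self._data.popitem(last=False)  # evict the least recently used entry


cache = TTLCache(max_items=2, ttl_seconds=60)
cache.set("myapp:q1", [0.1, 0.2])
cache.set("myapp:q2", [0.3, 0.4])
cache.set("myapp:q3", [0.5, 0.6])  # exceeds max_items, so "myapp:q1" is evicted
print(cache.get("myapp:q1"), cache.get("myapp:q3"))
```

A cap on item count bounds memory use, while the TTL keeps stale embeddings from being served indefinitely.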
🔍 Key Features & API Methods
🔹 search_documents_async()
Search for documents via DuckDuckGo, retrieve, and rank them.
await client.search_documents_async(
    query="Recent AI breakthroughs",
    limit=3,
    blocklist=["example.com"],
    splitter=md_splitter,
)
🔹 query_documents_async()
Retrieve, split, and rank local/online documents.
await client.query_documents_async(
    query="Climate change effects",
    files=["/path/to/report.pdf"],
    urls=["https://example.com"],
    splitter=md_splitter,
)
🔹 embed_async()
Generate embeddings for texts with multilingual support.
embeddings = await client.embed_async(
    ["Este es un ejemplo de texto.", "Ceci est un exemple de phrase."],
    optimized=True,
)
🔹 rank_async()
Rank candidate texts by similarity to a query.
ranked_results = await client.rank_async(
    query="Machine learning",
    candidates=["Deep learning is a subset of ML", "Quantum computing is unrelated"],
)
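Ranking by similarity is commonly done with cosine similarity between embedding vectors. The sketch below shows that idea with hand-written toy vectors; it assumes cosine similarity as the metric, which this README does not confirm, and the rank helper and its result shape are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec, candidates):
    """Sort (text, vector) pairs by similarity to the query, highest first."""
    scored = [
        {"text": text, "similarity": cosine_similarity(query_vec, vec)}
        for text, vec in candidates
    ]
    return sorted(scored, key=lambda r: r["similarity"], reverse=True)


# Toy 2-D vectors standing in for real embeddings.
query_vec = [1.0, 0.0]
candidates = [
    ("Quantum computing is unrelated", [0.0, 1.0]),
    ("Deep learning is a subset of ML", [0.9, 0.1]),
]
ranked = rank(query_vec, candidates)
for r in ranked:
    print(f"{r['similarity']:.4f}  {r['text']}")
```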
🔬 Testing
Run the test suite with pytest and the pytest-asyncio plugin:
pytest --asyncio-mode=auto
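With --asyncio-mode=auto, pytest-asyncio collects plain async def test functions without extra markers. The sketch below shows what such a test might look like. FakeClient is a hypothetical stand-in, not part of embs, used so the test runs without network access; its word-overlap scoring and result shape are assumptions.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for an embs client, so the test needs no network."""

    async def rank_async(self, query, candidates):
        # Score each candidate by how many words it shares with the query.
        query_words = set(query.lower().split())
        scored = [
            {"text": c, "similarity": len(query_words & set(c.lower().split()))}
            for c in candidates
        ]
        return sorted(scored, key=lambda r: r["similarity"], reverse=True)


async def test_rank_orders_by_similarity():
    client = FakeClient()
    results = await client.rank_async(
        query="machine learning",
        candidates=["quantum computing", "deep learning for machine translation"],
    )
    assert results[0]["text"] == "deep learning for machine translation"
    assert results[0]["similarity"] >= results[1]["similarity"]


if __name__ == "__main__":
    # Allows running the file directly, outside pytest.
    asyncio.run(test_rank_orders_by_similarity())
```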
📝 Best Practices: Always Use a Splitter!
✅ How to Use the Built-in Markdown Splitter
from functools import partial
from embs import Embs
split_config = {
    "headers_to_split_on": [("#", "h1"), ("##", "h2"), ("###", "h3")],
    "return_each_line": True,
    "strip_headers": True,
    "split_on_double_newline": True,
}
md_splitter = partial(Embs.markdown_splitter, config=split_config)
client = Embs()
docs = client.query_documents(
    query="Machine Learning Basics",
    files=["/path/to/ml_guide.pdf"],
    splitter=md_splitter,
)
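To show roughly what header-based splitting does to a document, here is a minimal standalone sketch. The split_markdown function is hypothetical and far simpler than Embs.markdown_splitter: it only starts a new chunk at each #, ##, or ### header and optionally drops the header line, mirroring the spirit of the strip_headers option.

```python
def split_markdown(text, strip_headers=True):
    """Split markdown into chunks, starting a new chunk at each #, ##, or ### header."""
    chunks, current = [], []
    for line in text.splitlines():
        is_header = line.startswith(("# ", "## ", "### "))
        if is_header and current:
            # Close the chunk collected so far before starting a new section.
            chunks.append("\n".join(current).strip())
            current = []
        if is_header and strip_headers:
            continue  # drop the header line itself
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]  # discard empty chunks


doc = (
    "# Intro\n"
    "Embeddings map text to vectors.\n"
    "\n"
    "## Ranking\n"
    "Scores are sorted.\n"
    "### Notes\n"
    "Chunks stay small."
)
for chunk in split_markdown(doc):
    print(repr(chunk))
```

Each chunk is then embedded and scored on its own, so a short, relevant section is not diluted by the rest of a long document.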
📜 License
Licensed under the MIT License. See LICENSE for details.
🤝 Contributing
Pull requests, issues, and discussions are welcome!
🚀 With enhanced multilingual support, embs is now even more powerful for global retrieval applications! 🌍