elasticsearch-llm-cache
A Python library that uses Elasticsearch as a caching layer for Generative AI applications. By caching responses from Large Language Models (LLMs), this library helps reduce the costs associated with LLM services and improve response speed from the user's perspective.
This library is covered in depth in the blog post Elasticsearch as a GenAI Caching Layer.
ElasticsearchLLMCache
This class provides the core functionality for creating, querying, and updating a cache index in Elasticsearch.
__init__(self, es_client: Elasticsearch, index_name: Optional[str] = None, es_model_id: Optional[str] = 'sentence-transformers__all-distilroberta-v1', create_index=True)
Constructor method to initialize the ElasticsearchLLMCache instance.
- es_client (Elasticsearch): Elasticsearch client object.
- index_name (str, optional): Name for the index; defaults to 'llm_cache'.
- es_model_id (str, optional): Model ID for text embedding; defaults to 'sentence-transformers__all-distilroberta-v1'.
- create_index (bool, optional): Whether to create a new index; defaults to True.
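For example, a minimal initialization sketch; the connection URL and custom index name below are assumptions for illustration, not requirements of the library:

from elasticsearch import Elasticsearch
from elasticsearch_llm_cache.elasticsearch_llm_cache import ElasticsearchLLMCache

# Connect to a cluster (URL is an assumption for illustration)
es_client = Elasticsearch("http://localhost:9200")

# Build the cache with a custom index name; the default embedding model is kept
llm_cache = ElasticsearchLLMCache(
    es_client=es_client,
    index_name="chatbot_llm_cache",  # hypothetical index name
    create_index=True,               # create the index on startup if it does not exist
)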
create_index(self, dims: Optional[int] = 768) -> Dict
Method to create a new index for caching if it does not already exist.
- dims (int, optional): The dimensionality of the vector; defaults to 768.

Returns: Dict

Mapping:
- prompt: text
- response: text
- create_date: date
- last_hit_date: date
- prompt_vector: dense_vector
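As a sketch, the index can also be created explicitly (useful when the instance was constructed with create_index=False); the default 768 dimensions match the default all-distilroberta-v1 model, and everything else here is illustrative:

# Create the cache index up front; dims must match the embedding model's output size
creation_result = llm_cache.create_index(dims=768)
# Returns a dict; per the docs above, the index is only created if it does not already exist
print(creation_result)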
query(self, prompt_text: str, similarity_threshold: Optional[float] = 0.5) -> dict
Method to query the index to find similar prompts and update the last_hit_date for that document if a hit is found.
- prompt_text (str): The text of the prompt to find similar entries for.
- similarity_threshold (float, optional): The similarity threshold for filtering results; defaults to 0.5.

Returns: dict
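A small sketch of a cache lookup with a stricter threshold; the 0.75 value and the assumption that an empty (falsy) dict signals a miss, as in the usage example below, are illustrative:

# Look up a semantically similar prompt; raise the threshold for stricter matching
hit = llm_cache.query(prompt_text="What is our refund policy?", similarity_threshold=0.75)
if hit:
    print("cache hit:", hit)
else:
    print("cache miss - call the LLM and add() the result")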
add(self, prompt: str, response: str, source: Optional[str] = None) -> dict
Method to add a new document to the index when there is no cache hit and a response is fetched from the LLM.
- prompt (str): The user prompt.
- response (str): The LLM response.
- source (str, optional): Source identifier for the LLM.

Returns: dict
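For instance, caching a freshly generated answer and tagging where it came from; the "gpt-4" label is just an illustrative source identifier:

# Store a new prompt/response pair after a cache miss; source is an optional label
llm_cache.add(
    prompt="What is our refund policy?",
    response="Refunds are available within 30 days of purchase.",
    source="gpt-4",  # hypothetical source identifier
)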
from elasticsearch import Elasticsearch
from elasticsearch_llm_cache.elasticsearch_llm_cache import ElasticsearchLLMCache
# Initialize Elasticsearch client
es_client = Elasticsearch()
# Initialize Elasticsearch LLM Cache
llm_cache = ElasticsearchLLMCache(es_client)
# Query the cache
cache_response = llm_cache.query(prompt_text="Hello, how can I help?")
# If no cache hit, add new response to cache
if not cache_response:
    llm_response = "I'm here to assist you!"  # Assume this response is fetched from the LLM
    llm_cache.add(prompt="Hello, how can I help?", response=llm_response)
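Putting the pieces together, here is a rough cache-first wrapper; generate_with_llm is a hypothetical stand-in for your own LLM client, and the assumption that a hit dict carries a 'response' field (mirroring the index mapping above) is ours, not a documented guarantee:

def generate_with_llm(prompt: str) -> str:
    # Hypothetical LLM call; replace with your provider's client (OpenAI, Bedrock, etc.)
    return "placeholder answer from the LLM"

def cached_completion(cache: ElasticsearchLLMCache, prompt: str) -> str:
    # Check the cache first; an empty dict is treated as a miss
    hit = cache.query(prompt_text=prompt)
    if hit and hit.get("response"):
        # Cache hit: reuse the stored answer and skip the LLM call
        return hit["response"]
    # Cache miss: call the LLM, then store the new pair for future requests
    answer = generate_with_llm(prompt)
    cache.add(prompt=prompt, response=answer, source="my-llm")  # source label is illustrative
    return answer

print(cached_completion(llm_cache, "Hello, how can I help?"))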
Sample Streamlit App using RAG with Elasticsearch
Unit tests for ElasticsearchLLMCache class
FAQs
A caching utility for Elasticsearch leveraging vector search
We found that elasticsearch-llm-cache demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.