
gpt-semantic-cache
An NPM package for semantic caching of GPT responses using Redis and Approximate Nearest Neighbors (ANN) search.
The GPT Semantic Cache is a Node.js package that provides a semantic caching mechanism for GPT responses. It uses semantic embeddings and approximate nearest neighbors search to cache GPT responses and retrieve them based on the semantic similarity of user queries. A query whose meaning is close enough to one already answered is served from the cache instead of triggering a new API call. This reduces redundant requests to GPT models, which lowers API costs and improves response times for end users.
Here are several areas where this can be used:
npm install gpt-semantic-cache
Here's a quick example to get you started:
const { SemanticGPTCache } = require('gpt-semantic-cache');
(async () => {
  const cache = new SemanticGPTCache({
    embeddingOptions: {
      type: 'openai',
      openAIApiKey: 'YOUR_OPENAI_API_KEY',
    },
    gptOptions: {
      openAIApiKey: 'YOUR_OPENAI_API_KEY',
      model: 'gpt-3.5-turbo',
    },
    cacheOptions: {
      redisUrl: 'redis://localhost:6379',
      similarityThreshold: 0.8,
      cacheTTL: 3600, // Cache Time-To-Live in seconds
      embeddingSize: 1536, // OpenAI's embedding size
    },
  });
  await cache.initialize();
  const response = await cache.query('What is the capital of France?');
  console.log(response);
})();
To initialize the SemanticGPTCache, provide configuration options for the embedding model, the GPT model, and the cache:
const cache = new SemanticGPTCache({
  embeddingOptions: {
    type: 'local', // 'openai' or 'local'
    modelName: 'sentence-transformers/all-MiniLM-L6-v2', // Only for local models
    openAIApiKey: 'YOUR_OPENAI_API_KEY', // Only for OpenAI embeddings
  },
  gptOptions: {
    openAIApiKey: 'YOUR_OPENAI_API_KEY',
    model: 'gpt-3.5-turbo', // GPT model queried on a cache miss
    promptPrefix: 'You are an AI assistant.',
  },
  cacheOptions: {
    redisUrl: 'redis://localhost:6379',
    similarityThreshold: 0.8, // Cosine similarity threshold for cache hits
    cacheTTL: 3600, // Time-to-live for cache entries in seconds
    embeddingSize: 384, // Embedding size (384 for local models, 1536 for OpenAI)
  },
});
await cache.initialize(); // Run inside an async function (or an ES module with top-level await)
Initialization Options Explained:
embeddingOptions:
  type: 'openai' or 'local'. Specifies the source of embeddings.
  modelName: The name of the local embedding model to use (e.g., 'sentence-transformers/all-MiniLM-L6-v2').
  openAIApiKey: Your OpenAI API key (required if type is 'openai').
gptOptions:
  openAIApiKey: Your OpenAI API key for accessing the GPT model.
  model: The GPT model to use on a cache miss (e.g., 'gpt-3.5-turbo').
  promptPrefix: An optional string to prepend to every prompt sent to the GPT model.
cacheOptions:
  redisUrl: The URL of your Redis instance (e.g., 'redis://localhost:6379').
  similarityThreshold: A number between 0 and 1 representing the cosine similarity threshold for cache hits.
  cacheTTL: The time-to-live for cache entries, in seconds.
  embeddingSize: The dimensionality of the embeddings used (e.g., 384 for local models, 1536 for OpenAI).
To query the cache and get a response:
const response = await cache.query('Your query here', 'Additional context if any');
console.log(response);
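Misconfigured options are easier to diagnose before the cache connects to Redis or loads a model. The validator below is a hypothetical helper, not part of gpt-semantic-cache. It encodes the rules from the option list above and assumes that gptOptions.openAIApiKey is always needed, because the GPT model is called on every cache miss.

```javascript
// Hypothetical validator (not part of gpt-semantic-cache) that checks an options
// object against the documented rules before it is passed to new SemanticGPTCache(...).
// Returns an array of problem descriptions; an empty array means no problems were found.
function validateCacheConfig({ embeddingOptions = {}, gptOptions = {}, cacheOptions = {} } = {}) {
  const problems = [];
  const { type, openAIApiKey } = embeddingOptions;
  if (type !== 'openai' && type !== 'local') {
    problems.push("embeddingOptions.type must be 'openai' or 'local'");
  }
  if (type === 'openai' && !openAIApiKey) {
    problems.push("embeddingOptions.openAIApiKey is required when type is 'openai'");
  }
  // Assumption: the GPT key is needed because the model is queried on cache misses.
  if (!gptOptions.openAIApiKey) {
    problems.push('gptOptions.openAIApiKey is required to query the GPT model');
  }
  const { similarityThreshold, cacheTTL, embeddingSize } = cacheOptions;
  // The typeof check plus range comparisons also reject NaN.
  if (!(typeof similarityThreshold === 'number' && similarityThreshold >= 0 && similarityThreshold <= 1)) {
    problems.push('cacheOptions.similarityThreshold must be a number between 0 and 1');
  }
  if (!(Number.isFinite(cacheTTL) && cacheTTL > 0)) {
    problems.push('cacheOptions.cacheTTL must be a positive number of seconds');
  }
  if (!(Number.isInteger(embeddingSize) && embeddingSize > 0)) {
    problems.push('cacheOptions.embeddingSize must be a positive integer');
  }
  return problems;
}

console.log(validateCacheConfig({
  embeddingOptions: { type: 'local' },
  gptOptions: { openAIApiKey: 'YOUR_OPENAI_API_KEY' },
  cacheOptions: { similarityThreshold: 0.8, cacheTTL: 3600, embeddingSize: 384 },
})); // []
```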
The package allows you to customize various settings to fit your needs:
Similarity Threshold: Adjust the similarityThreshold in cacheOptions to control how similar a query needs to be to hit the cache. A higher threshold means only very similar queries will hit the cache.
Cache Time-To-Live (TTL): Set cacheTTL to control how long entries remain in the cache.
Embedding Size: Ensure embeddingSize matches the size of embeddings produced by your chosen embedding model.
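An embeddingSize mismatch only becomes apparent once vectors reach the index, so it can help to fail fast when the first embedding is produced. The check below is a hypothetical helper, not part of the package. The sizes in the example are the ones quoted above: 384 for the local MiniLM model and 1536 for OpenAI embeddings.

```javascript
// Hypothetical helper (not part of gpt-semantic-cache): verify that a vector from
// the embedding model has the dimensionality set in cacheOptions.embeddingSize
// before it is inserted into the nearest-neighbour index.
function assertEmbeddingSize(embedding, embeddingSize) {
  if (!Array.isArray(embedding) && !ArrayBuffer.isView(embedding)) {
    throw new TypeError('embedding must be an array or typed array');
  }
  if (embedding.length !== embeddingSize) {
    throw new RangeError(
      `embedding has ${embedding.length} dimensions, but embeddingSize is ${embeddingSize}`
    );
  }
  return embedding;
}

// A 384-dimensional vector (the size produced by all-MiniLM-L6-v2) passes when
// embeddingSize is 384, and would be rejected with the OpenAI setting of 1536.
const vec = new Float32Array(384);
assertEmbeddingSize(vec, 384);
```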
Semantic embeddings are vector representations of text that capture the meaning and context of the text. By converting both user queries and cached queries into embeddings, we can compare them in a high-dimensional space to find semantic similarities.
To efficiently find similar embeddings in the cache, the package uses the Hierarchical Navigable Small World (HNSW) algorithm for Approximate Nearest Neighbors search. HNSW constructs a graph of embeddings that allows for fast retrieval of nearest neighbors without comparing the query against every cached embedding.
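For contrast, the sketch below shows the exhaustive search that HNSW is designed to avoid. Every cached embedding is scored against the query, so each lookup costs O(n·d) for n entries of dimension d. This is an illustrative baseline, not code from gpt-semantic-cache. It assumes the stored vectors are L2-normalized, so that a plain dot product equals their cosine similarity.

```javascript
// Exhaustive (exact) k-nearest-neighbour search: the baseline that an ANN index
// such as HNSW approximates. Stored vectors are assumed to be L2-normalised, so
// the dot product of two vectors equals their cosine similarity.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function normalize(v) {
  const norm = Math.sqrt(dot(v, v));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// entries: [{ id, vector }] with normalised vectors; returns the k best matches.
function exactKnn(query, entries, k) {
  const q = normalize(query);
  return entries
    .map(({ id, vector }) => ({ id, score: dot(q, vector) })) // score every entry
    .sort((a, b) => b.score - a.score) // highest similarity first
    .slice(0, k);
}

const entries = [
  { id: 'a', vector: normalize([1, 0, 0]) },
  { id: 'b', vector: normalize([0.9, 0.1, 0]) },
  { id: 'c', vector: normalize([0, 1, 0]) },
];
console.log(exactKnn([1, 0.05, 0], entries, 2).map((r) => r.id)); // [ 'a', 'b' ]
```

An HNSW index returns (approximately) the same top-k list without scanning every entry. This matters once the cache holds many thousands of embeddings.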
Cosine similarity measures the cosine of the angle between two vectors in a multidimensional space. It is a commonly used metric to determine how similar two embeddings are. In this package, after retrieving the nearest neighbors using ANN search, cosine similarity is computed to ensure the retrieved embeddings meet the specified similarity threshold.
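Concretely, the cosine similarity of vectors a and b is dot(a, b) / (|a| · |b|), which lies in [-1, 1]. The sketch below applies it as the post-search check described above: an ANN candidate counts as a cache hit only if its similarity reaches similarityThreshold. This is illustrative rather than the package's own code. The candidate shape ({ vector, response }) and the function names are hypothetical.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new RangeError('vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as dissimilar
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Given ANN candidates ({ vector, response } is a hypothetical shape), return the
// most similar candidate that meets the threshold, or null for a cache miss.
function bestHit(queryVector, candidates, similarityThreshold) {
  let best = null;
  for (const candidate of candidates) {
    const score = cosineSimilarity(queryVector, candidate.vector);
    if (score >= similarityThreshold && (best === null || score > best.score)) {
      best = { ...candidate, score };
    }
  }
  return best;
}

const candidates = [
  { vector: [1, 1, 0], response: 'Paris' },
  { vector: [0, 1, 1], response: 'Berlin' },
];
console.log(bestHit([1, 0.9, 0], candidates, 0.8)?.response); // Paris
console.log(bestHit([0, 0, 1], candidates, 0.8)); // null (cache miss)
```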
The caching mechanism works as follows:
Embedding Generation: When a query is received, it's converted into an embedding using the specified embedding model.
ANN Search: The embedding is used to search the ANN index for similar embeddings.
Similarity Check: Retrieved embeddings are compared using cosine similarity to ensure they meet the similarity threshold.
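Cache Hit or Miss: If a retrieved entry meets the threshold, its cached response is returned without calling the GPT model. Otherwise, the query is sent to the GPT model, and the new response is cached (for cacheTTL seconds) so that similar future queries can be answered from the cache.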
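The four steps above can be sketched as a small in-memory class. This is not the gpt-semantic-cache implementation. A Map stands in for Redis, a linear scan stands in for the HNSW index, and `embed` and `complete` are hypothetical async functions that you would back with a real embedding model and a GPT call.

```javascript
// Minimal in-memory sketch of the caching flow. NOT the package's implementation:
// a Map replaces Redis, a linear scan replaces HNSW, and `embed` / `complete`
// are hypothetical injected async functions (embedding model, GPT call).
class InMemorySemanticCache {
  constructor({ embed, complete, similarityThreshold = 0.8, cacheTTL = 3600 }) {
    this.embed = embed;
    this.complete = complete;
    this.similarityThreshold = similarityThreshold;
    this.ttlMs = cacheTTL * 1000;
    this.entries = new Map(); // key -> { vector, response, expiresAt }
    this.nextKey = 0;
  }

  static cosine(a, b) {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    return na && nb ? dot / Math.sqrt(na * nb) : 0;
  }

  async query(text) {
    // 1. Embedding generation.
    const vector = await this.embed(text);
    const now = Date.now();
    // 2. Nearest-neighbour search (exact scan here), evicting expired entries.
    let best = null;
    for (const [key, entry] of this.entries) {
      if (entry.expiresAt <= now) {
        this.entries.delete(key); // deleting during Map iteration is safe in JS
        continue;
      }
      const score = InMemorySemanticCache.cosine(vector, entry.vector);
      if (best === null || score > best.score) best = { score, entry };
    }
    // 3. Similarity check; 4a. cache hit: return the stored response.
    if (best !== null && best.score >= this.similarityThreshold) {
      return { response: best.entry.response, cached: true };
    }
    // 4b. Cache miss: ask the model, then store the response with its embedding.
    const response = await this.complete(text);
    this.entries.set(this.nextKey++, { vector, response, expiresAt: now + this.ttlMs });
    return { response, cached: false };
  }
}
```

Keeping `embed` and `complete` injectable lets the cache logic run without network access. In real use, they would call your embedding model and the chat completion API.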
Example using a local embedding model:
const cache = new SemanticGPTCache({
  embeddingOptions: {
    type: 'local',
    modelName: 'sentence-transformers/all-MiniLM-L6-v2',
  },
  gptOptions: {
    openAIApiKey: 'YOUR_OPENAI_API_KEY',
    model: 'gpt-3.5-turbo',
  },
  cacheOptions: {
    redisUrl: 'redis://localhost:6379',
    similarityThreshold: 0.75,
    cacheTTL: 7200, // 2 hours
    embeddingSize: 384, // For MiniLM model
  },
});
await cache.initialize();
const response = await cache.query('Tell me a joke.');
console.log(response);
You can adjust the similarityThreshold to control cache sensitivity:
// Higher threshold - only very similar queries will hit the cache
cache.cacheOptions.similarityThreshold = 0.9;
// Lower threshold - more queries will hit the cache, but responses may be less relevant
cache.cacheOptions.similarityThreshold = 0.6;
This project is licensed under the MIT License.