
VectoriaDB - A lightweight, production-ready in-memory vector database for semantic search
A lightweight, production-ready in-memory vector database for semantic search in JavaScript/TypeScript
VectoriaDB is a fast, minimal-dependency vector database designed for in-memory semantic search. Powered by transformers.js, it's perfect for applications that need to quickly search through documents, tools, or any text-based data using natural language queries.
npm install vectoriadb
# or
yarn add vectoriadb
# or
pnpm add vectoriadb
Requirements:
Use VectoriaDB when you need:
Skip VectoriaDB if you need:
import { VectoriaDB } from 'vectoriadb';
// Create and initialize the database
const db = new VectoriaDB();
await db.initialize();
// Add documents
await db.add('doc-1', 'How to create a user account', {
id: 'doc-1',
category: 'auth',
author: 'Alice',
});
await db.add('doc-2', 'Send email notifications to users', {
id: 'doc-2',
category: 'notifications',
author: 'Bob',
});
// Search
const results = await db.search('creating new accounts');
console.log(results[0].metadata); // { id: 'doc-1', category: 'auth', ... }
console.log(results[0].score); // e.g. 0.87
Each document in VectoriaDB consists of an ID, the text to embed, and a metadata object.
VectoriaDB automatically generates embeddings (vector representations) of your documents using transformers.js. The default model is Xenova/all-MiniLM-L6-v2 (22MB, 384 dimensions), which provides a great balance of size, speed, and accuracy.
Search uses cosine similarity to find the most semantically similar documents to your query.
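To make the scoring concrete, here is a minimal sketch of cosine similarity over two `Float32Array` vectors. It is an illustration only, not VectoriaDB's internal code; the function name `cosineSimilarity` and the zero-vector handling are assumptions.

```typescript
// Cosine similarity between two equal-length vectors.
// Returns a value in [-1, 1]; values near 1 mean the vectors point the same way.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Assumption: treat a zero vector as having no similarity to anything
  return denom === 0 ? 0 : dot / denom;
}

const queryVec = new Float32Array([1, 0, 1]);
const sameDirection = new Float32Array([2, 0, 2]);
const orthogonal = new Float32Array([0, 1, 0]);

console.log(cosineSimilarity(queryVec, sameDirection)); // approximately 1
console.log(cosineSimilarity(queryVec, orthogonal)); // 0
```

Because the score depends only on direction, scaling a vector (as with `sameDirection`) does not change its similarity to the query.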
const db = new VectoriaDB<MetadataType>(config?)
Config Options:
interface VectoriaConfig {
modelName?: string; // Default: 'Xenova/all-MiniLM-L6-v2'
dimensions?: number; // Auto-detected from model
defaultSimilarityThreshold?: number; // Default: 0.3
defaultTopK?: number; // Default: 10
}
initialize(): Promise<void>
Initialize the embedding model. Must be called before using the database.
await db.initialize();
add(id: string, text: string, metadata: T): Promise<void>
Add a single document to the database.
await db.add('doc-1', 'Document content', { id: 'doc-1', category: 'tech' });
addMany(documents: Array<{id, text, metadata}>): Promise<void>
Add multiple documents in batch (more efficient).
await db.addMany([
{ id: 'doc-1', text: 'Content 1', metadata: { id: 'doc-1' } },
{ id: 'doc-2', text: 'Content 2', metadata: { id: 'doc-2' } },
]);
search(query: string, options?): Promise<SearchResult<T>[]>
Search for documents using semantic similarity.
const results = await db.search('machine learning', {
topK: 5, // Return top 5 results
threshold: 0.5, // Minimum similarity score
filter: (metadata) => metadata.category === 'tech', // Custom filter
includeVector: false, // Include vector in results
});
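One way to picture how `topK`, `threshold`, and `filter` might combine is the sketch below. It works on precomputed scores and is not taken from VectoriaDB's source: `ScoredDoc`, `RankOptions`, and `rankCandidates` are hypothetical names, and the order in which the library applies these steps is an assumption.

```typescript
interface ScoredDoc<M> {
  id: string;
  metadata: M;
  score: number; // similarity to the query, assumed to be precomputed
}

interface RankOptions<M> {
  topK: number;
  threshold: number;
  filter?: (metadata: M) => boolean;
}

// Hypothetical helper: apply the metadata filter, drop low scores,
// sort by descending score, and keep the best topK results.
function rankCandidates<M>(candidates: ScoredDoc<M>[], options: RankOptions<M>): ScoredDoc<M>[] {
  const { topK, threshold, filter } = options;
  return candidates
    .filter((doc) => (filter ? filter(doc.metadata) : true))
    .filter((doc) => doc.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const candidates: ScoredDoc<{ category: string }>[] = [
  { id: 'a', metadata: { category: 'tech' }, score: 0.91 },
  { id: 'b', metadata: { category: 'food' }, score: 0.88 },
  { id: 'c', metadata: { category: 'tech' }, score: 0.42 },
  { id: 'd', metadata: { category: 'tech' }, score: 0.67 },
];

const ranked = rankCandidates(candidates, {
  topK: 2,
  threshold: 0.5,
  filter: (metadata) => metadata.category === 'tech',
});

console.log(ranked.map((doc) => doc.id)); // [ 'a', 'd' ]
```

Here `b` is removed by the filter and `c` by the threshold, so fewer than `topK` results can come back when few documents qualify.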
get(id: string): DocumentEmbedding<T> | undefined
Get a document by ID.
const doc = db.get('doc-1');
has(id: string): boolean
Check if a document exists.
if (db.has('doc-1')) {
// Document exists
}
remove(id: string): boolean
Remove a document.
db.remove('doc-1');
removeMany(ids: string[]): number
Remove multiple documents.
const removed = db.removeMany(['doc-1', 'doc-2']);
clear(): void
Remove all documents.
db.clear();
size(): number
Get the number of documents.
const count = db.size();
filter(filterFn): DocumentEmbedding<T>[]
Get documents by filter (without semantic search).
const techDocs = db.filter((metadata) => metadata.category === 'tech');
getStats(): VectoriaStats
Get database statistics.
const stats = db.getStats();
console.log(stats.totalEmbeddings);
console.log(stats.estimatedMemoryBytes);
Use TypeScript generics for type-safe metadata:
interface MyMetadata extends DocumentMetadata {
id: string;
category: 'tech' | 'business' | 'science';
author: string;
tags: string[];
}
const db = new VectoriaDB<MyMetadata>();
await db.add('doc-1', 'Content', {
id: 'doc-1',
category: 'tech', // Type-checked!
author: 'Alice',
tags: ['ai', 'ml'],
});
const results = await db.search('query', {
filter: (metadata) => {
// metadata is fully typed!
return metadata.category === 'tech' && metadata.tags.includes('ai');
},
});
Use any Hugging Face model compatible with transformers.js:
const db = new VectoriaDB({
modelName: 'Xenova/paraphrase-multilingual-MiniLM-L12-v2', // Multilingual support
});
For better performance with large datasets:
const documents = [
{ id: '1', text: 'Doc 1', metadata: { id: '1' } },
{ id: '2', text: 'Doc 2', metadata: { id: '2' } },
// ... thousands more
];
// Much faster than calling add() in a loop
await db.addMany(documents);
For production applications with large datasets (>10k documents), enable HNSW (Hierarchical Navigable Small World) indexing for faster approximate nearest neighbor search:
const db = new VectoriaDB({
useHNSW: true,
hnsw: {
M: 16, // Max connections per node (higher = better recall, more memory)
M0: 32, // Max connections at layer 0
efConstruction: 200, // Construction quality (higher = better quality, slower build)
efSearch: 50, // Search quality (higher = better recall, slower search)
},
});
await db.initialize();
// Add documents - HNSW index is built automatically
await db.addMany(documents);
// Search uses HNSW for approximately O(log n) lookups instead of an O(n) scan
const results = await db.search('query');
HNSW Benefits:
Parameter Tuning:
| Parameter | Lower Value | Higher Value | Default |
|---|---|---|---|
| M | Faster build, less memory | Better recall, more memory | 16 |
| efConstruction | Faster build, lower quality | Better quality, slower build | 200 |
| efSearch | Faster search, lower recall | Better recall, slower search | 50 |
When to use HNSW:
Combine semantic search with complex metadata filters:
interface SecurityMetadata extends DocumentMetadata {
id: string;
category: string;
tags: string[];
author: string;
priority: 'low' | 'medium' | 'high';
}
const db = new VectoriaDB<SecurityMetadata>();
const results = await db.search('user authentication', {
topK: 10,
threshold: 0.4,
filter: (metadata) => {
return (
metadata.category === 'security' &&
metadata.tags.includes('auth') &&
metadata.author === 'security-team' &&
metadata.priority === 'high'
);
},
});
Cache embeddings across restarts to avoid recalculation. VectoriaDB supports multiple storage backends:
No persistence - data is lost on restart:
const db = new VectoriaDB(); // Uses MemoryStorageAdapter by default
Perfect for local development - caches to disk with automatic invalidation when tools change:
import { VectoriaDB, FileStorageAdapter, SerializationUtils } from 'vectoriadb';
const documents = [
{ id: 'tool-1', text: 'Create user account', metadata: { id: 'tool-1' } },
{ id: 'tool-2', text: 'Send email notification', metadata: { id: 'tool-2' } },
];
// Create tools hash for cache invalidation
const toolsHash = SerializationUtils.createToolsHash(documents);
const db = new VectoriaDB({
storageAdapter: new FileStorageAdapter({
cacheDir: './.cache/vectoriadb',
namespace: 'my-app', // Separate cache per namespace
}),
toolsHash, // Cache invalidated when tools change
version: '1.0.0', // Cache invalidated when version changes
});
await db.initialize(); // Automatically loads from cache if valid
// Add documents (only on first run or after invalidation)
if (db.size() === 0) {
await db.addMany(documents);
await db.saveToStorage(); // Manually save to cache
}
// Subsequent runs will load from cache instantly
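The idea behind hash-based invalidation can be sketched as follows. This is not how `SerializationUtils.createToolsHash` is implemented (the README does not show that); `fnv1a`, `hashDocuments`, `CacheStamp`, and `isCacheValid` are hypothetical names, and the non-cryptographic FNV-1a variant is used only to keep the example dependency-free.

```typescript
interface HashableDoc {
  id: string;
  text: string;
}

// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex characters.
// Not cryptographic; used here only to detect content changes.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ('00000000' + (hash >>> 0).toString(16)).slice(-8);
}

// Hypothetical helper: hash documents in a stable, id-sorted order
// so that reordering the input does not invalidate the cache.
function hashDocuments(docs: HashableDoc[]): string {
  const canonical = [...docs]
    .sort((x, y) => (x.id < y.id ? -1 : x.id > y.id ? 1 : 0))
    .map((doc) => JSON.stringify([doc.id, doc.text]))
    .join('\n');
  return fnv1a(canonical);
}

// Hypothetical stamp stored next to cached embeddings
interface CacheStamp {
  docsHash: string;
  version: string;
  modelName: string;
}

// The cache is reused only if every field still matches
function isCacheValid(stored: CacheStamp, current: CacheStamp): boolean {
  return (
    stored.docsHash === current.docsHash &&
    stored.version === current.version &&
    stored.modelName === current.modelName
  );
}

const toolA = { id: 'tool-1', text: 'Create user account' };
const toolB = { id: 'tool-2', text: 'Send email notification' };

const original = hashDocuments([toolA, toolB]);
const reordered = hashDocuments([toolB, toolA]);
const edited = hashDocuments([toolA, { id: 'tool-2', text: 'Send SMS notification' }]);

console.log(original === reordered); // true: order does not affect the hash
console.log(original === edited); // false unless the two inputs happen to collide
```

A production cache would more likely use a cryptographic hash such as SHA-256 to make accidental collisions negligible.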
Share embeddings across pods in distributed environments:
import { VectoriaDB, RedisStorageAdapter, SerializationUtils } from 'vectoriadb';
import Redis from 'ioredis'; // or your Redis client
const documents = [
/* your documents */
];
const toolsHash = SerializationUtils.createToolsHash(documents);
const redis = new Redis({
host: 'localhost',
port: 6379,
});
const db = new VectoriaDB({
storageAdapter: new RedisStorageAdapter({
client: redis,
namespace: 'my-app-v1', // Namespace by app + version
ttl: 86400, // 24 hours (default)
}),
toolsHash,
version: process.env.APP_VERSION,
});
await db.initialize(); // Loads from Redis if cache is valid
if (db.size() === 0) {
await db.addMany(documents);
await db.saveToStorage();
}
// Don't forget to close when shutting down
await db.close();
Cache Invalidation:
The cache is automatically invalidated when:
- toolsHash changes (documents added/removed/modified)
- version changes (application version updated)
- modelName changes (different embedding model)

Best Practices:
- Use FileStorageAdapter to speed up restarts
- Use RedisStorageAdapter for multi-pod deployments
- Call saveToStorage() after adding documents

Update documents efficiently without re-embedding when only metadata changes:
// Update metadata without re-embedding (instant operation)
db.updateMetadata('doc-1', {
id: 'doc-1',
category: 'updated-category',
priority: 'high',
lastModified: new Date(),
});
// Only re-embeds if text actually changed
const reembedded = await db.update('doc-1', {
text: 'Updated content', // If different, will re-embed
metadata: { id: 'doc-1', category: 'updated' },
});
console.log(reembedded); // true if re-embedded, false if text was same
// Update many documents - only re-embeds those with text changes
const result = await db.updateMany([
{
id: 'doc-1',
text: 'New content for doc 1', // Will re-embed
metadata: { id: 'doc-1', category: 'tech' },
},
{
id: 'doc-2',
metadata: { id: 'doc-2', category: 'food' }, // No text = no re-embedding
},
{
id: 'doc-3',
text: 'Same text as before', // Smart detection = no re-embedding
metadata: { id: 'doc-3', category: 'science' },
},
]);
console.log(`Updated ${result.updated} documents`);
console.log(`Re-embedded ${result.reembedded} documents`); // Only what changed
// Force re-embed even if text hasn't changed (e.g., new embedding model)
await db.update('doc-1', { text: 'same text' }, { forceReembed: true });
// Force re-embed all in batch
await db.updateMany(docs, { forceReembed: true });
Performance Benefits:
| Operation | Speed | Re-embedding |
|---|---|---|
| updateMetadata() | Instant | Never |
| update() (metadata) | Instant | No |
| update() (text) | ~100-200ms | Only if changed |
| updateMany() (mixed) | Batched | Only what changed |
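The "only re-embed when the text changed" behavior in the table can be illustrated with a small stand-alone sketch. It does not reproduce VectoriaDB's implementation: `TinyStore`, `EmbedFn`, and `lengthEmbed` are hypothetical, and the embedder is a stand-in that needs no model.

```typescript
type EmbedFn = (text: string) => Float32Array;

interface StoredDoc {
  text: string;
  vector: Float32Array;
  metadata: Record<string, unknown>;
}

// Hypothetical in-memory store, used only to illustrate change detection
class TinyStore {
  private docs = new Map<string, StoredDoc>();
  private embed: EmbedFn;
  embedCalls = 0; // counts how often the (expensive) embedder runs

  constructor(embed: EmbedFn) {
    this.embed = embed;
  }

  add(id: string, text: string, metadata: Record<string, unknown>): void {
    this.docs.set(id, { text, vector: this.runEmbed(text), metadata });
  }

  // Returns true only when the text changed and the vector was recomputed
  update(id: string, changes: { text?: string; metadata?: Record<string, unknown> }): boolean {
    const existing = this.docs.get(id);
    if (!existing) throw new Error(`Document not found: ${id}`);
    if (changes.metadata) existing.metadata = changes.metadata;
    if (changes.text === undefined || changes.text === existing.text) return false;
    existing.text = changes.text;
    existing.vector = this.runEmbed(changes.text);
    return true;
  }

  private runEmbed(text: string): Float32Array {
    this.embedCalls += 1;
    return this.embed(text);
  }
}

// Stand-in embedder so the sketch runs without downloading a model
const lengthEmbed: EmbedFn = (text) => new Float32Array([text.length]);

const store = new TinyStore(lengthEmbed);
store.add('doc-1', 'Original content', { category: 'tech' });

const metaOnly = store.update('doc-1', { metadata: { category: 'updated' } });
const sameText = store.update('doc-1', { text: 'Original content' });
const newText = store.update('doc-1', { text: 'Updated content' });

console.log(metaOnly, sameText, newText); // false false true
console.log(store.embedCalls); // 2: one for add(), one for the changed text
```

Comparing the stored text before calling the embedder is what keeps metadata-only updates cheap.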
Use Cases:
VectoriaDB provides production-ready error handling with specific error types that can be caught and handled individually.
All errors extend the base VectoriaError class with a code property for programmatic error handling:
import {
VectoriaError, // Base error class
VectoriaNotInitializedError, // DB not initialized
DocumentValidationError, // Invalid document data
DocumentNotFoundError, // Document doesn't exist
DocumentExistsError, // Document already exists
DuplicateDocumentError, // Duplicate in batch or existing
QueryValidationError, // Invalid search query/params
EmbeddingError, // Embedding generation failure
StorageError, // Storage operation failure
ConfigurationError, // Invalid configuration
} from 'vectoriadb';
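As a rough picture of the pattern (a base error class carrying a `code`, with subclasses adding their own fields), consider the sketch below. The classes `AppError` and `NotFoundError` and the helper `describeError` are hypothetical stand-ins and are not VectoriaDB's actual definitions.

```typescript
// Hypothetical base class: every error carries a machine-readable code
class AppError extends Error {
  readonly code: string;

  constructor(message: string, code: string) {
    super(message);
    this.code = code;
    this.name = new.target.name;
    // Keeps instanceof checks working when compiled to older targets
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical subclass that adds a field specific to this failure
class NotFoundError extends AppError {
  readonly documentId: string;

  constructor(documentId: string) {
    super(`Document "${documentId}" not found`, 'DOCUMENT_NOT_FOUND');
    this.documentId = documentId;
  }
}

// Check the most specific class first, then fall back to the base class
function describeError(error: unknown): string {
  if (error instanceof NotFoundError) return `missing:${error.documentId}`;
  if (error instanceof AppError) return `code:${error.code}`;
  return 'unknown';
}

console.log(describeError(new NotFoundError('doc-9'))); // missing:doc-9
console.log(describeError(new AppError('bad config', 'CONFIGURATION_ERROR'))); // code:CONFIGURATION_ERROR
console.log(describeError(new Error('other'))); // unknown
```

With this shape, callers can branch on either `instanceof` or the `code` string.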
Thrown when operations are attempted before calling initialize():
const db = new VectoriaDB();
try {
await db.add('doc-1', 'text', { id: 'doc-1' });
} catch (error) {
if (error instanceof VectoriaNotInitializedError) {
console.log(error.code); // 'NOT_INITIALIZED'
console.log(error.message); // 'VectoriaDB must be initialized before adding documents...'
await db.initialize(); // Fix: initialize first
}
}
Thrown when document data is invalid:
try {
// Empty text
await db.add('doc-1', '', { id: 'doc-1' });
} catch (error) {
if (error instanceof DocumentValidationError) {
console.log(error.code); // 'DOCUMENT_VALIDATION_ERROR'
console.log(error.documentId); // 'doc-1'
}
}
try {
// Metadata.id mismatch
await db.add('doc-1', 'text', { id: 'doc-2' });
} catch (error) {
if (error instanceof DocumentValidationError) {
console.log(error.message); // 'Metadata id "doc-2" does not match document id "doc-1"'
}
}
Thrown when attempting to update a non-existent document:
try {
await db.update('nonexistent', { text: 'new' });
} catch (error) {
if (error instanceof DocumentNotFoundError) {
console.log(error.code); // 'DOCUMENT_NOT_FOUND'
console.log(error.documentId); // 'nonexistent'
}
}
Thrown when adding a document with an ID that already exists:
await db.add('doc-1', 'text', { id: 'doc-1' });
try {
await db.add('doc-1', 'duplicate', { id: 'doc-1' });
} catch (error) {
if (error instanceof DocumentExistsError) {
console.log(error.code); // 'DOCUMENT_EXISTS'
console.log(error.documentId); // 'doc-1'
// Fix: use remove() first or choose different ID
db.remove('doc-1');
await db.add('doc-1', 'duplicate', { id: 'doc-1' });
}
}
Thrown when batch operations contain duplicates:
try {
await db.addMany([
{ id: 'doc-1', text: 'first', metadata: { id: 'doc-1' } },
{ id: 'doc-1', text: 'second', metadata: { id: 'doc-1' } }, // Duplicate in batch
]);
} catch (error) {
if (error instanceof DuplicateDocumentError) {
console.log(error.code); // 'DUPLICATE_DOCUMENT'
console.log(error.context); // 'batch' or 'existing'
console.log(error.documentId); // 'doc-1'
}
}
Thrown when search parameters are invalid:
try {
await db.search(''); // Empty query
} catch (error) {
if (error instanceof QueryValidationError) {
console.log(error.code); // 'QUERY_VALIDATION_ERROR'
}
}
try {
await db.search('query', { topK: -5 }); // Invalid topK
} catch (error) {
if (error instanceof QueryValidationError) {
console.log(error.message); // 'topK must be a positive number'
}
}
try {
await db.search('query', { threshold: 1.5 }); // Invalid threshold
} catch (error) {
if (error instanceof QueryValidationError) {
console.log(error.message); // 'threshold must be between 0 and 1'
}
}
Thrown when embedding generation fails:
try {
// This would only happen with internal errors
await db.addMany(documents);
} catch (error) {
if (error instanceof EmbeddingError) {
console.log(error.code); // 'EMBEDDING_ERROR'
console.log(error.details); // Additional error details
}
}
try {
await db.add('doc-1', text, metadata);
} catch (error) {
if (error instanceof DocumentExistsError) {
// Handle duplicate: maybe update instead
await db.update(error.documentId, { text, metadata });
} else if (error instanceof DocumentValidationError) {
// Handle validation: log and skip
console.error(`Invalid document ${error.documentId}:`, error.message);
} else if (error instanceof VectoriaNotInitializedError) {
// Handle initialization: retry after init
await db.initialize();
await db.add('doc-1', text, metadata);
} else {
// Unknown error: rethrow
throw error;
}
}
try {
await db.search(query);
} catch (error) {
if (error instanceof VectoriaError) {
switch (error.code) {
case 'NOT_INITIALIZED':
await db.initialize();
break;
case 'QUERY_VALIDATION_ERROR':
console.error('Invalid query:', error.message);
break;
default:
throw error;
}
} else {
throw error; // Not a VectoriaError: rethrow instead of silently swallowing it
}
}
async function addDocumentsSafely<T extends DocumentMetadata>(documents: Array<{ id: string; text: string; metadata: T }>) {
try {
await db.addMany(documents);
} catch (error) {
if (error instanceof DuplicateDocumentError) {
// Drop every entry with the duplicated ID and retry
// (for an in-batch duplicate this removes all copies, including the first)
const uniqueDocs = documents.filter((doc) => doc.id !== error.documentId);
await db.addMany(uniqueDocs);
console.warn(`Skipped duplicate: ${error.documentId}`);
} else if (error instanceof DocumentValidationError) {
// Log validation error and continue with valid documents
console.error(`Invalid document ${error.documentId}:`, error.message);
// Filter out invalid document and retry
const validDocs = documents.filter((doc) => doc.id !== error.documentId);
await db.addMany(validDocs);
} else {
throw error; // Unexpected error
}
}
}
async function searchWithFallback(query: string) {
try {
return await db.search(query);
} catch (error) {
if (error instanceof QueryValidationError) {
// Fallback to default search
console.warn('Invalid query, using default search');
return await db.search('default query', { threshold: 0.1 });
} else if (error instanceof VectoriaNotInitializedError) {
// Initialize and retry
await db.initialize();
return await db.search(query);
}
throw error;
}
}
| Error Class | Code | When Thrown |
|---|---|---|
| VectoriaNotInitializedError | NOT_INITIALIZED | Operation before initialize() |
| DocumentValidationError | DOCUMENT_VALIDATION_ERROR | Empty text, metadata mismatch |
| DocumentNotFoundError | DOCUMENT_NOT_FOUND | Update/get non-existent document |
| DocumentExistsError | DOCUMENT_EXISTS | Add document with existing ID |
| DuplicateDocumentError | DUPLICATE_DOCUMENT | Duplicate in batch or existing document |
| QueryValidationError | QUERY_VALIDATION_ERROR | Empty query, invalid topK/threshold |
| EmbeddingError | EMBEDDING_ERROR | Embedding generation failure |
| StorageError | STORAGE_ERROR | Storage operation failure |
| ConfigurationError | CONFIGURATION_ERROR | Invalid configuration |
Error classes expose extra properties (documentId, context, etc.) for debugging.
Memory efficient with Float32 arrays:
Example: 10,000 documents ≈ 25 MB
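A quick back-of-the-envelope check of that figure, assuming the default 384-dimension model and 4-byte floats: the raw vectors account for roughly 15 MB. The README does not break the 25 MB estimate down, so attributing the remainder to text, metadata, and runtime overhead is an assumption. `vectorBytes` is a hypothetical helper.

```typescript
// Raw storage for the vectors alone, ignoring text, metadata, and object overhead
function vectorBytes(documentCount: number, dimensions: number): number {
  return documentCount * dimensions * Float32Array.BYTES_PER_ELEMENT;
}

const rawBytes = vectorBytes(10_000, 384);

console.log(rawBytes); // 15360000
console.log((rawBytes / 1_000_000).toFixed(2) + ' MB'); // 15.36 MB
```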
Without HNSW (brute-force):
With HNSW (approximate nearest neighbor):
interface ToolMetadata extends DocumentMetadata {
id: string;
toolName: string;
category: string;
}
const db = new VectoriaDB<ToolMetadata>();
await db.initialize();
await db.addMany([
{ id: 'tool-1', text: 'Create user accounts', metadata: { id: 'tool-1', toolName: 'create_user', category: 'auth' } },
{ id: 'tool-2', text: 'Send emails', metadata: { id: 'tool-2', toolName: 'send_email', category: 'notification' } },
]);
const results = await db.search('how to add new users');
// Returns: [{ metadata: { toolName: 'create_user', ... }, score: 0.89 }]
interface DocMetadata extends DocumentMetadata {
id: string;
title: string;
section: string;
url: string;
}
const db = new VectoriaDB<DocMetadata>();
// Add documentation pages
// Search with natural language
interface ProductMetadata extends DocumentMetadata {
id: string;
name: string;
category: string;
price: number;
}
const db = new VectoriaDB<ProductMetadata>();
// Add products with descriptions
// Search: "affordable wireless headphones"
VectoriaDB comes with comprehensive tests covering all major functionality:
# Run tests
npm test
# Run tests with coverage
npm run test:coverage
The test suite includes:
All tests use mocked transformers.js to avoid downloading models during CI/CD, making tests fast and reliable.
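A deterministic stand-in embedder is one way such a mock could look. The sketch below is not the project's actual test mock; `fakeEmbed` is a hypothetical function that maps text to a fixed-size, unit-length vector without loading any model, and its vectors carry no real semantic meaning.

```typescript
// Hypothetical test double: same text always yields the same vector
function fakeEmbed(text: string, dimensions = 8): Float32Array {
  const vector = new Float32Array(dimensions);
  for (let i = 0; i < text.length; i++) {
    // Spread character codes across the available buckets
    vector[i % dimensions] += text.charCodeAt(i);
  }
  // Normalize to unit length so cosine similarity reduces to a dot product
  let norm = 0;
  for (let i = 0; i < dimensions; i++) norm += vector[i] * vector[i];
  norm = Math.sqrt(norm);
  if (norm > 0) {
    for (let i = 0; i < dimensions; i++) vector[i] /= norm;
  }
  return vector;
}

const first = fakeEmbed('Create user account');
const second = fakeEmbed('Create user account');

console.log(first.length); // 8
console.log(first.every((value, i) => value === second[i])); // true
```

Because the output depends only on the input string, tests built on it stay fast and repeatable.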
| Feature | VectoriaDB | Pinecone | Weaviate | ChromaDB |
|---|---|---|---|---|
| In-memory | ✅ | ❌ | ❌ | ✅ |
| Lightweight | ✅ (22MB) | ❌ | ❌ | ⚠️ |
| Type-safe | ✅ | ⚠️ | ⚠️ | ⚠️ |
| Zero config | ✅ | ❌ | ❌ | ✅ |
| Production-ready | ✅ | ✅ | ✅ | ✅ |
| Persistence | ⚠️ (optional embedding cache via file/Redis adapters) | ✅ | ✅ | ✅ |
| Distributed | ❌ | ✅ | ✅ | ❌ |
VectoriaDB is ideal for:
Contributions are welcome! Please open an issue or submit a pull request.
Apache-2.0
Built with:
We found that vectoriadb demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.