embeddb

A vector-based tag system for efficient similarity search and retrieval

Latest version on npm: 0.1.10

EmbedDB


Hey there! Welcome to EmbedDB! It's a vector-based tag system written in TypeScript that makes similarity search as easy as asking an AI assistant to find stuff for you!

Features

  • Powerful vector-based similarity search
  • Weighted tags with confidence scores (You say it's important? It's important!)
  • Category weights for fine-tuned search (Control which categories matter more!)
  • Batch operations (Handle lots of data at once, super efficient!)
  • Built-in query caching (Repeated queries? Lightning fast!)
  • Full TypeScript support (Type-safe, developer-friendly!)
  • Memory-efficient sparse vector implementation (Your RAM will thank you!)
  • Import/Export functionality (Save and restore your indexes!)
  • Pagination support with filter-first approach (Get filtered results in chunks!)
  • Advanced filtering system (Filter first, sort by similarity!)

Quick Start

First, install the package:

npm install embeddb

Let's see it in action:

import { TagVectorSystem, Tag, IndexTag } from 'embeddb';

// Create a new system
const system = new TagVectorSystem();

// Define our tag universe
const tags: IndexTag[] = [
    { category: 'color', value: 'red' },    // Red is rad!
    { category: 'color', value: 'blue' },   // Blue is cool!
    { category: 'size', value: 'large' }    // Size matters!
];

// Build the tag index (important step!)
system.buildIndex(tags);

// Add an item with its tags and confidence scores
const item = {
    id: 'cool-item-1',
    tags: [
        { category: 'color', value: 'red', confidence: 1.0 },   // 100% sure it's red!
        { category: 'size', value: 'large', confidence: 0.8 }   // Pretty sure it's large
    ]
};
system.addItem(item);

// Set category weights to prioritize color matches
system.setCategoryWeight('color', 2.0); // Color matches are twice as important

// Let's find similar items
const query = {
    tags: [
        { category: 'color', value: 'red', confidence: 0.9 }
    ]
};

// Query with pagination
const results = system.query(query.tags, { page: 1, size: 10 }); // Get first 10 results

// Export the index for later use
const exportedData = system.exportIndex();

// Import the index in another instance
const newSystem = new TagVectorSystem();
newSystem.importIndex(exportedData);

API Reference

TagVectorSystem Class

This is our superhero! It handles all the operations.

Core Methods

  • buildIndex(tags: IndexTag[]): Build your tag universe

    // Define your tag world!
    system.buildIndex([
      { category: 'color', value: 'red' },
      { category: 'style', value: 'modern' }
    ]);
    
  • addItem(item: ItemTags): Add a single item

    // Add something awesome
    system.addItem({
      id: 'awesome-item',
      tags: [
        { category: 'color', value: 'red', confidence: 1.0 }
      ]
    });
    
  • addItemBatch(items: ItemTags[], batchSize?: number): Batch add items

    // Add multiple items at once for better performance!
    system.addItemBatch([item1, item2, item3], 10);
    
  • query(tags: Tag[], options?: QueryOptions): Search for similar items

    // Find similar stuff
    const results = system.query([
      { category: 'style', value: 'modern', confidence: 0.9 }
    ], { page: 1, size: 20 });
    
  • queryFirst(tags: Tag[]): Get the most similar item

    // Just get the best match
    const bestMatch = system.queryFirst([
      { category: 'color', value: 'red', confidence: 1.0 }
    ]);
    
  • getStats(): Get system statistics

    // Check out the system stats
    const stats = system.getStats();
    console.log(`Total items: ${stats.totalItems}`);
    
  • exportIndex() & importIndex(): Export/Import index data

    // Save your data for later
    const data = system.exportIndex();
    // ... later ...
    system.importIndex(data);
    
  • setCategoryWeight(category: string, weight: number): Set category weight

    // Make color matches twice as important
    system.setCategoryWeight('color', 2.0);
    

Development

Want to contribute? Awesome! Here are some handy commands:

# Install dependencies
npm install

# Build the project
npm run build

# Run tests (we love testing!)
npm test

# Check code style
npm run lint

# Make the code pretty
npm run format

How It Works

EmbedDB uses vector magic to make similarity search possible:

  • Tag Indexing:

    • Each category-value pair gets mapped to a unique vector position
    • This lets us transform tags into numerical vectors
  • Vector Transformation:

    • Item tags are converted into sparse vectors
    • Confidence scores are used as vector weights
  • Similarity Calculation:

    • Uses cosine similarity to measure vector relationships
    • This helps us find the most similar items
  • Performance Optimizations:

    • Sparse vectors for memory efficiency
    • Query caching for speed
    • Batch operations for better throughput
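The first three steps above can be sketched in a few lines of TypeScript. This is only an illustration of the idea, not EmbedDB's actual source: the helper names (`buildPositions`, `toSparseVector`, `cosineSimilarity`) and the `Map`-based sparse vector are assumptions made for this example.

```typescript
// Hypothetical sketch: names and data layout are assumptions, not EmbedDB internals.
type IndexTag = { category: string; value: string };
type Tag = IndexTag & { confidence: number };
type SparseVector = Map<number, number>; // position -> weight, zeros are never stored

// Tag indexing: give each category-value pair a unique vector position.
function buildPositions(tags: IndexTag[]): Map<string, number> {
  const positions = new Map<string, number>();
  for (const { category, value } of tags) {
    const key = `${category}:${value}`;
    if (!positions.has(key)) positions.set(key, positions.size);
  }
  return positions;
}

// Vector transformation: the confidence score becomes the weight at the tag's position.
function toSparseVector(tags: Tag[], positions: Map<string, number>): SparseVector {
  const vector: SparseVector = new Map();
  for (const { category, value, confidence } of tags) {
    const position = positions.get(`${category}:${value}`);
    if (position !== undefined && confidence !== 0) vector.set(position, confidence);
  }
  return vector;
}

// Similarity calculation: cosine similarity, touching only the stored entries.
function cosineSimilarity(a: SparseVector, b: SparseVector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((weight, position) => {
    normA += weight * weight;
    dot += weight * (b.get(position) ?? 0);
  });
  b.forEach((weight) => {
    normB += weight * weight;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Same tags as the Quick Start example.
const positions = buildPositions([
  { category: 'color', value: 'red' },
  { category: 'color', value: 'blue' },
  { category: 'size', value: 'large' },
]);
const itemVector = toSparseVector(
  [
    { category: 'color', value: 'red', confidence: 1.0 },
    { category: 'size', value: 'large', confidence: 0.8 },
  ],
  positions
);
const queryVector = toSparseVector(
  [{ category: 'color', value: 'red', confidence: 0.9 }],
  positions
);
const score = cosineSimilarity(itemVector, queryVector); // 1 / sqrt(1.64), about 0.78
```

The item stores only two entries even though the index has three positions, which is where the memory savings come from once the tag universe gets large.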

Technical Details

Under the hood, EmbedDB uses several clever techniques:

  • Sparse Vector Implementation

    • Only stores non-zero values
    • Reduces memory footprint
    • Perfect for tag-based systems where most values are zero
  • Cosine Similarity

    • Measures angle between vectors
    • Raw range is -1 to 1; scores are normalized to 0 to 1
    • Used only for sorting, not filtering
    • Ideal for high-dimensional sparse spaces
  • Filter-First Architecture

    • Filters are applied before similarity calculation
    • The number of results is determined by the filters alone
    • Similarity scores used purely for sorting
    • Efficient for large datasets
  • Category Weight Management

    • Fine-grained control over category importance
    • Individual and batch weight updates
    • Default weights for unknown categories
    • Automatic cache invalidation on weight changes
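Here's a rough sketch of how filter-first querying, category weights, and cache invalidation might fit together. Everything in it is hypothetical: the `FilterFirstIndex` class, the `mustHave` option, and the rule that a vector entry equals confidence times category weight (defaulting to 1.0) are assumptions for illustration, not EmbedDB's real internals or API.

```typescript
// Hypothetical sketch: class, option names, and the weighting rule are assumptions.
type Tag = { category: string; value: string; confidence: number };
type Item = { id: string; tags: Tag[] };
type Scored = { id: string; similarity: number };
type RequiredTag = { category: string; value: string };
type QueryOptions = { page: number; size: number; mustHave?: RequiredTag };

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((w, key) => {
    normA += w * w;
    dot += w * (b.get(key) ?? 0);
  });
  b.forEach((w) => {
    normB += w * w;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

function hasTag(item: Item, category: string, value: string): boolean {
  return item.tags.some((t) => t.category === category && t.value === value);
}

class FilterFirstIndex {
  private items: Item[] = [];
  private weights = new Map<string, number>();
  private cache = new Map<string, Scored[]>();

  addItem(item: Item): void {
    this.items.push(item);
    this.cache.clear(); // new data can change any cached ranking
  }

  setCategoryWeight(category: string, weight: number): void {
    this.weights.set(category, weight);
    this.cache.clear(); // scores depend on weights, so cached rankings are stale
  }

  // Assumed rule: entry = confidence * category weight (1.0 for unknown categories).
  private toVector(tags: Tag[]): Map<string, number> {
    const vector = new Map<string, number>();
    for (const t of tags) {
      const weight = t.confidence * (this.weights.get(t.category) ?? 1.0);
      if (weight !== 0) vector.set(`${t.category}:${t.value}`, weight);
    }
    return vector;
  }

  query(tags: Tag[], options: QueryOptions): Scored[] {
    const { page, size, mustHave } = options;
    const key = JSON.stringify([tags, mustHave ?? null]);
    let ranked = this.cache.get(key);
    if (ranked === undefined) {
      // Step 1, filter first: the filter alone decides which items come back.
      const candidates = this.items.filter(
        (item) => !mustHave || hasTag(item, mustHave.category, mustHave.value)
      );
      // Step 2: similarity only orders the survivors (a score of 0 still counts).
      const queryVector = this.toVector(tags);
      ranked = candidates
        .map((item) => ({ id: item.id, similarity: cosine(queryVector, this.toVector(item.tags)) }))
        .sort((x, y) => y.similarity - x.similarity);
      this.cache.set(key, ranked);
    }
    const start = (page - 1) * size; // pages are 1-based, as in the Quick Start
    return ranked.slice(start, start + size);
  }
}

const index = new FilterFirstIndex();
index.addItem({ id: 'a', tags: [{ category: 'color', value: 'red', confidence: 1.0 }, { category: 'size', value: 'large', confidence: 0.8 }] });
index.addItem({ id: 'b', tags: [{ category: 'color', value: 'blue', confidence: 1.0 }, { category: 'size', value: 'large', confidence: 1.0 }] });
index.addItem({ id: 'c', tags: [{ category: 'color', value: 'red', confidence: 0.5 }, { category: 'size', value: 'large', confidence: 1.0 }] });

const redQuery: Tag[] = [{ category: 'color', value: 'red', confidence: 1.0 }];
const large: RequiredTag = { category: 'size', value: 'large' };
const page1 = index.query(redQuery, { page: 1, size: 2, mustHave: large });
const page2 = index.query(redQuery, { page: 2, size: 2, mustHave: large });

index.setCategoryWeight('color', 2.0); // clears the cache, so the next query is rescored
const reweighted = index.query(redQuery, { page: 1, size: 2, mustHave: large });
```

Item `b` isn't red at all, yet it still shows up on page 2 with a similarity of 0, because only the filter decides what gets returned. After the color weight is doubled, item `a` stays on top with a higher score than before.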

License

MIT License - Go wild, build awesome stuff!

Need Help?

Got questions or suggestions? We'd love to hear from you:

  • Open an Issue
  • Submit a PR

Let's make EmbedDB even more awesome!

Star Us!

If you find EmbedDB useful, give us a star! It helps others discover this project and motivates us to keep improving it!

Keywords

vector

FAQs

Package last updated on 15 Dec 2024
