
semantic-node-router
Superfast semantic routing for Node.js using vector embeddings.
Semantic routing uses vector embeddings to make fast routing decisions based on semantic meaning, rather than relying on slower LLM calls or brittle keyword matching. This enables you to quickly route user queries to the appropriate handler or function based on what the query means, not just what words it contains.
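Under the hood, routing reduces to a nearest-neighbor search over embedding vectors. As a rough illustration of the idea (toy vectors and hypothetical function names, not the library's internals):

```typescript
// Cosine similarity between two equal-length vectors: 1 = identical direction.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Pick the route whose (pre-computed) embedding is closest to the query's.
// Real embeddings come from a model; these toy 3-d vectors are for illustration.
function pickRoute(
  queryEmbedding: number[],
  routes: { name: string; embedding: number[] }[]
): { route: string; score: number } {
  let best = { route: routes[0].name, score: -Infinity };
  for (const r of routes) {
    const score = cosineSimilarity(queryEmbedding, r.embedding);
    if (score > best.score) best = { route: r.name, score };
  }
  return best;
}
```

Because the utterance embeddings are computed once up front, each routing decision is just a handful of dot products — this is why it is so much faster than calling an LLM per query.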
npm install semantic-node-router
Both embedding providers are optional. Install the one you want to use:
# For OpenAI (cloud-based, requires API key)
npm install openai
# For Transformers.js (local, offline, no API key needed)
npm install @huggingface/transformers
Choosing a Provider:
import { Router, Route, OpenAIEncoder } from 'semantic-node-router';
// 1. Create an encoder
const encoder = new OpenAIEncoder({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small'
});
// 2. Define your routes
const routes = [
new Route({
name: 'greeting',
utterances: ['hello', 'hi there', 'hey', 'good morning']
}),
new Route({
name: 'farewell',
utterances: ['goodbye', 'bye', 'see you later']
}),
new Route({
name: 'technical_support',
utterances: [
'my app is crashing',
'I got an error',
'something is broken'
]
})
];
// 3. Create and initialize router
const router = new Router({
routes,
encoder
});
await router.initialize(); // Encodes all utterances
// 4. Route queries
const result = await router.route('The app is not working');
console.log(result);
// {
// route: 'technical_support',
// score: 0.87
// }
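From here, a common pattern is to map the returned route name to a handler, with a fallback for a null route (no match above threshold). The handlers below are hypothetical, not part of the library:

```typescript
type Handler = (query: string) => string;

// Hypothetical handler table keyed by route name.
const handlers: Record<string, Handler> = {
  greeting: () => 'Hello! How can I help?',
  farewell: () => 'Goodbye!',
  technical_support: (q) => `Creating a support ticket for: ${q}`,
};

// Dispatch a routing result, falling back when route is null or unknown.
function dispatch(
  result: { route: string | null; score: number },
  query: string
): string {
  const handler = result.route ? handlers[result.route] : undefined;
  return handler ? handler(query) : 'Sorry, I did not understand that.';
}
```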
import { OpenAIEncoder } from 'semantic-node-router';
const encoder = new OpenAIEncoder({
apiKey: 'your-api-key', // Or set OPENAI_API_KEY env var
model: 'text-embedding-3-small', // Default
scoreThreshold: 0.3, // Default
dimensions: undefined, // Optional, for embedding-3 models
maxRetries: 3 // Default
});
Supported Models:
text-embedding-3-small (default) - Fast and efficient
text-embedding-3-large - Higher accuracy
text-embedding-ada-002 - Legacy model

Use local Hugging Face models for offline, free embeddings with no API key required.
import { TransformersEncoder } from 'semantic-node-router';
const encoder = new TransformersEncoder({
modelName: 'Xenova/all-MiniLM-L6-v2', // Default - fast and lightweight
quantized: true, // Default - uses smaller quantized models
scoreThreshold: 0.5, // Default
cacheDir: './models', // Optional - custom model cache directory
device: 'cpu' // Default - 'cpu' or 'gpu'
});
// IMPORTANT: Must initialize before use
await encoder.initialize(); // Loads model (1-10s, one-time)
Supported Models:
Xenova/all-MiniLM-L6-v2 (default) - 384-dim, ~80MB quantized, best balance
Xenova/all-mpnet-base-v2 - 768-dim, ~160MB quantized, higher accuracy
Example Usage:
import { Router, Route, TransformersEncoder } from 'semantic-node-router';
// Create and initialize encoder
const encoder = new TransformersEncoder();
await encoder.initialize(); // Load model first!
// Create router
const router = new Router({ routes, encoder });
await router.initialize(); // Encode utterances
// Route queries (fast!)
const result = await router.route('my query'); // ~10-50ms
import { Route } from 'semantic-node-router';
const route = new Route({
name: 'route-name', // Required: unique identifier
utterances: ['example 1', 'example 2'], // Required: example phrases
description: 'What this route handles', // Optional
scoreThreshold: 0.7, // Optional: override encoder default
metadata: { custom: 'data' } // Optional: custom metadata
});
Methods:
setEmbeddings(embeddings: number[][]) - Set pre-computed embeddings
getEmbeddings() - Get embeddings (throws if not set)
hasEmbeddings() - Check if embeddings are available
toJSON() - Serialize to JSON
Route.fromJSON(json) - Deserialize from JSON

import { Router } from 'semantic-node-router';
const router = new Router({
routes: [route1, route2], // Required: array of routes
encoder: encoder, // Required: encoder instance
aggregationMethod: 'max', // Optional: 'max' | 'mean' | 'sum' (default: 'max')
topK: 1 // Optional: default number of top matches to return
});
Methods:
initialize() - Initialize the router by encoding all route utterances. Must be called before routing.
await router.initialize();
route(query: string) - Route a query to the best matching route.
const result = await router.route('my query');
// {
// route: 'route-name' | null,
// score: 0.87
// }
Returns null route if no match exceeds the threshold.
routeTopK(query: string, k?: number) - Get the top K matching routes.
const matches = await router.routeTopK('my query', 3);
// [
// { route: 'route1', score: 0.92 },
// { route: 'route2', score: 0.78 },
// { route: 'route3', score: 0.65 }
// ]
addRoute(route: Route) - Dynamically add a new route.
await router.addRoute(new Route({
name: 'new-route',
utterances: ['example']
}));
removeRoute(routeName: string) - Remove a route by name.
const removed = router.removeRoute('route-name'); // Returns boolean
getRoutes() - Get all routes (returns a copy).
const routes = router.getRoutes();
When a route has multiple utterances, how should similarities be combined?
max (default): Use the highest similarity score
mean: Average all similarity scores
sum: Sum all similarity scores
const router = new Router({
routes,
encoder,
aggregationMethod: 'mean' // or 'max' or 'sum'
});
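An illustrative sketch of what the three methods compute over the per-utterance similarity scores (hypothetical helper, not the library's source):

```typescript
type AggregationMethod = 'max' | 'mean' | 'sum';

// Combine one similarity score per utterance into a single route score.
function aggregate(scores: number[], method: AggregationMethod): number {
  switch (method) {
    case 'max':
      return Math.max(...scores); // best single utterance wins
    case 'mean':
      return scores.reduce((a, b) => a + b, 0) / scores.length; // average fit
    case 'sum':
      return scores.reduce((a, b) => a + b, 0); // rewards routes with many decent matches
  }
}
```

Note that 'sum' favors routes with many utterances, since every utterance contributes; 'max' is insensitive to utterance count, which is why it is the default.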
Control routing confidence with thresholds:
// Global threshold (applies to all routes)
const encoder = new OpenAIEncoder({
scoreThreshold: 0.5 // Stricter matching
});
// Per-route threshold (overrides global)
const route = new Route({
name: 'sensitive-action',
utterances: ['delete my account'],
scoreThreshold: 0.9 // Require very high confidence
});
Threshold Guidelines:
0.3 - Very loose matching, many false positives
0.5 - Balanced (good default)
0.7 - Stricter, fewer false positives
0.9 - Very strict, only near-exact semantic matches

Note: These threshold values are guidelines based on OpenAI's embedding models. Different embedding models and dimensions may produce different similarity score ranges. Always experiment with your specific use case and model to find the optimal threshold.
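Conceptually, the threshold check is a simple gate on the best match's score. This is a hypothetical sketch of that behavior; the library's exact comparison may differ:

```typescript
// Turn the best match into either a route name or null, per the documented
// behavior: the route is returned only if its score exceeds the threshold.
function applyThreshold(
  best: { route: string; score: number },
  threshold: number
): { route: string | null; score: number } {
  return best.score > threshold ? best : { route: null, score: best.score };
}
```

The score is always reported, so callers can log near-misses and tune the threshold against real traffic.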
The quality of your utterances directly impacts routing accuracy. Follow these guidelines:
Provide multiple ways users might express the same intent:
new Route({
name: 'check_balance',
utterances: [
// Formal
'What is my account balance?',
'Please show my current balance',
// Informal
'how much money do I have',
'what\'s my balance',
'check my balance',
// Different phrasings
'I want to see my balance',
'Can you tell me my balance?',
'balance inquiry'
]
})
Include different terminology for the same concept:
new Route({
name: 'technical_support',
utterances: [
'my app is broken',
'the application crashed',
'software not working',
'program has an error',
'getting a bug',
'experiencing a glitch',
'system malfunction'
]
})
While embeddings handle some typos naturally, include common variations:
new Route({
name: 'password_reset',
utterances: [
'reset my password',
'forgot password',
'password recovery',
'cant login', // missing apostrophe
'can\'t log in',
'cannot sign in'
]
})
Add short phrases and incomplete sentences users might type:
new Route({
name: 'help',
utterances: [
'help',
'need help',
'can you help me',
'I need assistance',
'support please',
'stuck'
]
})
Ensure utterances are semantically distinct from other routes:
// ❌ BAD: Too similar across routes
new Route({
name: 'order_status',
utterances: ['check my order', 'order information']
}),
new Route({
name: 'order_history',
utterances: ['view my orders', 'order information'] // Duplicate!
})
// ✅ GOOD: Clear semantic differences
new Route({
name: 'order_status',
utterances: [
'where is my order',
'track my package',
'order status',
'has my order shipped'
]
}),
new Route({
name: 'order_history',
utterances: [
'past orders',
'previous purchases',
'order history',
'all my orders'
]
})
More utterances improve coverage but increase initialization time and memory usage.
See the examples/ directory for complete examples:
basic-routing.ts - Core functionality demonstration
openai-example.ts - Customer support routing with OpenAI
transformers-example.ts - Local offline routing with Transformers.js

const routes = [
new Route({ name: 'billing', utterances: ['payment issue', 'charged twice'] }),
new Route({ name: 'technical', utterances: ['app crashed', 'error message'] }),
new Route({ name: 'account', utterances: ['reset password', 'login problem'] })
];
const routes = [
new Route({ name: 'book_flight', utterances: ['book a flight to Paris'] }),
new Route({ name: 'check_weather', utterances: ['what\'s the weather'] }),
new Route({ name: 'set_reminder', utterances: ['remind me to call'] })
];
const routes = [
new Route({ name: 'tech', utterances: ['latest smartphone', 'AI news'] }),
new Route({ name: 'sports', utterances: ['football match', 'Olympics'] }),
new Route({ name: 'politics', utterances: ['election results', 'policy'] })
];
Implement your own encoder:
import { BaseEncoder } from 'semantic-node-router';
class CustomEncoder extends BaseEncoder {
name = 'my-encoder';
scoreThreshold = 0.5;
async encode(texts: string | string[]): Promise<number[][]> {
const inputs = Array.isArray(texts) ? texts : [texts];
// Your embedding logic: return one embedding vector per input text
const embeddings: number[][] = inputs.map(() => []); // placeholder - call your model here
return embeddings;
}
}
Process many queries efficiently:
await router.initialize();
const queries = ['query1', 'query2', /* ... */ 'query1000'];
const results = await Promise.all(
queries.map(q => router.route(q))
);
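With a cloud encoder, firing hundreds of requests at once with Promise.all can trip API rate limits. One mitigation is to process queries in bounded chunks; `mapInChunks` below is a hypothetical generic helper, not library API (in real usage the worker would be `q => router.route(q)`):

```typescript
// Process items in fixed-size chunks: each chunk runs in parallel,
// chunks run sequentially, so at most `chunkSize` requests are in flight.
async function mapInChunks<T, R>(
  items: T[],
  chunkSize: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    const chunk = items.slice(i, i + chunkSize);
    results.push(...(await Promise.all(chunk.map(worker))));
  }
  return results;
}
```

This trades some throughput for predictable concurrency; with a local Transformers.js encoder the plain Promise.all version is usually fine.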
Store custom data with routes:
const route = new Route({
name: 'support',
utterances: ['help me'],
metadata: {
department: 'customer-service',
priority: 'high',
handlerFunction: 'handleSupport'
}
});
// Access later
const match = await router.route('I need help');
const route = router.getRoutes().find(r => r.name === match.route);
console.log(route?.metadata);
The library provides specific error types for different failure scenarios, allowing you to handle errors appropriately.
import {
SemanticRouterError, // Base error class
RouterConfigurationError, // Invalid router configuration
RouterNotInitializedError, // Router used before initialization
EncodingError, // Generic encoding failure
RateLimitError, // API rate limit exceeded
AuthenticationError, // API authentication failed
ValidationError // Invalid input
} from 'semantic-node-router';
import { Router, RouterConfigurationError } from 'semantic-node-router';
try {
// Missing encoder
const router = new Router({ routes: [], encoder: null });
} catch (error) {
if (error instanceof RouterConfigurationError) {
console.error('Configuration error:', error.message);
// Handle: Check router configuration
}
}
import { OpenAIEncoder, AuthenticationError } from 'semantic-node-router';
try {
const encoder = new OpenAIEncoder({ apiKey: '' });
} catch (error) {
if (error instanceof AuthenticationError) {
console.error('Authentication error:', error.message);
console.error('Provider:', error.provider); // 'openai'
// Handle: Set OPENAI_API_KEY environment variable
}
}
import { Router, EncodingError, RateLimitError } from 'semantic-node-router';
try {
await router.initialize();
} catch (error) {
if (error instanceof RateLimitError) {
console.error('Rate limit hit:', error.message);
console.error('Retry after:', error.retryAfter, 'seconds');
// Handle: Wait and retry
await new Promise(resolve => setTimeout(resolve, error.retryAfter * 1000));
await router.initialize();
} else if (error instanceof EncodingError) {
console.error('Encoding failed:', error.message);
console.error('Provider:', error.provider);
console.error('Original error:', error.cause);
// Handle: Check network, API status, or retry
}
}
import { Router, RouterNotInitializedError } from 'semantic-node-router';
try {
// Forgot to call initialize()
const result = await router.route('my query');
} catch (error) {
if (error instanceof RouterNotInitializedError) {
console.error('Router not ready:', error.message);
// Handle: Call initialize() first
await router.initialize();
const result = await router.route('my query');
}
}
async function initializeWithRetry(router: Router, maxAttempts = 3) {
for (let attempt = 1; attempt <= maxAttempts; attempt++) {
try {
await router.initialize();
console.log('Router initialized successfully');
return;
} catch (error) {
if (error instanceof RateLimitError) {
const delay = error.retryAfter || 5;
console.log(`Rate limited. Retrying in ${delay}s... (attempt ${attempt}/${maxAttempts})`);
await new Promise(resolve => setTimeout(resolve, delay * 1000));
} else if (error instanceof AuthenticationError) {
// Don't retry auth errors
console.error('Authentication failed:', error.message);
throw error;
} else {
console.error('Initialization failed:', error);
if (attempt === maxAttempts) throw error;
}
}
}
}
The OpenAI encoder includes automatic retry with exponential backoff, retrying transient failures such as rate limits up to maxRetries times. Non-retryable errors, such as authentication failures, are thrown immediately.
Configure retry behavior:
const encoder = new OpenAIEncoder({
apiKey: process.env.OPENAI_API_KEY,
maxRetries: 5, // Default: 3
});
Error Handling Best Practices:
Use instanceof to handle different error types appropriately
The cause property contains the original error for debugging
Errors may be thrown from initialize(), route(), and addRoute()

We benchmarked semantic routing against LLM-based routing across 56 test cases with varying difficulty levels.
| Method | Avg Latency | P95 Latency | Accuracy | Cost/1000 req |
|---|---|---|---|---|
| Local embeddings (Transformers.js) | ~5ms | ~10ms | 77% | Free |
| OpenAI embeddings (text-embedding-3-small) | ~320ms | ~565ms | 89% | $0.002 |
| LLM routing (gpt-4o-mini) | ~450ms | ~620ms | 81% | $0.048 |
Local embeddings (Transformers.js) deliver the lowest latency at zero cost; OpenAI embeddings deliver the highest accuracy among the embedding approaches at a small per-request cost.
| Complexity | Local | OpenAI | LLM |
|---|---|---|---|
| Easy (exact matches) | 100% | 100% | 92% |
| Medium (paraphrases) | 71% | 96% | 79% |
| Hard (ambiguous) | 57% | 64% | 71% |
| Use Case | Recommended | Why |
|---|---|---|
| High-volume, cost-sensitive | Local embeddings | Free, <5ms latency |
| Production with clear intents | Local embeddings | Speed + accuracy on typical queries |
| Complex/ambiguous routing | OpenAI embeddings | Best accuracy |
| Maximum accuracy on edge cases | LLM routing | Reasoning capability |
| Offline/edge deployment | Local embeddings | No network required |
For optimal results, consider a hybrid strategy:
async function smartRoute(query: string) {
// Fast first-pass with local embeddings
const localResult = await localRouter.route(query);
// If confidence is high, use it
if (localResult.score > 0.85) {
return localResult;
}
// Fall back to OpenAI for uncertain cases
return await openaiRouter.route(query);
}
This gives you <5ms latency for ~80% of queries while maintaining high accuracy.
Benchmarks performed with Xenova/all-MiniLM-L6-v2 (local), text-embedding-3-small (OpenAI), and gpt-4o-mini (LLM). Results may vary based on hardware, network conditions, and query distribution.
# Run all tests
npm test
# Run tests in watch mode
npm run test:watch
# Run with UI
npm run test:ui
# Type checking
npm run type-check
# Build the package
npm run build
# This creates:
# - dist/index.js (ESM)
# - dist/index.cjs (CommonJS)
# - dist/index.d.ts (TypeScript types)
Contributions are welcome! This is a community project to bring semantic routing to Node.js.
git clone https://github.com/your-username/semantic-node-router.git
cd semantic-node-router
npm install
npm test
MIT