@ruvector/ruvllm

Self-learning LLM orchestration with SONA adaptive learning, HNSW memory, RLM recursive retrieval, FastGRNN routing, and SIMD inference

Source: npm
Version: 2.4.1
Weekly downloads: 144K (24.41%)
Maintainers: 1

@ruvector/ruvllm

The First Purpose-Built LLM Runtime for Claude Code Agent Orchestration

100% Routing Accuracy | Sub-Millisecond Inference | Self-Learning

Quick Start | RLM | Training | Models | API

What is @ruvector/ruvllm?

@ruvector/ruvllm is a TypeScript/JavaScript SDK for intelligent LLM orchestration, specifically designed for Claude Code and multi-agent systems. It provides:

  • RLM (Recursive Language Model) - Breaks complex queries into sub-queries and synthesizes a coherent answer
  • 100% Routing Accuracy - Hybrid keyword + embedding strategy; the 100% figure is the hybrid result reported under Routing Accuracy
  • SONA Self-Learning - The model improves with every successful interaction
  • SIMD Acceleration - AVX2/NEON-optimized inference

Why @ruvector/ruvllm?

| Challenge | Traditional Approach | @ruvector/ruvllm Solution |
|---|---|---|
| Agent selection | Manual or keyword-based | Semantic + keyword hybrid = 100% |
| Complex queries | Single-shot RAG | Recursive decomposition + synthesis |
| Response latency | 2-5 seconds | <1ms cache, 50-200ms full |
| Learning | Static models | Self-improving (SONA) |
| Cost per route | $0.01+ (API call) | $0 (local inference) |

Installation

npm install @ruvector/ruvllm

Quick Start

import { RuvLLM, RlmController } from '@ruvector/ruvllm';

// Simple LLM inference
const llm = new RuvLLM({
  modelPath: '~/.ruvllm/models/ruvltra-claude-code-0.5b-q4_k_m.gguf',
  sonaEnabled: true,
});

const response = await llm.query('Explain quantum computing');
console.log(response.text);

// Recursive Language Model for complex queries
const rlm = new RlmController({ maxDepth: 5 });
const answer = await rlm.query('What are the causes AND solutions for slow API responses?');
// Automatically decomposes into sub-queries, retrieves context, synthesizes answer

Core Features

1. Claude Code Native Routing

Built by Claude Code, for Claude Code. Routes tasks to 60+ agent types:

import { RuvLLM } from '@ruvector/ruvllm';

const llm = new RuvLLM({ model: 'ruv/ruvltra' });

// Intelligent routing
const route = await llm.route('implement OAuth2 authentication');
console.log(route.agent);      // 'security-architect'
console.log(route.confidence); // 0.98
console.log(route.tier);       // 2 (Haiku-level complexity)

// Multi-agent teams for complex tasks
const team = await llm.routeComplex('build full-stack app with auth');
// Returns: [system-architect, backend-dev, coder, security-architect, tester]

2. 3-Tier Intelligent Routing

┌─────────────────────────────────────────────────────────┐
│                    User Request                         │
└─────────────────────┬───────────────────────────────────┘
                      ↓
              [RuvLTRA Routing]
                      ↓
        ┌─────────────┼─────────────┐
        ↓             ↓             ↓
┌───────────┐  ┌───────────┐  ┌───────────┐
│  Tier 1   │  │  Tier 2   │  │  Tier 3   │
│  Booster  │  │   Haiku   │  │   Opus    │
│   <1ms    │  │  ~500ms   │  │   2-5s    │
│    $0     │  │  $0.0002  │  │  $0.015   │
└───────────┘  └───────────┘  └───────────┘
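
To make the trade-off in the diagram concrete, here is a minimal sketch that encodes the three tiers and estimates the cost of a batch of requests. The tier names, latencies, and costs are taken from the diagram and treated as per-request figures. The complexity score and its 0.3 / 0.7 thresholds are assumptions made for illustration; the README does not say how RuvLTRA picks a tier.

```typescript
type TierId = 1 | 2 | 3;

interface Tier {
  id: TierId;
  name: string;
  maxLatencyMs: number; // upper end of the latency shown in the diagram
  costUsd: number;      // cost shown in the diagram, assumed per request
}

const TIERS: Record<TierId, Tier> = {
  1: { id: 1, name: "Booster", maxLatencyMs: 1, costUsd: 0 },
  2: { id: 2, name: "Haiku", maxLatencyMs: 500, costUsd: 0.0002 },
  3: { id: 3, name: "Opus", maxLatencyMs: 5000, costUsd: 0.015 },
};

// Hypothetical rule: map a complexity score in [0, 1] to a tier.
function pickTier(complexity: number): Tier {
  if (complexity < 0.3) return TIERS[1];
  if (complexity < 0.7) return TIERS[2];
  return TIERS[3];
}

// Sum the per-request cost of a batch of requests.
function estimateCostUsd(complexities: number[]): number {
  return complexities.reduce((sum, c) => sum + pickTier(c).costUsd, 0);
}
```

Under these assumptions, one request per tier costs $0 + $0.0002 + $0.015, so almost all of the spend comes from Tier 3 traffic.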

3. Self-Learning (SONA)

Every successful interaction improves the model:

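// First routing: full inference
const first = await llm.route('implement OAuth2');
// first.agent → 'security-architect' (confidence 0.97)

// Later: pattern hit in <25μs (learned from the earlier success)
const later = await llm.route('add OAuth2 flow');
// later.agent → 'security-architect' (confidence 0.99, cached pattern)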

RLM (Recursive Language Model)

RLM provides recursive query decomposition. Unlike traditional RAG, which retrieves once, RLM breaks a complex question into sub-queries, answers each one, and synthesizes the results into a single coherent answer.

How It Works

Query: "What are the causes AND solutions for slow API responses?"
                              ↓
                    [Decomposition]
                    /            \
    "Causes of slow API?"    "Solutions for slow API?"
           ↓                        ↓
    [Sub-answers]            [Sub-answers]
           \                        /
                    [Synthesis]
                         ↓
            Coherent combined answer with sources
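
The flow in the diagram can be sketched in a few lines. This is a toy illustration, not the package's implementation: it splits on an uppercase " AND " where the real decomposition is presumably model-driven, and `decompose`, `answerRecursively`, and the `Answerer` callback are hypothetical names.

```typescript
// Callback that answers a single, already-simple question.
type Answerer = (question: string) => string;

interface SketchAnswer {
  text: string;
  sources: string[]; // the leaf questions that contributed to the answer
}

// Toy decomposition: split on an uppercase " AND " (an assumption for this sketch).
function decompose(query: string): string[] {
  const parts = query
    .split(/\s+AND\s+/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  return parts.length > 1 ? parts : [query];
}

// Recurse until a query no longer splits or maxDepth is reached, then join the sub-answers.
function answerRecursively(
  query: string,
  answer: Answerer,
  maxDepth: number,
  depth: number = 0,
): SketchAnswer {
  const parts = depth < maxDepth ? decompose(query) : [query];
  if (parts.length === 1) {
    return { text: answer(parts[0]), sources: [parts[0]] };
  }
  const subAnswers = parts.map((p) => answerRecursively(p, answer, maxDepth, depth + 1));
  return {
    text: subAnswers.map((s) => s.text).join(" "),
    sources: subAnswers.reduce<string[]>((acc, s) => acc.concat(s.sources), []),
  };
}
```

With `maxDepth` set to 0 the query is answered in a single shot, which is the behaviour the README contrasts RLM against.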

Basic Usage

import { RlmController } from '@ruvector/ruvllm';

const rlm = new RlmController({
  maxDepth: 5,
  retrievalTopK: 10,
  enableCache: true,
});

// Add knowledge to memory
await rlm.addMemory('TypeScript adds static typing to JavaScript.');
await rlm.addMemory('React is a library for building user interfaces.');

// Query with recursive retrieval
const answer = await rlm.query('What are causes and solutions for type errors in React?');
console.log(answer.text);           // Comprehensive synthesized answer
console.log(answer.sources);        // Source attributions
console.log(answer.qualityScore);   // 0.0-1.0
console.log(answer.confidence);     // Routing confidence

Streaming

for await (const event of rlm.queryStream('Explain machine learning')) {
  if (event.type === 'token') {
    process.stdout.write(event.text);
  } else {
    console.log('\n\nQuality:', event.answer.qualityScore);
  }
}

With Self-Reflection

const rlm = new RlmController({
  enableReflection: true,
  maxReflectionIterations: 2,
  minQualityScore: 0.8,
});

// Answers are refined until quality >= 0.8 or 2 reflection iterations have run
const answer = await rlm.query('Complex multi-part technical question...');
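
The interaction between `minQualityScore` and `maxReflectionIterations` can be pictured as a bounded loop. The sketch below is an assumption about the control flow only; `Draft`, `refineUntilGoodEnough`, and the `refine` callback are hypothetical names, and how the package actually scores or rewrites an answer is not described here.

```typescript
interface Draft {
  text: string;
  qualityScore: number; // 0.0-1.0
}

// Refine a draft until it meets the quality bar or the iteration budget runs out.
function refineUntilGoodEnough(
  initial: Draft,
  refine: (draft: Draft) => Draft,
  minQualityScore: number,
  maxReflectionIterations: number,
): { answer: Draft; iterations: number } {
  let current = initial;
  let iterations = 0;
  while (current.qualityScore < minQualityScore && iterations < maxReflectionIterations) {
    const candidate = refine(current);
    iterations += 1;
    // Keep the candidate only if it actually scores higher.
    if (candidate.qualityScore > current.qualityScore) {
      current = candidate;
    }
  }
  return { answer: current, iterations };
}
```

Note that the loop can end below the threshold: the iteration cap bounds latency, so the quality bar is a target and not a guarantee.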

RLM Configuration

interface RlmConfig {
  maxDepth?: number;              // Max recursion depth (default: 3)
  maxSubQueries?: number;         // Max sub-queries per level (default: 5)
  tokenBudget?: number;           // Token budget (default: 4096)
  enableCache?: boolean;          // Enable caching (default: true)
  cacheTtl?: number;              // Cache TTL in ms (default: 300000)
  retrievalTopK?: number;         // Memory spans to retrieve (default: 10)
  minQualityScore?: number;       // Min quality threshold (default: 0.7)
  enableReflection?: boolean;     // Enable self-reflection (default: false)
  maxReflectionIterations?: number; // Max reflection loops (default: 2)
}
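
Because every field is optional, a partial config has to be merged with the documented defaults before use. The sketch below mirrors the interface and default values listed above; `SKETCH_DEFAULTS` and `resolveConfig` are hypothetical names, and the package's own defaults object (exported as `DEFAULT_RLM_CONFIG`) may differ.

```typescript
interface RlmConfig {
  maxDepth?: number;
  maxSubQueries?: number;
  tokenBudget?: number;
  enableCache?: boolean;
  cacheTtl?: number;
  retrievalTopK?: number;
  minQualityScore?: number;
  enableReflection?: boolean;
  maxReflectionIterations?: number;
}

// Defaults copied from the comments in the interface above.
const SKETCH_DEFAULTS: Required<RlmConfig> = {
  maxDepth: 3,
  maxSubQueries: 5,
  tokenBudget: 4096,
  enableCache: true,
  cacheTtl: 300000, // 300000 ms = 5 minutes
  retrievalTopK: 10,
  minQualityScore: 0.7,
  enableReflection: false,
  maxReflectionIterations: 2,
};

// Later spreads win, so any override replaces the matching default.
function resolveConfig(overrides: RlmConfig = {}): Required<RlmConfig> {
  return { ...SKETCH_DEFAULTS, ...overrides };
}
```

For example, `resolveConfig({ maxDepth: 5 })` keeps every default except the recursion depth.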

Unique Capabilities

1. Memory-Augmented Routing

Every successful routing decision is stored in HNSW-indexed memory for instant recall:

// First time: full inference (~50ms)
const first = await llm.route('implement OAuth2');
// first.agent → 'security-architect' (confidence 0.97)

// Later: memory hit (<25μs)
const later = await llm.route('add OAuth2 flow');
// later.agent → 'security-architect' (confidence 0.99, cached)
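
The recall-then-fallback pattern above can be sketched as follows. To stay self-contained this sketch swaps in a brute-force linear scan for the HNSW index and a toy word-count embedding for a real model; `createRouteMemory`, `routeWithMemory`, `toyEmbed`, and the 0.4 similarity threshold used in the usage note are all assumptions for illustration.

```typescript
type Embed = (text: string) => number[];

interface RouteResult {
  agent: string;
  fromMemory: boolean;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Brute-force memory: a linear scan stands in for an HNSW index.
function createRouteMemory(embed: Embed, threshold: number) {
  const entries: { vector: number[]; agent: string }[] = [];
  return {
    remember(task: string, agent: string): void {
      entries.push({ vector: embed(task), agent });
    },
    // Return the agent of the most similar stored task, if it clears the threshold.
    recall(task: string): string | null {
      const query = embed(task);
      let bestAgent: string | null = null;
      let bestScore = threshold;
      for (const entry of entries) {
        const score = cosineSimilarity(query, entry.vector);
        if (score >= bestScore) {
          bestAgent = entry.agent;
          bestScore = score;
        }
      }
      return bestAgent;
    },
  };
}

function routeWithMemory(
  task: string,
  memory: ReturnType<typeof createRouteMemory>,
  fullInference: (task: string) => string,
): RouteResult {
  const remembered = memory.recall(task);
  if (remembered !== null) return { agent: remembered, fromMemory: true };
  const agent = fullInference(task);
  memory.remember(task, agent); // this sketch assumes the routing succeeded
  return { agent, fromMemory: false };
}

// Toy embedding: word counts over a tiny fixed vocabulary.
const VOCAB = ["oauth2", "implement", "add", "flow", "css", "button"];
const toyEmbed: Embed = (text) => {
  const words = text.toLowerCase().split(/\s+/);
  return VOCAB.map((term) => words.filter((w) => w === term).length);
};
```

With a threshold of 0.4, routing "implement OAuth2" and then "add OAuth2 flow" produces one full inference followed by one memory hit, because the two tasks share the "oauth2" term.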

2. Confidence-Aware Escalation

Low-confidence routes escalate automatically:

| Confidence | Action |
|---|---|
| > 0.9 | Use recommended agent |
| 0.7-0.9 | Use with human confirmation |
| < 0.7 | Escalate to higher tier |
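
The thresholds above translate directly into a small decision function. This is a sketch: the `Action` labels and `decideAction` are hypothetical names, and treating exactly 0.9 and exactly 0.7 as part of the middle band is one reading of the ranges as listed.

```typescript
type Action = "use-agent" | "confirm-with-human" | "escalate";

// Map a routing confidence in [0, 1] to the action listed above.
function decideAction(confidence: number): Action {
  if (confidence > 0.9) return "use-agent";
  if (confidence >= 0.7) return "confirm-with-human";
  return "escalate";
}
```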

3. Batch SIMD Operations

import { simd } from '@ruvector/ruvllm/simd';

// 4x faster vector operations with AVX2/NEON
const similarity = simd.batchCosineSimilarity(query, targets);
const attended = simd.flashAttention(q, k, v, scale);
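
The README does not describe the shapes `simd.flashAttention` expects. Assuming it computes standard scaled dot-product attention, the scalar sketch below shows that computation for a single query vector, using the single-pass "online softmax" accumulation that flash-attention-style kernels rely on. The function name and array shapes are assumptions, and no SIMD is involved.

```typescript
// Attention for one query: softmax(scale * q·k_i) weighted sum of the value rows,
// computed in a single pass without materializing the score vector.
function attendSingleQuery(
  q: number[],
  keys: number[][],
  values: number[][],
  scale: number,
): number[] {
  if (keys.length === 0 || keys.length !== values.length) {
    throw new Error("keys and values must be non-empty and of equal length");
  }
  const dim = values[0].length;
  const acc: number[] = [];
  for (let d = 0; d < dim; d++) acc.push(0);

  let runningMax = -Infinity; // largest score seen so far
  let denom = 0;              // softmax denominator, relative to runningMax

  for (let i = 0; i < keys.length; i++) {
    let score = 0;
    for (let j = 0; j < q.length; j++) score += q[j] * keys[i][j];
    score *= scale;

    const newMax = Math.max(runningMax, score);
    const correction = Math.exp(runningMax - newMax); // rescales earlier terms; 0 on the first row
    const weight = Math.exp(score - newMax);

    denom = denom * correction + weight;
    for (let d = 0; d < dim; d++) {
      acc[d] = acc[d] * correction + weight * values[i][d];
    }
    runningMax = newMax;
  }
  return acc.map((x) => x / denom);
}
```

Rescaling by the running maximum keeps the exponentials bounded, so large scores do not overflow even though the full score vector is never stored.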

4. Zero-Copy Caching

Arc-based string interning for 100-1000x faster cache hits on large responses.

Performance

Benchmarks (M4 Pro)

| Operation | Latency | Throughput |
|---|---|---|
| Query decomposition | 340 ns | 2.9M/s |
| Cache lookup | 23.5 ns | 42.5M/s |
| Embedding (384d) | 293 ns | 3.4M/s |
| Memory search (10k) | 0.4 ms | 2.5K/s |
| End-to-end routing | <1 ms | 1K+/s |
| Full RLM query | 50-200 ms | 5-20/s |
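
The throughput column is consistent with taking the reciprocal of the latency, that is, with operations running one after another on a single thread. That reading is an assumption; the README does not state how throughput was measured. A quick check:

```typescript
// Sequential throughput is the reciprocal of per-operation latency.
function throughputPerSecond(latencySeconds: number): number {
  if (latencySeconds <= 0) throw new Error("latency must be positive");
  return 1 / latencySeconds;
}

const NANOSECOND = 1e-9;
const MILLISECOND = 1e-3;

// 340 ns per decomposition is roughly 2.9 million per second.
const decompositionsPerSecond = throughputPerSecond(340 * NANOSECOND);

// 0.4 ms per memory search is 2,500 per second.
const memorySearchesPerSecond = throughputPerSecond(0.4 * MILLISECOND);
```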

Routing Accuracy

| Strategy | RuvLTRA | Qwen Base | OpenAI |
|---|---|---|---|
| Embedding Only | 45% | 40% | 52% |
| Keyword Only | 78% | 78% | N/A |
| Hybrid | 100% | 95% | N/A |
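
The README does not spell out how the hybrid strategy combines its two signals. One simple possibility, shown here purely as an illustration, is a weighted sum of a keyword-match score and an embedding cosine similarity. `AgentProfile`, `hybridRoute`, the 0.5 weight, and the two-dimensional example embeddings are all assumptions.

```typescript
interface AgentProfile {
  name: string;
  keywords: string[];
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Fraction of the agent's keywords that occur in the task text (case-insensitive substring match).
function keywordScore(task: string, keywords: string[]): number {
  if (keywords.length === 0) return 0;
  const lower = task.toLowerCase();
  const hits = keywords.filter((k) => lower.indexOf(k.toLowerCase()) !== -1).length;
  return hits / keywords.length;
}

// Pick the agent with the highest weighted sum of keyword and embedding scores.
function hybridRoute(
  task: string,
  taskEmbedding: number[],
  agents: AgentProfile[],
  keywordWeight: number = 0.5,
): { agent: string; score: number } {
  if (agents.length === 0) throw new Error("no agents to route to");
  let best = { agent: agents[0].name, score: -Infinity };
  for (const a of agents) {
    const score =
      keywordWeight * keywordScore(task, a.keywords) +
      (1 - keywordWeight) * cosineSimilarity(taskEmbedding, a.embedding);
    if (score > best.score) best = { agent: a.name, score };
  }
  return best;
}

// Made-up agent profiles for illustration.
const exampleAgents: AgentProfile[] = [
  { name: "tester", keywords: ["test", "unit"], embedding: [1, 0] },
  { name: "security-architect", keywords: ["oauth2", "auth"], embedding: [0, 1] },
];
```

With the task embedding `[0.8, 0.6]`, the embedding signal alone prefers `tester`, while the keyword matches on "OAuth2" and "auth" tip the hybrid score toward `security-architect`. This is the kind of case where combining the two signals can help.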

Test Results

145 tests passing
  - RLM Controller: 24 tests
  - Routing Accuracy: 18 tests
  - Contrastive Training: 15 tests
  - SIMD Operations: 22 tests
  - SONA Learning: 19 tests
  - Memory/HNSW: 21 tests
  - Benchmarks: 26 tests

Models

HuggingFace Repository

URL: https://huggingface.co/ruv/ruvltra

Available Models

| Model | Size | Purpose | Accuracy |
|---|---|---|---|
| ruvltra-claude-code-0.5b-q4_k_m | 398 MB | Agent routing | 100% (hybrid) |
| ruvltra-small-0.5b-q4_k_m | ~400 MB | Embeddings | - |
| ruvltra-medium-1.1b-q4_k_m | ~1 GB | Full inference | - |

Download Models

// Programmatic
import { downloadModel } from '@ruvector/ruvllm';
await downloadModel('ruv/ruvltra', { quantization: 'q4_k_m' });

# CLI
ruvllm download ruv/ruvltra

Auto-Download

Models are automatically downloaded on first use:

const llm = new RuvLLM({ model: 'ruv/ruvltra' });
// Downloads to ~/.ruvllm/models/ if not present

Training

Generate Routing Dataset

node scripts/training/routing-dataset.js
# Output: 381 examples, 793 contrastive pairs, 156 hard negatives

Contrastive Fine-tuning

import { ContrastiveTrainer } from '@ruvector/ruvllm';

const trainer = new ContrastiveTrainer({
  modelPath: './models/base.gguf',
  loraRank: 8,
  loraAlpha: 16,
  learningRate: 1e-4,
});

const pairs = [
  { anchor: 'Fix auth bug', positive: 'coder', negative: 'researcher' },
  // ... more pairs
];

await trainer.train(pairs, { epochs: 10 });
await trainer.save('./adapters/routing-lora');
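
Each training pair above is an (anchor, positive, negative) triple. The README does not state which loss `ContrastiveTrainer` optimizes; a common choice for such triples is a triplet margin loss over embedding distances, sketched below with cosine distance. `tripletLoss`, the 0.5 margin, and the idea that agent labels are embedded as vectors are assumptions for illustration.

```typescript
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const similarity = normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
  return 1 - similarity;
}

// Triplet margin loss: zero once the positive is closer to the anchor than
// the negative by at least `margin`, positive otherwise.
function tripletLoss(
  anchor: number[],
  positive: number[],
  negative: number[],
  margin: number = 0.5,
): number {
  return Math.max(
    0,
    cosineDistance(anchor, positive) - cosineDistance(anchor, negative) + margin,
  );
}
```

A loss of zero means the triple is already well separated and contributes no gradient, which is why hard negatives (triples that still violate the margin) are the useful ones to collect.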

Training Scripts

| Script | Description |
|---|---|
| routing-dataset.js | Generate 381 routing examples |
| claude-code-synth.js | Synthetic data generation |
| contrastive-finetune.js | LoRA fine-tuning pipeline |
| rlm-dataset.js | RLM training data (500 examples) |

API Reference

RuvLLM Class

class RuvLLM {
  constructor(config?: RuvLLMConfig);

  query(prompt: string, params?: GenerateParams): Promise<Response>;
  stream(prompt: string, params?: GenerateParams): AsyncIterable<string>;
  route(task: string): Promise<RoutingResult>;
  routeComplex(task: string): Promise<AgentTeam[]>;

  loadModel(path: string): Promise<void>;
  addMemory(text: string, metadata?: object): number;
  searchMemory(query: string, topK?: number): MemoryResult[];

  sonaStats(): SonaStats | null;
  adapt(input: Float32Array, quality: number): void;
}

RlmController Class

class RlmController {
  constructor(config?: RlmConfig, engine?: RuvLLM);

  query(input: string): Promise<RlmAnswer>;
  queryStream(input: string): AsyncGenerator<StreamToken>;

  addMemory(text: string, metadata?: object): Promise<string>;
  searchMemory(query: string, topK?: number): Promise<MemorySpan[]>;

  clearCache(): void;
  getCacheStats(): { size: number; entries: number };

  updateConfig(config: Partial<RlmConfig>): void;
  getConfig(): Required<RlmConfig>;
}

All Exports

import {
  // Core
  RuvLLM, RuvLLMConfig,

  // RLM
  RlmController, RlmConfig, RlmAnswer, MemorySpan, StreamToken,

  // Training
  RlmTrainer, ContrastiveTrainer, createRlmTrainer,
  DEFAULT_RLM_CONFIG, FAST_RLM_CONFIG, THOROUGH_RLM_CONFIG,

  // SONA Learning
  SonaCoordinator, TrajectoryBuilder,

  // LoRA
  LoraAdapter, LoraManager,

  // Benchmarks
  ModelComparisonBenchmark, RoutingBenchmark, EmbeddingBenchmark,
} from '@ruvector/ruvllm';

CLI

# Route a task
ruvllm route "add unit tests for auth module"
# → Agent: tester | Confidence: 0.96 | Tier: 2

# Query with streaming
ruvllm query --stream "Explain machine learning"

# Download models
ruvllm download ruv/ruvltra

# Run benchmarks
ruvllm bench ./models/model.gguf

# Evaluate (SWE-Bench)
ruvllm eval --model ./models/model.gguf --subset lite

Platform Support

| Platform | Architecture | Status |
|---|---|---|
| macOS | arm64 (M1-M4) | Full support |
| macOS | x64 | Supported |
| Linux | x64 | Supported |
| Linux | arm64 | Supported |
| Windows | x64 | Supported |

| Resource | URL |
|---|---|
| npm | npmjs.com/package/@ruvector/ruvllm |
| HuggingFace | huggingface.co/ruv/ruvltra |
| Crate (Rust) | crates.io/crates/ruvllm |
| Documentation | docs.rs/ruvllm |
| GitHub | github.com/ruvnet/ruvector |
| Claude Flow | github.com/ruvnet/claude-flow |

License

MIT OR Apache-2.0

Built for Claude Code. Optimized for agents. Designed for speed.

Get Started | View on GitHub

Keywords

ruvllm

Package last updated on 21 Jan 2026
