Big News: Socket raises $60M Series C at a $1B valuation to secure software supply chains for AI-driven development.Announcement
Sign In

@ruvector/ruvllm

Package Overview
Dependencies
Maintainers
1
Versions
13
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

@ruvector/ruvllm - npm Package Compare versions

Comparing version
2.4.0
to
2.4.1
+1
-1
package.json
{
"name": "@ruvector/ruvllm",
"version": "2.4.0",
"version": "2.4.1",
"description": "Self-learning LLM orchestration with SONA adaptive learning, HNSW memory, RLM recursive retrieval, FastGRNN routing, and SIMD inference",

@@ -5,0 +5,0 @@ "main": "dist/cjs/index.js",

+261
-297

@@ -1,5 +0,41 @@

# @ruvector/ruvllm v2.4
<div align="center">
Self-learning LLM orchestration with SONA adaptive learning, HNSW memory, RLM recursive retrieval, and SIMD inference for Node.js.
# @ruvector/ruvllm
### The First Purpose-Built LLM Runtime for Claude Code Agent Orchestration
**100% Routing Accuracy | Sub-Millisecond Inference | Self-Learning**
[![npm](https://img.shields.io/npm/v/@ruvector/ruvllm)](https://www.npmjs.com/package/@ruvector/ruvllm)
[![Downloads](https://img.shields.io/npm/dm/@ruvector/ruvllm)](https://www.npmjs.com/package/@ruvector/ruvllm)
[![License](https://img.shields.io/badge/license-MIT%2FApache--2.0-blue)](LICENSE)
[![Tests](https://img.shields.io/badge/tests-145%20passing-brightgreen)](./test)
[Quick Start](#quick-start) | [RLM](#rlm-recursive-language-model) | [Training](#training) | [Models](#models) | [API](#api-reference)
</div>
---
## What is @ruvector/ruvllm?
**@ruvector/ruvllm** is a TypeScript/JavaScript SDK for intelligent LLM orchestration, specifically designed for **Claude Code** and multi-agent systems. It provides:
- **RLM (Recursive Language Model)** - Break complex queries into sub-queries, synthesize coherent answers
- **100% Routing Accuracy** - Hybrid keyword + embedding strategy for perfect agent selection
- **SONA Self-Learning** - Model improves with every successful interaction
- **SIMD Acceleration** - AVX2/NEON optimized inference
### Why @ruvector/ruvllm?
| Challenge | Traditional Approach | @ruvector/ruvllm Solution |
|-----------|---------------------|---------------------------|
| Agent selection | Manual or keyword-based | Semantic + keyword hybrid = **100%** |
| Complex queries | Single-shot RAG | Recursive decomposition + synthesis |
| Response latency | 2-5 seconds | **<1ms** cache, 50-200ms full |
| Learning | Static models | **Self-improving** (SONA) |
| Cost per route | $0.01+ (API call) | **$0** (local inference) |
---
## Installation

@@ -14,40 +50,96 @@

```typescript
import { RuvLLM, RuvLLMConfig } from '@ruvector/ruvllm';
import { RuvLLM, RlmController } from '@ruvector/ruvllm';
// Initialize with default configuration
const llm = new RuvLLM();
// Or with custom configuration
// Simple LLM inference
const llm = new RuvLLM({
modelPath: './models/ruvltra-small-q4km.gguf',
modelPath: '~/.ruvllm/models/ruvltra-claude-code-0.5b-q4_k_m.gguf',
sonaEnabled: true,
flashAttention: true,
maxTokens: 256,
});
// Generate text
const response = await llm.query('Explain quantum computing');
console.log(response.text);
// Stream generation
for await (const token of llm.stream('Write a haiku about Rust')) {
process.stdout.write(token);
}
// Recursive Language Model for complex queries
const rlm = new RlmController({ maxDepth: 5 });
const answer = await rlm.query('What are the causes AND solutions for slow API responses?');
// Automatically decomposes into sub-queries, retrieves context, synthesizes answer
```
## What's New in v2.4
---
| Feature | Description |
|---------|-------------|
| **RLM (Recursive Language Model)** | Query decomposition with recursive retrieval and synthesis |
| **100% Routing Accuracy** | Hybrid keyword-first strategy achieves 100% on Claude Code tasks |
| **145 Tests Passing** | Comprehensive test coverage across all modules |
| **Contrastive Fine-tuning** | LoRA-based training with 793 contrastive pairs |
| **Training Scripts** | Generate routing datasets and fine-tune models |
| **HuggingFace Models** | Pre-trained RuvLTRA models available |
## Core Features
### 1. Claude Code Native Routing
Built **by** Claude Code, **for** Claude Code. Routes tasks to 60+ agent types:
```typescript
import { RuvLLM } from '@ruvector/ruvllm';
const llm = new RuvLLM({ model: 'ruv/ruvltra' });
// Intelligent routing
const route = await llm.route('implement OAuth2 authentication');
console.log(route.agent); // 'security-architect'
console.log(route.confidence); // 0.98
console.log(route.tier); // 2 (Haiku-level complexity)
// Multi-agent teams for complex tasks
const team = await llm.routeComplex('build full-stack app with auth');
// Returns: [system-architect, backend-dev, coder, security-architect, tester]
```
### 2. 3-Tier Intelligent Routing
```
┌─────────────────────────────────────────────────────────┐
│ User Request │
└─────────────────────┬───────────────────────────────────┘
[RuvLTRA Routing]
┌─────────────┼─────────────┐
↓ ↓ ↓
┌───────────┐ ┌───────────┐ ┌───────────┐
│ Tier 1 │ │ Tier 2 │ │ Tier 3 │
│ Booster │ │ Haiku │ │ Opus │
│ <1ms │ │ ~500ms │ │ 2-5s │
│ $0 │ │ $0.0002 │ │ $0.015 │
└───────────┘ └───────────┘ └───────────┘
```
### 3. Self-Learning (SONA)
Every successful interaction improves the model:
```typescript
// First routing: Full inference
llm.route('implement OAuth2') → security-architect (97%)
// Later: Pattern hit in <25μs (learned from success)
llm.route('add OAuth2 flow') → security-architect (99%, cached pattern)
```
---
## RLM (Recursive Language Model)
RLM provides recursive retrieval-augmented generation that breaks down complex queries into sub-queries and synthesizes answers from retrieved context.
RLM provides **recursive query decomposition** - unlike traditional RAG that retrieves once, RLM breaks complex questions into sub-queries and synthesizes coherent answers.
### How It Works
```
Query: "What are the causes AND solutions for slow API responses?"
[Decomposition]
/ \
"Causes of slow API?" "Solutions for slow API?"
↓ ↓
[Sub-answers] [Sub-answers]
\ /
[Synthesis]
Coherent combined answer with sources
```
### Basic Usage

@@ -70,6 +162,6 @@

const answer = await rlm.query('What are causes and solutions for type errors in React?');
console.log(answer.text);
console.log('Sources:', answer.sources);
console.log('Quality Score:', answer.qualityScore);
console.log('Confidence:', answer.confidence);
console.log(answer.text); // Comprehensive synthesized answer
console.log(answer.sources); // Source attributions
console.log(answer.qualityScore); // 0.0-1.0
console.log(answer.confidence); // Routing confidence
```

@@ -80,6 +172,2 @@

```typescript
import { RlmController } from '@ruvector/ruvllm';
const rlm = new RlmController();
for await (const event of rlm.queryStream('Explain machine learning')) {

@@ -94,7 +182,5 @@ if (event.type === 'token') {

### With Reflection
### With Self-Reflection
```typescript
import { RlmController } from '@ruvector/ruvllm';
const rlm = new RlmController({

@@ -106,3 +192,3 @@ enableReflection: true,

// Answers will be iteratively refined until quality >= 0.8
// Answers are iteratively refined until quality >= 0.8
const answer = await rlm.query('Complex multi-part technical question...');

@@ -127,54 +213,64 @@ ```

## Exports
---
```typescript
import {
// Core
RuvLLM,
RuvLLMConfig,
## Unique Capabilities
// RLM - Recursive Language Model
RlmController,
RlmConfig,
RlmAnswer,
MemorySpan,
StreamToken,
### 1. Memory-Augmented Routing
// RLM Training
RlmTrainer,
RlmTrainingConfig,
RlmTrainingExample,
createRlmTrainer,
DEFAULT_RLM_CONFIG,
FAST_RLM_CONFIG,
THOROUGH_RLM_CONFIG,
ROUTING_FOCUSED_CONFIG,
Every successful routing is stored in HNSW-indexed memory for instant recall:
// SONA Learning
SonaCoordinator,
TrajectoryBuilder,
```typescript
// First time: Full inference (~50ms)
route("implement OAuth2") → security-architect (97% confidence)
// Federated Learning
EphemeralAgent,
FederatedCoordinator,
// Later: Memory hit (<25μs)
route("add OAuth2 flow") → security-architect (99% confidence, cached)
```
// LoRA Adapters
LoraAdapter,
LoraManager,
### 2. Confidence-Aware Escalation
// Sessions
SessionManager,
```typescript
// Low confidence automatically escalates
Confidence > 0.9 → Use recommended agent
Confidence 0.7-0.9 → Use with human confirmation
Confidence < 0.7 → Escalate to higher tier
```
// Contrastive Training
ContrastiveTrainer,
### 3. Batch SIMD Operations
// Benchmarks
ModelComparisonBenchmark,
RoutingBenchmark,
EmbeddingBenchmark,
} from '@ruvector/ruvllm';
```typescript
import { simd } from '@ruvector/ruvllm/simd';
// 4x faster vector operations with AVX2/NEON
const similarity = simd.batchCosineSimilarity(query, targets);
const attended = simd.flashAttention(q, k, v, scale);
```
### 4. Zero-Copy Caching
Arc-based string interning for 100-1000x faster cache hits on large responses.
---
## Performance
### Benchmarks (M4 Pro)
| Operation | Latency | Throughput |
|-----------|---------|------------|
| Query decomposition | 340 ns | 2.9M/s |
| Cache lookup | 23.5 ns | 42.5M/s |
| Embedding (384d) | 293 ns | 3.4M/s |
| Memory search (10k) | 0.4 ms | 2.5K/s |
| End-to-end routing | <1 ms | 1K+/s |
| Full RLM query | 50-200 ms | 5-20/s |
### Routing Accuracy
| Strategy | RuvLTRA | Qwen Base | OpenAI |
|----------|---------|-----------|--------|
| Embedding Only | 45% | 40% | 52% |
| Keyword Only | 78% | 78% | N/A |
| **Hybrid** | **100%** | 95% | N/A |
### Test Results

@@ -193,55 +289,47 @@

### Routing Accuracy (Claude Code Tasks)
---
| Strategy | RuvLTRA | Qwen Base |
|----------|---------|-----------|
| Embedding Only | 45% | 40% |
| Keyword-First (Hybrid) | **100%** | 95% |
## Models
### Inference Performance (M4 Pro)
### HuggingFace Repository
| Operation | Performance |
|-----------|-------------|
| Inference | 88-135 tok/s |
| Flash Attention | 320us (seq=2048) |
| HNSW Search | 17-62us |
| SONA Adapt | <1ms |
| RLM Query | 50-200ms |
**URL**: [https://huggingface.co/ruv/ruvltra](https://huggingface.co/ruv/ruvltra)
### SIMD Optimizations
### Available Models
- AVX2/AVX-512 on x86_64
- NEON on ARM64
- 4-8x speedup on vector operations
| Model | Size | Purpose | Accuracy |
|-------|------|---------|----------|
| **ruvltra-claude-code-0.5b-q4_k_m** | 398 MB | Agent routing | **100%** (hybrid) |
| ruvltra-small-0.5b-q4_k_m | ~400 MB | Embeddings | - |
| ruvltra-medium-1.1b-q4_k_m | ~1 GB | Full inference | - |
## CLI Usage
### Download Models
```bash
# Query a model
ruvllm query "What is machine learning?"
```typescript
// Programmatic
import { downloadModel } from '@ruvector/ruvllm';
await downloadModel('ruv/ruvltra', { quantization: 'q4_k_m' });
# Stream output
ruvllm query --stream "Write a poem"
// CLI
ruvllm download ruv/ruvltra
```
# Download a model
ruvllm download ruvector/ruvltra-small-q4km
### Auto-Download
# Benchmark
ruvllm bench ./models/model.gguf
Models are automatically downloaded on first use:
# Run evaluation (SWE-Bench)
ruvllm eval --model ./models/model.gguf --subset lite --max-tasks 50
```typescript
const llm = new RuvLLM({ model: 'ruv/ruvltra' });
// Downloads to ~/.ruvllm/models/ if not present
```
---
## Training
### Routing Dataset Generation
### Generate Routing Dataset
Generate training data for agent routing:
```bash
# Generate routing dataset
node scripts/training/routing-dataset.js
# Output: 381 examples, 793 contrastive pairs
# Output: 381 examples, 793 contrastive pairs, 156 hard negatives
```

@@ -251,6 +339,4 @@

Fine-tune models with LoRA adapters:
```typescript
import { ContrastiveTrainer, ContrastivePair } from '@ruvector/ruvllm';
import { ContrastiveTrainer } from '@ruvector/ruvllm';

@@ -264,13 +350,8 @@ const trainer = new ContrastiveTrainer({

// Training pairs from routing dataset
const pairs: ContrastivePair[] = [
{
anchor: 'Fix the authentication bug in login.ts',
positive: 'coder',
negative: 'researcher',
},
const pairs = [
{ anchor: 'Fix auth bug', positive: 'coder', negative: 'researcher' },
// ... more pairs
];
await trainer.train(pairs, { epochs: 10, batchSize: 32 });
await trainer.train(pairs, { epochs: 10 });
await trainer.save('./adapters/routing-lora');

@@ -281,4 +362,2 @@ ```

Located in `scripts/training/`:
| Script | Description |

@@ -289,47 +368,6 @@ |--------|-------------|

| `contrastive-finetune.js` | LoRA fine-tuning pipeline |
| `rlm-dataset.js` | RLM training data (500 examples) |
## Model Links
---
### HuggingFace Repository
**URL**: [https://huggingface.co/ruv/ruvltra](https://huggingface.co/ruv/ruvltra)
### Available Models
| Model | File | Size | Purpose |
|-------|------|------|---------|
| RuvLTRA Claude Code 0.5B | `ruvltra-claude-code-0.5b-q4_k_m.gguf` | ~400MB | Agent routing (100% with hybrid) |
| RuvLTRA Small 0.5B | `ruvltra-0.5b-q4_k_m.gguf` | ~400MB | General embeddings |
| RuvLTRA Medium 3B | `ruvltra-3b-q4_k_m.gguf` | ~2GB | Full LLM inference |
### Download Models
```bash
# Using CLI
ruvllm download ruv/ruvltra
# Using HuggingFace CLI
huggingface-cli download ruv/ruvltra ruvltra-claude-code-0.5b-q4_k_m.gguf
# Programmatic download
import { downloadModel } from '@ruvector/ruvllm';
await downloadModel('ruv/ruvltra', { quantization: 'q4_k_m' });
```
### Environment Variables
```bash
# HuggingFace authentication (any of these)
HF_TOKEN=hf_xxx
HUGGING_FACE_HUB_TOKEN=hf_xxx
HUGGINGFACE_API_KEY=hf_xxx
```
### Local Model Storage
```bash
~/.ruvllm/models/ # Downloaded GGUF models
~/.ruvllm/training/ # Training data and configs
```
## API Reference

@@ -343,19 +381,12 @@

// Generate text
query(prompt: string, params?: GenerateParams): Promise<Response>;
// Stream generation
stream(prompt: string, params?: GenerateParams): AsyncIterable<string>;
route(task: string): Promise<RoutingResult>;
routeComplex(task: string): Promise<AgentTeam[]>;
// Load a model
loadModel(path: string): Promise<void>;
// Memory operations
addMemory(text: string, metadata?: Record<string, unknown>): number;
addMemory(text: string, metadata?: object): number;
searchMemory(query: string, topK?: number): MemoryResult[];
// Get SONA learning stats
sonaStats(): SonaStats | null;
// Adapt on feedback
adapt(input: Float32Array, quality: number): void;

@@ -371,17 +402,11 @@ }

// Query with recursive retrieval
query(input: string): Promise<RlmAnswer>;
// Stream query
queryStream(input: string): AsyncGenerator<StreamToken>;
// Memory management
addMemory(text: string, metadata?: Record<string, unknown>): Promise<string>;
addMemory(text: string, metadata?: object): Promise<string>;
searchMemory(query: string, topK?: number): Promise<MemorySpan[]>;
// Cache management
clearCache(): void;
getCacheStats(): { size: number; entries: number };
// Configuration
updateConfig(config: Partial<RlmConfig>): void;

@@ -392,124 +417,51 @@ getConfig(): Required<RlmConfig>;

### Configuration
### All Exports
```typescript
interface RuvLLMConfig {
modelPath?: string; // Path to GGUF model
sonaEnabled?: boolean; // Enable SONA learning (default: true)
flashAttention?: boolean; // Use Flash Attention 2 (default: true)
maxTokens?: number; // Max generation tokens (default: 256)
temperature?: number; // Sampling temperature (default: 0.7)
topP?: number; // Top-p sampling (default: 0.9)
}
```
import {
// Core
RuvLLM, RuvLLMConfig,
### Generate Parameters
// RLM
RlmController, RlmConfig, RlmAnswer, MemorySpan, StreamToken,
```typescript
interface GenerateParams {
maxTokens?: number;
temperature?: number;
topP?: number;
topK?: number;
repetitionPenalty?: number;
stopSequences?: string[];
}
```
// Training
RlmTrainer, ContrastiveTrainer, createRlmTrainer,
DEFAULT_RLM_CONFIG, FAST_RLM_CONFIG, THOROUGH_RLM_CONFIG,
## SIMD Module
// SONA Learning
SonaCoordinator, TrajectoryBuilder,
For direct access to optimized SIMD kernels:
// LoRA
LoraAdapter, LoraManager,
```typescript
import { simd } from '@ruvector/ruvllm/simd';
// Dot product
const result = simd.dotProduct(vecA, vecB);
// Matrix multiplication
const output = simd.matmul(matrix, vector);
// Flash Attention
const attended = simd.flashAttention(query, key, value, scale);
// RMS Normalization
simd.rmsNorm(hidden, weights, epsilon);
// Benchmarks
ModelComparisonBenchmark, RoutingBenchmark, EmbeddingBenchmark,
} from '@ruvector/ruvllm';
```
## Evaluation Harness
---
Run model evaluations with SWE-Bench integration:
## CLI
```typescript
import { RuvLLM, EvaluationHarness, AblationMode } from '@ruvector/ruvllm';
```bash
# Route a task
ruvllm route "add unit tests for auth module"
# → Agent: tester | Confidence: 0.96 | Tier: 2
const harness = new EvaluationHarness({
modelPath: './models/model.gguf',
enableHnsw: true,
enableSona: true,
});
# Query with streaming
ruvllm query --stream "Explain machine learning"
// Run single evaluation
const result = await harness.evaluate(
'Fix the null pointer exception',
'def process(data): return data.split()',
AblationMode.Full
);
# Download models
ruvllm download ruv/ruvltra
console.log(`Success: ${result.success}, Quality: ${result.qualityScore}`);
# Run benchmarks
ruvllm bench ./models/model.gguf
// Run ablation study (Baseline, RetrievalOnly, AdaptersOnly, R+A, Full)
const report = await harness.runAblationStudy(tasks);
for (const [mode, metrics] of Object.entries(report.modeMetrics)) {
console.log(`${mode}: ${metrics.successRate * 100}% success`);
}
# Evaluate (SWE-Bench)
ruvllm eval --model ./models/model.gguf --subset lite
```
## mistral-rs Backend (Production Serving)
---
For production deployments with 10-100+ concurrent users, use the mistral-rs backend:
```typescript
import { RuvLLM, MistralBackend, PagedAttentionConfig } from '@ruvector/ruvllm';
// Configure for production serving
const backend = new MistralBackend({
// PagedAttention: 5-10x more concurrent users
pagedAttention: {
blockSize: 16,
maxBlocks: 4096,
gpuMemoryFraction: 0.9,
prefixCaching: true,
},
// X-LoRA: Per-token adapter routing
xlora: {
adapters: ['./adapters/coder', './adapters/researcher'],
topK: 2,
},
// ISQ: Runtime quantization
isq: {
bits: 4,
method: 'awq',
},
});
const llm = new RuvLLM({ backend });
await llm.loadModel('mistralai/Mistral-7B-Instruct-v0.2');
// Serve multiple concurrent requests
const response = await llm.query('Write production code');
```
> **Note**: mistral-rs features require the Rust backend with `mistral-rs` feature enabled. Native bindings will use mistral-rs when available.
## Supported Models
- **RuvLTRA-Small** (494M) - Q4K, Q5K, Q8
- **RuvLTRA-Medium** (3B) - Q4K, Q5K, Q8
- **Qwen 2.5** (0.5B-72B)
- **Llama 3.x** (8B-70B)
- **Mistral** (7B-22B)
- **Phi-3** (3.8B-14B)
- **Gemma-2** (2B-27B)
## Platform Support

@@ -525,17 +477,29 @@

## Related Packages
---
- [@ruvector/core](https://www.npmjs.com/package/@ruvector/core) - Vector operations
- [@ruvector/sona](https://www.npmjs.com/package/@ruvector/sona) - SONA learning engine
- [@ruvector/ruvector](https://www.npmjs.com/package/@ruvector/ruvector) - Full Ruvector SDK
## Links
- [GitHub Repository](https://github.com/ruvnet/ruvector)
- [HuggingFace Models](https://huggingface.co/ruv/ruvltra)
- [API Documentation](https://docs.rs/ruvllm)
- [Crate (Rust)](https://crates.io/crates/ruvllm)
| Resource | URL |
|----------|-----|
| **npm** | [npmjs.com/package/@ruvector/ruvllm](https://www.npmjs.com/package/@ruvector/ruvllm) |
| **HuggingFace** | [huggingface.co/ruv/ruvltra](https://huggingface.co/ruv/ruvltra) |
| **Crate (Rust)** | [crates.io/crates/ruvllm](https://crates.io/crates/ruvllm) |
| **Documentation** | [docs.rs/ruvllm](https://docs.rs/ruvllm) |
| **GitHub** | [github.com/ruvnet/ruvector](https://github.com/ruvnet/ruvector) |
| **Claude Flow** | [github.com/ruvnet/claude-flow](https://github.com/ruvnet/claude-flow) |
---
## License
MIT OR Apache-2.0
---
<div align="center">
**Built for Claude Code. Optimized for agents. Designed for speed.**
[Get Started](#quick-start) | [View on GitHub](https://github.com/ruvnet/ruvector)
</div>