@ruvector/ruvllm - npm Package Compare versions

Comparing version

2.4.0

2.4.1

+1

-1

package.json

		{
		"name": "@ruvector/ruvllm",
		"version": "2.4.0",
		"version": "2.4.1",
		"description": "Self-learning LLM orchestration with SONA adaptive learning, HNSW memory, RLM recursive retrieval, FastGRNN routing, and SIMD inference",
		@@ -5,0 +5,0 @@ "main": "dist/cjs/index.js",

+261

-297

README.md

		@@ -1,5 +0,41 @@
		# @ruvector/ruvllm v2.4
		<div align="center">

		Self-learning LLM orchestration with SONA adaptive learning, HNSW memory, RLM recursive retrieval, and SIMD inference for Node.js.
		# @ruvector/ruvllm

		### The First Purpose-Built LLM Runtime for Claude Code Agent Orchestration

		100% Routing Accuracy | Sub-Millisecond Inference | Self-Learning

		[![npm](https://img.shields.io/npm/v/@ruvector/ruvllm)](https://www.npmjs.com/package/@ruvector/ruvllm)
		[![Downloads](https://img.shields.io/npm/dm/@ruvector/ruvllm)](https://www.npmjs.com/package/@ruvector/ruvllm)
		[![License](https://img.shields.io/badge/license-MIT%2FApache--2.0-blue)](LICENSE)
		[![Tests](https://img.shields.io/badge/tests-145%20passing-brightgreen)](./test)

		[Quick Start](#quick-start) | [RLM](#rlm-recursive-language-model) | [Training](#training) | [Models](#models) | [API](#api-reference)

		</div>

		---

		## What is @ruvector/ruvllm?

		@ruvector/ruvllm is a TypeScript/JavaScript SDK for intelligent LLM orchestration, specifically designed for Claude Code and multi-agent systems. It provides:

		- RLM (Recursive Language Model) - Break complex queries into sub-queries, synthesize coherent answers
		- 100% Routing Accuracy - Hybrid keyword + embedding strategy, reported at 100% on Claude Code routing tasks
		- SONA Self-Learning - Model improves with every successful interaction
		- SIMD Acceleration - AVX2/NEON optimized inference

		### Why @ruvector/ruvllm?

		| Challenge | Traditional Approach | @ruvector/ruvllm Solution |
		|-----------|---------------------|---------------------------|
		| Agent selection | Manual or keyword-based | Semantic + keyword hybrid = 100% |
		| Complex queries | Single-shot RAG | Recursive decomposition + synthesis |
		| Response latency | 2-5 seconds | <1ms cache, 50-200ms full |
		| Learning | Static models | Self-improving (SONA) |
		| Cost per route | $0.01+ (API call) | $0 (local inference) |

		---

		## Installation
		@@ -14,40 +50,96 @@
		```typescript
		import { RuvLLM, RuvLLMConfig } from '@ruvector/ruvllm';
		import { RuvLLM, RlmController } from '@ruvector/ruvllm';

		// Initialize with default configuration
		const llm = new RuvLLM();

		// Or with custom configuration
		// Simple LLM inference
		const llm = new RuvLLM({
		modelPath: './models/ruvltra-small-q4km.gguf',
		modelPath: '~/.ruvllm/models/ruvltra-claude-code-0.5b-q4_k_m.gguf',
		sonaEnabled: true,
		flashAttention: true,
		maxTokens: 256,
		});

		// Generate text
		const response = await llm.query('Explain quantum computing');
		console.log(response.text);

		// Stream generation
		for await (const token of llm.stream('Write a haiku about Rust')) {
		process.stdout.write(token);
		}
		// Recursive Language Model for complex queries
		const rlm = new RlmController({ maxDepth: 5 });
		const answer = await rlm.query('What are the causes AND solutions for slow API responses?');
		// Automatically decomposes into sub-queries, retrieves context, synthesizes answer
		```

		## What's New in v2.4
		---

		| Feature | Description |
		|---------|-------------|
		| RLM (Recursive Language Model) | Query decomposition with recursive retrieval and synthesis |
		| 100% Routing Accuracy | Hybrid keyword-first strategy achieves 100% on Claude Code tasks |
		| 145 Tests Passing | Comprehensive test coverage across all modules |
		| Contrastive Fine-tuning | LoRA-based training with 793 contrastive pairs |
		| Training Scripts | Generate routing datasets and fine-tune models |
		| HuggingFace Models | Pre-trained RuvLTRA models available |
		## Core Features

		### 1. Claude Code Native Routing

		Built by Claude Code, for Claude Code. Routes tasks to 60+ agent types:

		```typescript
		import { RuvLLM } from '@ruvector/ruvllm';

		const llm = new RuvLLM({ model: 'ruv/ruvltra' });

		// Intelligent routing
		const route = await llm.route('implement OAuth2 authentication');
		console.log(route.agent); // 'security-architect'
		console.log(route.confidence); // 0.98
		console.log(route.tier); // 2 (Haiku-level complexity)

		// Multi-agent teams for complex tasks
		const team = await llm.routeComplex('build full-stack app with auth');
		// Returns: [system-architect, backend-dev, coder, security-architect, tester]
		```
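The README describes the routing strategy as a keyword-first hybrid but does not show how the two signals combine. The following is a minimal sketch under stated assumptions, not the package's implementation: `Agent`, `Route`, `cosine`, and `hybridRoute` are hypothetical names, the keyword confidence is an arbitrary ratio, and the two-dimensional embeddings stand in for real ones.

```typescript
// Hypothetical sketch: keyword-first routing with an embedding fallback.
// None of these names come from @ruvector/ruvllm.
type Agent = { name: string; keywords: string[]; embedding: number[] };
type Route = { agent: string; confidence: number; strategy: 'keyword' | 'embedding' };

// Plain cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

function hybridRoute(task: string, taskEmbedding: number[], agents: Agent[]): Route {
  if (agents.length === 0) throw new Error('no agents registered');
  const text = task.toLowerCase();

  // Pass 1: the agent with the most keyword hits wins outright.
  let best: Agent | undefined;
  let bestHits = 0;
  for (const agent of agents) {
    const hits = agent.keywords.filter((k) => text.indexOf(k) !== -1).length;
    if (hits > bestHits) {
      best = agent;
      bestHits = hits;
    }
  }
  if (best) {
    // Assumed scoring: fraction of the agent's keywords found in the task.
    return { agent: best.name, confidence: bestHits / best.keywords.length, strategy: 'keyword' };
  }

  // Pass 2: no keyword matched, so fall back to embedding similarity.
  let top = agents[0];
  let topScore = cosine(taskEmbedding, top.embedding);
  for (const agent of agents.slice(1)) {
    const score = cosine(taskEmbedding, agent.embedding);
    if (score > topScore) {
      top = agent;
      topScore = score;
    }
  }
  return { agent: top.name, confidence: topScore, strategy: 'embedding' };
}

// Toy agents with made-up keywords and 2-d embeddings.
const agents: Agent[] = [
  { name: 'security-architect', keywords: ['oauth', 'auth', 'security'], embedding: [1, 0] },
  { name: 'tester', keywords: ['test', 'coverage'], embedding: [0, 1] },
];
console.log(hybridRoute('implement OAuth2 authentication', [1, 0], agents));
```

Checking keywords first keeps the common case cheap and deterministic; the embedding pass only runs for tasks that no keyword covers.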

		### 2. 3-Tier Intelligent Routing

		```
		┌─────────────────────────────────────────────────────────┐
		│ User Request │
		└─────────────────────┬───────────────────────────────────┘
		↓
		[RuvLTRA Routing]
		↓
		┌─────────────┼─────────────┐
		↓ ↓ ↓
		┌───────────┐ ┌───────────┐ ┌───────────┐
		│ Tier 1 │ │ Tier 2 │ │ Tier 3 │
		│ Booster │ │ Haiku │ │ Opus │
		│ <1ms │ │ ~500ms │ │ 2-5s │
		│ $0 │ │ $0.0002 │ │ $0.015 │
		└───────────┘ └───────────┘ └───────────┘
		```
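To make the cost trade-off in the diagram concrete, the sketch below encodes the three tiers as a lookup table and estimates the spend for a mix of requests. The latency and cost figures are copied from the diagram; treating the cost as per request, and the names `Tier`, `TIERS`, and `estimateCost`, are assumptions made for illustration.

```typescript
// Illustrative only: tier figures copied from the diagram, shape of the table assumed.
interface Tier {
  id: 1 | 2 | 3;
  name: string;
  latency: string;
  costUsd: number; // assumed to be the cost of a single request
}

const TIERS: Tier[] = [
  { id: 1, name: 'Booster', latency: '<1ms', costUsd: 0 },
  { id: 2, name: 'Haiku', latency: '~500ms', costUsd: 0.0002 },
  { id: 3, name: 'Opus', latency: '2-5s', costUsd: 0.015 },
];

// Sum the cost of a workload given how many requests land in each tier.
function estimateCost(requestsPerTier: Record<number, number>): number {
  let total = 0;
  for (const tier of TIERS) {
    total += (requestsPerTier[tier.id] || 0) * tier.costUsd;
  }
  return total;
}

// 1,000 requests: 900 handled locally, 90 by tier 2, 10 by tier 3.
console.log(estimateCost({ 1: 900, 2: 90, 3: 10 }).toFixed(3));
```

In this example almost all of the spend comes from the ten tier 3 requests, which is the case the router is meant to minimize.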

		### 3. Self-Learning (SONA)

		Every successful interaction improves the model:

		```typescript
		// First routing: full inference
		const first = await llm.route('implement OAuth2');
		// first.agent → 'security-architect' (confidence ~0.97)

		// Later: pattern hit in <25μs (learned from success)
		const later = await llm.route('add OAuth2 flow');
		// later.agent → 'security-architect' (confidence ~0.99, cached pattern)
		```

		---

		## RLM (Recursive Language Model)

		RLM provides recursive retrieval-augmented generation that breaks down complex queries into sub-queries and synthesizes answers from retrieved context.
		RLM provides recursive query decomposition - unlike traditional RAG that retrieves once, RLM breaks complex questions into sub-queries and synthesizes coherent answers.

		### How It Works

		```
		Query: "What are the causes AND solutions for slow API responses?"
		↓
		[Decomposition]
		/ \
		"Causes of slow API?" "Solutions for slow API?"
		↓ ↓
		[Sub-answers] [Sub-answers]
		\ /
		[Synthesis]
		↓
		Coherent combined answer with sources
		```
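The decomposition tree above can be sketched as a small recursive function. This is a toy version under stated assumptions: `decompose` splits only on a literal ` AND `, `answerLeaf` is a caller-supplied stub standing in for retrieval plus generation, and synthesis is plain concatenation. The real `RlmController` is not shown in this diff, so none of these names reflect its internals.

```typescript
// Toy recursive decomposition; all names here are hypothetical.
type Answer = { text: string; sources: string[] };
type LeafAnswerer = (question: string) => Answer;

// Naive decomposition: split on a literal " AND ". Returns [] when nothing to split.
function decompose(query: string): string[] {
  const parts = query
    .split(' AND ')
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  return parts.length > 1 ? parts : [];
}

function rlmQuery(query: string, answerLeaf: LeafAnswerer, maxDepth: number, depth = 0): Answer {
  // Stop recursing at maxDepth or when the query cannot be split further.
  const subQueries = depth < maxDepth ? decompose(query) : [];
  if (subQueries.length === 0) return answerLeaf(query);

  const subAnswers = subQueries.map((q) => rlmQuery(q, answerLeaf, maxDepth, depth + 1));

  // Synthesis: join sub-answers in order and merge their sources without duplicates.
  const sources: string[] = [];
  for (const answer of subAnswers) {
    for (const source of answer.sources) {
      if (sources.indexOf(source) === -1) sources.push(source);
    }
  }
  return { text: subAnswers.map((a) => a.text).join(' '), sources };
}

// Stub leaf answerer so the sketch runs without a model.
const stubLeaf: LeafAnswerer = (q) => ({ text: 'A(' + q + ')', sources: ['kb'] });
console.log(rlmQuery('causes AND solutions', stubLeaf, 5));
```

The `maxDepth` argument plays the same role as the `maxDepth: 5` option passed to `RlmController` in the quick start: it bounds how far a query can be broken down.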

		### Basic Usage
		@@ -70,6 +162,6 @@
		const answer = await rlm.query('What are causes and solutions for type errors in React?');
		console.log(answer.text);
		console.log('Sources:', answer.sources);
		console.log('Quality Score:', answer.qualityScore);
		console.log('Confidence:', answer.confidence);
		console.log(answer.text); // Comprehensive synthesized answer
		console.log(answer.sources); // Source attributions
		console.log(answer.qualityScore); // 0.0-1.0
		console.log(answer.confidence); // Routing confidence
		```
		@@ -80,6 +172,2 @@
		```typescript
		import { RlmController } from '@ruvector/ruvllm';

		const rlm = new RlmController();

		for await (const event of rlm.queryStream('Explain machine learning')) {
		@@ -94,7 +182,5 @@ if (event.type === 'token') {

		### With Reflection
		### With Self-Reflection

		```typescript
		import { RlmController } from '@ruvector/ruvllm';

		const rlm = new RlmController({
		@@ -106,3 +192,3 @@ enableReflection: true,

		// Answers will be iteratively refined until quality >= 0.8
		// Answers are iteratively refined until quality >= 0.8
		const answer = await rlm.query('Complex multi-part technical question...');
		@@ -127,54 +213,64 @@ ```

		## Exports
		---

		```typescript
		import {
		// Core
		RuvLLM,
		RuvLLMConfig,
		## Unique Capabilities

		// RLM - Recursive Language Model
		RlmController,
		RlmConfig,
		RlmAnswer,
		MemorySpan,
		StreamToken,
		### 1. Memory-Augmented Routing

		// RLM Training
		RlmTrainer,
		RlmTrainingConfig,
		RlmTrainingExample,
		createRlmTrainer,
		DEFAULT_RLM_CONFIG,
		FAST_RLM_CONFIG,
		THOROUGH_RLM_CONFIG,
		ROUTING_FOCUSED_CONFIG,
		Every successful routing is stored in HNSW-indexed memory for instant recall:

		// SONA Learning
		SonaCoordinator,
		TrajectoryBuilder,
		```typescript
		// First time: Full inference (~50ms)
		// route("implement OAuth2") → security-architect (97% confidence)

		// Federated Learning
		EphemeralAgent,
		FederatedCoordinator,
		// Later: Memory hit (<25μs)
		// route("add OAuth2 flow") → security-architect (99% confidence, cached)
		```
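The recall step described above can be illustrated without an HNSW index. The sketch below is a deliberately simplified stand-in: it uses a linear scan and Jaccard token overlap instead of HNSW over embeddings, and `createRoutingMemory`, `tokenize`, `jaccard`, and the `0.2` threshold are all assumptions chosen for the example.

```typescript
// Simplified stand-in for memory-augmented routing: linear scan + token overlap.
// A real implementation would search embeddings through an HNSW index.
type Entry = { tokens: string[]; agent: string };

// Lowercase, split on non-word characters, and drop duplicate tokens.
function tokenize(text: string): string[] {
  const raw = text.toLowerCase().split(/\W+/).filter((t) => t.length > 0);
  return raw.filter((t, i) => raw.indexOf(t) === i);
}

// Jaccard similarity of two duplicate-free token lists.
function jaccard(a: string[], b: string[]): number {
  const shared = a.filter((t) => b.indexOf(t) !== -1).length;
  const union = a.length + b.length - shared;
  return union === 0 ? 0 : shared / union;
}

function createRoutingMemory(threshold = 0.2) {
  const entries: Entry[] = [];
  return {
    // Store a routing decision that turned out to be successful.
    remember(task: string, agent: string): void {
      entries.push({ tokens: tokenize(task), agent });
    },
    // Return the agent of the most similar stored task, if similar enough.
    recall(task: string): string | undefined {
      const tokens = tokenize(task);
      let bestAgent: string | undefined;
      let bestScore = threshold;
      for (const entry of entries) {
        const score = jaccard(tokens, entry.tokens);
        if (score >= bestScore) {
          bestAgent = entry.agent;
          bestScore = score;
        }
      }
      return bestAgent;
    },
  };
}

const memory = createRoutingMemory();
memory.remember('implement OAuth2', 'security-architect');
console.log(memory.recall('add OAuth2 flow'));
```

A miss (`undefined`) is the signal to fall back to full inference and, on success, call `remember` so the next similar task is served from memory.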

		// LoRA Adapters
		LoraAdapter,
		LoraManager,
		### 2. Confidence-Aware Escalation

		// Sessions
		SessionManager,
		```typescript
		// Low confidence automatically escalates:
		//   confidence > 0.9   → use recommended agent
		//   confidence 0.7-0.9 → use with human confirmation
		//   confidence < 0.7   → escalate to higher tier
		```
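The three confidence bands listed above map directly onto a small decision function. This is a sketch, not the package's API: the `Action` labels and the function name `decideAction` are hypothetical, and placing a confidence of exactly 0.9 in the middle band is an assumption, since the README does not say which side the boundary falls on.

```typescript
// Hypothetical helper mirroring the README's confidence bands.
type Action = 'use-agent' | 'confirm-with-human' | 'escalate-tier';

function decideAction(confidence: number): Action {
  if (confidence > 0.9) return 'use-agent'; // high confidence: act on the recommendation
  if (confidence >= 0.7) return 'confirm-with-human'; // 0.7-0.9: ask before acting (0.9 assumed inclusive)
  return 'escalate-tier'; // below 0.7: hand off to a higher tier
}

for (const c of [0.98, 0.85, 0.4]) {
  console.log(c, decideAction(c));
}
```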

		// Contrastive Training
		ContrastiveTrainer,
		### 3. Batch SIMD Operations

		// Benchmarks
		ModelComparisonBenchmark,
		RoutingBenchmark,
		EmbeddingBenchmark,
		} from '@ruvector/ruvllm';
		```typescript
		import { simd } from '@ruvector/ruvllm/simd';

		// 4x faster vector operations with AVX2/NEON
		const similarity = simd.batchCosineSimilarity(query, targets);
		const attended = simd.flashAttention(q, k, v, scale);
		```
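For readers who want to see what a batched cosine similarity computes, here is a scalar reference version in plain TypeScript. It is an assumption-laden sketch: the name `batchCosineReference` and its signature are made up for illustration, it uses no SIMD, and the argument and return types of the package's `simd.batchCosineSimilarity` are not documented in this diff.

```typescript
// Scalar reference for batched cosine similarity (no SIMD; hypothetical signature).
function batchCosineReference(query: Float32Array, targets: Float32Array[]): number[] {
  // The query norm is shared by every comparison, so compute it once.
  let queryNormSq = 0;
  for (let i = 0; i < query.length; i++) queryNormSq += query[i] * query[i];
  const queryNorm = Math.sqrt(queryNormSq);

  return targets.map((target) => {
    if (target.length !== query.length) throw new Error('dimension mismatch');
    let dot = 0;
    let targetNormSq = 0;
    for (let i = 0; i < target.length; i++) {
      dot += query[i] * target[i];
      targetNormSq += target[i] * target[i];
    }
    const denom = queryNorm * Math.sqrt(targetNormSq);
    return denom === 0 ? 0 : dot / denom; // zero vectors get similarity 0 by convention here
  });
}

const q = new Float32Array([1, 0]);
const candidates = [new Float32Array([1, 0]), new Float32Array([0, 1]), new Float32Array([-1, 0])];
console.log(batchCosineReference(q, candidates));
```

A reference like this is mainly useful for checking an optimized kernel's output on small inputs.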

		### 4. Zero-Copy Caching

		Arc-based string interning for 100-1000x faster cache hits on large responses.

		---

		## Performance

		### Benchmarks (M4 Pro)

		| Operation | Latency | Throughput |
		|-----------|---------|------------|
		| Query decomposition | 340 ns | 2.9M/s |
		| Cache lookup | 23.5 ns | 42.5M/s |
		| Embedding (384d) | 293 ns | 3.4M/s |
		| Memory search (10k) | 0.4 ms | 2.5K/s |
		| End-to-end routing | <1 ms | 1K+/s |
		| Full RLM query | 50-200 ms | 5-20/s |
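The throughput column in the table above appears to be the reciprocal of the latency column, that is, one operation at a time with no batching or parallelism. That reading is an assumption rather than something the README states; the sketch below just checks the arithmetic with a hypothetical helper `throughputPerSecond`.

```typescript
// Assumes serial execution: throughput (ops/s) = 1 / latency (s).
function throughputPerSecond(latencySeconds: number): number {
  if (latencySeconds <= 0) throw new Error('latency must be positive');
  return 1 / latencySeconds;
}

// Latencies from the benchmark table, converted to seconds.
const rows: Array<[string, number]> = [
  ['Query decomposition', 340e-9],
  ['Cache lookup', 23.5e-9],
  ['Embedding (384d)', 293e-9],
  ['Memory search (10k)', 0.4e-3],
];

for (const [operation, latency] of rows) {
  console.log(operation, Math.round(throughputPerSecond(latency)), 'ops/s');
}
```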

		### Routing Accuracy

		| Strategy | RuvLTRA | Qwen Base | OpenAI |
		|----------|---------|-----------|--------|
		| Embedding Only | 45% | 40% | 52% |
		| Keyword Only | 78% | 78% | N/A |
		| Hybrid | 100% | 95% | N/A |

		### Test Results
		@@ -193,55 +289,47 @@

		### Routing Accuracy (Claude Code Tasks)
		---

		| Strategy | RuvLTRA | Qwen Base |
		|----------|---------|-----------|
		| Embedding Only | 45% | 40% |
		| Keyword-First (Hybrid) | 100% | 95% |
		## Models

		### Inference Performance (M4 Pro)
		### HuggingFace Repository

		| Operation | Performance |
		|-----------|-------------|
		| Inference | 88-135 tok/s |
		| Flash Attention | 320us (seq=2048) |
		| HNSW Search | 17-62us |
		| SONA Adapt | <1ms |
		| RLM Query | 50-200ms |
		URL: [https://huggingface.co/ruv/ruvltra](https://huggingface.co/ruv/ruvltra)

		### SIMD Optimizations
		### Available Models

		- AVX2/AVX-512 on x86_64
		- NEON on ARM64
		- 4-8x speedup on vector operations
		| Model | Size | Purpose | Accuracy |
		|-------|------|---------|----------|
		| ruvltra-claude-code-0.5b-q4_k_m | 398 MB | Agent routing | 100% (hybrid) |
		| ruvltra-small-0.5b-q4_k_m | ~400 MB | Embeddings | - |
		| ruvltra-medium-1.1b-q4_k_m | ~1 GB | Full inference | - |

		## CLI Usage
		### Download Models

		```bash
		# Query a model
		ruvllm query "What is machine learning?"
		```typescript
		// Programmatic
		import { downloadModel } from '@ruvector/ruvllm';
		await downloadModel('ruv/ruvltra', { quantization: 'q4_k_m' });

		# Stream output
		ruvllm query --stream "Write a poem"
		// CLI
		ruvllm download ruv/ruvltra
		```

		# Download a model
		ruvllm download ruvector/ruvltra-small-q4km
		### Auto-Download

		# Benchmark
		ruvllm bench ./models/model.gguf
		Models are automatically downloaded on first use:

		# Run evaluation (SWE-Bench)
		ruvllm eval --model ./models/model.gguf --subset lite --max-tasks 50
		```typescript
		const llm = new RuvLLM({ model: 'ruv/ruvltra' });
		// Downloads to ~/.ruvllm/models/ if not present
		```

		---

		## Training

		### Routing Dataset Generation
		### Generate Routing Dataset

		Generate training data for agent routing:

		```bash
		# Generate routing dataset
		node scripts/training/routing-dataset.js

		# Output: 381 examples, 793 contrastive pairs
		# Output: 381 examples, 793 contrastive pairs, 156 hard negatives
		```
		@@ -251,6 +339,4 @@

		Fine-tune models with LoRA adapters:

		```typescript
		import { ContrastiveTrainer, ContrastivePair } from '@ruvector/ruvllm';
		import { ContrastiveTrainer } from '@ruvector/ruvllm';

		@@ -264,13 +350,8 @@ const trainer = new ContrastiveTrainer({

		// Training pairs from routing dataset
		const pairs: ContrastivePair[] = [
		{
		anchor: 'Fix the authentication bug in login.ts',
		positive: 'coder',
		negative: 'researcher',
		},
		const pairs = [
		{ anchor: 'Fix auth bug', positive: 'coder', negative: 'researcher' },
		// ... more pairs
		];

		await trainer.train(pairs, { epochs: 10, batchSize: 32 });
		await trainer.train(pairs, { epochs: 10 });
		await trainer.save('./adapters/routing-lora');
		@@ -281,4 +362,2 @@ ```

		Located in `scripts/training/`:

		| Script | Description |
		@@ -289,47 +368,6 @@ |--------|-------------|
		| `contrastive-finetune.js` | LoRA fine-tuning pipeline |
		| `rlm-dataset.js` | RLM training data (500 examples) |

		## Model Links
		---

		### HuggingFace Repository

		URL: [https://huggingface.co/ruv/ruvltra](https://huggingface.co/ruv/ruvltra)

		### Available Models

		| Model | File | Size | Purpose |
		|-------|------|------|---------|
		| RuvLTRA Claude Code 0.5B | `ruvltra-claude-code-0.5b-q4_k_m.gguf` | ~400MB | Agent routing (100% with hybrid) |
		| RuvLTRA Small 0.5B | `ruvltra-0.5b-q4_k_m.gguf` | ~400MB | General embeddings |
		| RuvLTRA Medium 3B | `ruvltra-3b-q4_k_m.gguf` | ~2GB | Full LLM inference |

		### Download Models

		```bash
		# Using CLI
		ruvllm download ruv/ruvltra

		# Using HuggingFace CLI
		huggingface-cli download ruv/ruvltra ruvltra-claude-code-0.5b-q4_k_m.gguf

		# Programmatic download
		import { downloadModel } from '@ruvector/ruvllm';
		await downloadModel('ruv/ruvltra', { quantization: 'q4_k_m' });
		```

		### Environment Variables

		```bash
		# HuggingFace authentication (any of these)
		HF_TOKEN=hf_xxx
		HUGGING_FACE_HUB_TOKEN=hf_xxx
		HUGGINGFACE_API_KEY=hf_xxx
		```

		### Local Model Storage

		```bash
		~/.ruvllm/models/ # Downloaded GGUF models
		~/.ruvllm/training/ # Training data and configs
		```

		## API Reference
		@@ -343,19 +381,12 @@

		// Generate text
		query(prompt: string, params?: GenerateParams): Promise<Response>;

		// Stream generation
		stream(prompt: string, params?: GenerateParams): AsyncIterable<string>;
		route(task: string): Promise<RoutingResult>;
		routeComplex(task: string): Promise<AgentTeam[]>;

		// Load a model
		loadModel(path: string): Promise<void>;

		// Memory operations
		addMemory(text: string, metadata?: Record<string, unknown>): number;
		addMemory(text: string, metadata?: object): number;
		searchMemory(query: string, topK?: number): MemoryResult[];

		// Get SONA learning stats
		sonaStats(): SonaStats | null;

		// Adapt on feedback
		adapt(input: Float32Array, quality: number): void;
		@@ -371,17 +402,11 @@ }

		// Query with recursive retrieval
		query(input: string): Promise<RlmAnswer>;

		// Stream query
		queryStream(input: string): AsyncGenerator<StreamToken>;

		// Memory management
		addMemory(text: string, metadata?: Record<string, unknown>): Promise<string>;
		addMemory(text: string, metadata?: object): Promise<string>;
		searchMemory(query: string, topK?: number): Promise<MemorySpan[]>;

		// Cache management
		clearCache(): void;
		getCacheStats(): { size: number; entries: number };

		// Configuration
		updateConfig(config: Partial<RlmConfig>): void;
		@@ -392,124 +417,51 @@ getConfig(): Required<RlmConfig>;

		### Configuration
		### All Exports

		```typescript
		interface RuvLLMConfig {
		modelPath?: string; // Path to GGUF model
		sonaEnabled?: boolean; // Enable SONA learning (default: true)
		flashAttention?: boolean; // Use Flash Attention 2 (default: true)
		maxTokens?: number; // Max generation tokens (default: 256)
		temperature?: number; // Sampling temperature (default: 0.7)
		topP?: number; // Top-p sampling (default: 0.9)
		}
		```
		import {
		// Core
		RuvLLM, RuvLLMConfig,

		### Generate Parameters
		// RLM
		RlmController, RlmConfig, RlmAnswer, MemorySpan, StreamToken,

		```typescript
		interface GenerateParams {
		maxTokens?: number;
		temperature?: number;
		topP?: number;
		topK?: number;
		repetitionPenalty?: number;
		stopSequences?: string[];
		}
		```
		// Training
		RlmTrainer, ContrastiveTrainer, createRlmTrainer,
		DEFAULT_RLM_CONFIG, FAST_RLM_CONFIG, THOROUGH_RLM_CONFIG,

		## SIMD Module
		// SONA Learning
		SonaCoordinator, TrajectoryBuilder,

		For direct access to optimized SIMD kernels:
		// LoRA
		LoraAdapter, LoraManager,

		```typescript
		import { simd } from '@ruvector/ruvllm/simd';

		// Dot product
		const result = simd.dotProduct(vecA, vecB);

		// Matrix multiplication
		const output = simd.matmul(matrix, vector);

		// Flash Attention
		const attended = simd.flashAttention(query, key, value, scale);

		// RMS Normalization
		simd.rmsNorm(hidden, weights, epsilon);
		// Benchmarks
		ModelComparisonBenchmark, RoutingBenchmark, EmbeddingBenchmark,
		} from '@ruvector/ruvllm';
		```

		## Evaluation Harness
		---

		Run model evaluations with SWE-Bench integration:
		## CLI

		```typescript
		import { RuvLLM, EvaluationHarness, AblationMode } from '@ruvector/ruvllm';
		```bash
		# Route a task
		ruvllm route "add unit tests for auth module"
		# → Agent: tester | Confidence: 0.96 | Tier: 2

		const harness = new EvaluationHarness({
		modelPath: './models/model.gguf',
		enableHnsw: true,
		enableSona: true,
		});
		# Query with streaming
		ruvllm query --stream "Explain machine learning"

		// Run single evaluation
		const result = await harness.evaluate(
		'Fix the null pointer exception',
		'def process(data): return data.split()',
		AblationMode.Full
		);
		# Download models
		ruvllm download ruv/ruvltra

		console.log(`Success: ${result.success}, Quality: ${result.qualityScore}`);
		# Run benchmarks
		ruvllm bench ./models/model.gguf

		// Run ablation study (Baseline, RetrievalOnly, AdaptersOnly, R+A, Full)
		const report = await harness.runAblationStudy(tasks);
		for (const [mode, metrics] of Object.entries(report.modeMetrics)) {
		console.log(`${mode}: ${metrics.successRate * 100}% success`);
		}
		# Evaluate (SWE-Bench)
		ruvllm eval --model ./models/model.gguf --subset lite
		```

		## mistral-rs Backend (Production Serving)
		---

		For production deployments with 10-100+ concurrent users, use the mistral-rs backend:

		```typescript
		import { RuvLLM, MistralBackend, PagedAttentionConfig } from '@ruvector/ruvllm';

		// Configure for production serving
		const backend = new MistralBackend({
		// PagedAttention: 5-10x more concurrent users
		pagedAttention: {
		blockSize: 16,
		maxBlocks: 4096,
		gpuMemoryFraction: 0.9,
		prefixCaching: true,
		},
		// X-LoRA: Per-token adapter routing
		xlora: {
		adapters: ['./adapters/coder', './adapters/researcher'],
		topK: 2,
		},
		// ISQ: Runtime quantization
		isq: {
		bits: 4,
		method: 'awq',
		},
		});

		const llm = new RuvLLM({ backend });
		await llm.loadModel('mistralai/Mistral-7B-Instruct-v0.2');

		// Serve multiple concurrent requests
		const response = await llm.query('Write production code');
		```

		> Note: mistral-rs features require the Rust backend with `mistral-rs` feature enabled. Native bindings will use mistral-rs when available.

		## Supported Models

		- RuvLTRA-Small (494M) - Q4K, Q5K, Q8
		- RuvLTRA-Medium (3B) - Q4K, Q5K, Q8
		- Qwen 2.5 (0.5B-72B)
		- Llama 3.x (8B-70B)
		- Mistral (7B-22B)
		- Phi-3 (3.8B-14B)
		- Gemma-2 (2B-27B)

		## Platform Support
		@@ -525,17 +477,29 @@

		## Related Packages
		---

		- [@ruvector/core](https://www.npmjs.com/package/@ruvector/core) - Vector operations
		- [@ruvector/sona](https://www.npmjs.com/package/@ruvector/sona) - SONA learning engine
		- [@ruvector/ruvector](https://www.npmjs.com/package/@ruvector/ruvector) - Full Ruvector SDK

		## Links

		- [GitHub Repository](https://github.com/ruvnet/ruvector)
		- [HuggingFace Models](https://huggingface.co/ruv/ruvltra)
		- [API Documentation](https://docs.rs/ruvllm)
		- [Crate (Rust)](https://crates.io/crates/ruvllm)
		| Resource | URL |
		|----------|-----|
		| npm | [npmjs.com/package/@ruvector/ruvllm](https://www.npmjs.com/package/@ruvector/ruvllm) |
		| HuggingFace | [huggingface.co/ruv/ruvltra](https://huggingface.co/ruv/ruvltra) |
		| Crate (Rust) | [crates.io/crates/ruvllm](https://crates.io/crates/ruvllm) |
		| Documentation | [docs.rs/ruvllm](https://docs.rs/ruvllm) |
		| GitHub | [github.com/ruvnet/ruvector](https://github.com/ruvnet/ruvector) |
		| Claude Flow | [github.com/ruvnet/claude-flow](https://github.com/ruvnet/claude-flow) |

		---

		## License

		MIT OR Apache-2.0

		---

		<div align="center">

		Built for Claude Code. Optimized for agents. Designed for speed.

		[Get Started](#quick-start) | [View on GitHub](https://github.com/ruvnet/ruvector)

		</div>
