
Research
/Security News
GlassWASM: WebAssembly Malware Found in Trojanized Open VSX Extensions
The trojanized extensions use TinyGo-compiled WebAssembly and Solana transaction memos to resolve command-and-control infrastructure.
@ruvector/ruvllm
Self-learning LLM orchestration with SONA adaptive learning, HNSW memory, RLM recursive retrieval, FastGRNN routing, and SIMD inference
100% Routing Accuracy | Sub-Millisecond Inference | Self-Learning
Quick Start | RLM | Training | Models | API
@ruvector/ruvllm is a TypeScript/JavaScript SDK for intelligent LLM orchestration, specifically designed for Claude Code and multi-agent systems. It provides:
| Challenge | Traditional Approach | @ruvector/ruvllm Solution |
|---|---|---|
| Agent selection | Manual or keyword-based | Semantic + keyword hybrid (100% routing accuracy in the benchmark) |
| Complex queries | Single-shot RAG | Recursive decomposition + synthesis |
| Response latency | 2-5 seconds | <1ms cache, 50-200ms full |
| Learning | Static models | Self-improving (SONA) |
| Cost per route | $0.01+ (API call) | $0 (local inference) |
npm install @ruvector/ruvllm
import { RuvLLM, RlmController } from '@ruvector/ruvllm';
// Simple LLM inference
const llm = new RuvLLM({
  modelPath: '~/.ruvllm/models/ruvltra-claude-code-0.5b-q4_k_m.gguf',
  sonaEnabled: true,
});
const response = await llm.query('Explain quantum computing');
console.log(response.text);
// Recursive Language Model for complex queries
const rlm = new RlmController({ maxDepth: 5 });
const answer = await rlm.query('What are the causes AND solutions for slow API responses?');
// Automatically decomposes into sub-queries, retrieves context, synthesizes answer
Built by Claude Code, for Claude Code. Routes tasks to 60+ agent types:
import { RuvLLM } from '@ruvector/ruvllm';
const llm = new RuvLLM({ model: 'ruv/ruvltra' });
// Intelligent routing
const route = await llm.route('implement OAuth2 authentication');
console.log(route.agent); // 'security-architect'
console.log(route.confidence); // 0.98
console.log(route.tier); // 2 (Haiku-level complexity)
// Multi-agent teams for complex tasks
const team = await llm.routeComplex('build full-stack app with auth');
// Returns: [system-architect, backend-dev, coder, security-architect, tester]
┌─────────────────────────────────────────────────────────┐
│                      User Request                       │
└─────────────────────┬───────────────────────────────────┘
                      ↓
              [RuvLTRA Routing]
                      ↓
        ┌─────────────┼─────────────┐
        ↓             ↓             ↓
  ┌───────────┐ ┌───────────┐ ┌───────────┐
  │  Tier 1   │ │  Tier 2   │ │  Tier 3   │
  │  Booster  │ │   Haiku   │ │   Opus    │
  │   <1ms    │ │  ~500ms   │ │   2-5s    │
  │    $0     │ │  $0.0002  │ │  $0.015   │
  └───────────┘ └───────────┘ └───────────┘
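As a rough illustration of why tiered routing affects cost, the per-request prices in the diagram can be blended into an expected cost for a given traffic mix. This is a sketch, not part of the SDK: `expectedCostPerRequest` is a hypothetical helper, and the 70/25/5 traffic split is an assumed example rather than a measured figure.

```typescript
// Per-request cost in USD for each tier, copied from the diagram above.
const TIER_COST_USD: Record<1 | 2 | 3, number> = { 1: 0, 2: 0.0002, 3: 0.015 };

// Hypothetical helper: expected cost per request for a traffic mix.
// `mix` maps tier -> fraction of requests; the fractions must sum to 1.
function expectedCostPerRequest(mix: Record<1 | 2 | 3, number>): number {
  const total = mix[1] + mix[2] + mix[3];
  if (Math.abs(total - 1) > 1e-9) {
    throw new Error(`traffic shares must sum to 1, got ${total}`);
  }
  return (
    mix[1] * TIER_COST_USD[1] +
    mix[2] * TIER_COST_USD[2] +
    mix[3] * TIER_COST_USD[3]
  );
}

// Assumed mix: 70% Tier 1, 25% Tier 2, 5% Tier 3.
const blended = expectedCostPerRequest({ 1: 0.7, 2: 0.25, 3: 0.05 });
console.log(blended.toFixed(5)); // 0.00080
```

Under that assumed mix the blended cost is about $0.0008 per request, compared with $0.015 if every request went to Tier 3.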
Every successful interaction improves the model:
// First routing: Full inference
llm.route('implement OAuth2') → security-architect (97%)
// Later: Pattern hit in <25μs (learned from success)
llm.route('add OAuth2 flow') → security-architect (99%, cached pattern)
RLM provides recursive query decomposition. Unlike traditional RAG, which retrieves once, RLM breaks a complex question into sub-queries, answers each one, and synthesizes the results into a single coherent answer.
Query: "What are the causes AND solutions for slow API responses?"
                            ↓
                     [Decomposition]
                      /           \
     "Causes of slow API?"     "Solutions for slow API?"
               ↓                           ↓
         [Sub-answers]               [Sub-answers]
                      \           /
                       [Synthesis]
                            ↓
           Coherent combined answer with sources
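The flow in the diagram can be sketched in a few lines. This is a toy illustration under stated assumptions, not the SDK's implementation: splitting on an upper-case " AND ", matching memory entries by shared words, and joining sub-answers with "; " are all stand-ins, and every name below is hypothetical.

```typescript
type Answer = { text: string; sources: string[] };

// Lower-cased words longer than three letters (a crude stop-word filter).
function contentWords(text: string): string[] {
  const matches: string[] = text.toLowerCase().match(/[a-z]+/g) ?? [];
  return matches.filter((w) => w.length > 3);
}

// Toy decomposition rule (assumption): split on an upper-case " AND ".
function decompose(query: string): string[] {
  const parts = query.split(" AND ").map((p) => p.trim()).filter((p) => p.length > 0);
  return parts.length > 1 ? parts : [];
}

// Toy retrieval (assumption): keep memory entries that share a content word.
function retrieve(query: string, memory: string[]): string[] {
  const wanted = new Set(contentWords(query));
  return memory.filter((m) => contentWords(m).some((w) => wanted.has(w)));
}

function answerRecursively(query: string, memory: string[], maxDepth: number): Answer {
  const subQueries = maxDepth > 0 ? decompose(query) : [];
  if (subQueries.length === 0) {
    // Leaf: answer directly from the retrieved context.
    const sources = retrieve(query, memory);
    return { text: `${query} -> ${sources.length} source(s)`, sources };
  }
  // Recurse on each sub-query, then synthesize and de-duplicate sources.
  const texts: string[] = [];
  const sources: string[] = [];
  for (const sub of subQueries) {
    const subAnswer = answerRecursively(sub, memory, maxDepth - 1);
    texts.push(subAnswer.text);
    for (const s of subAnswer.sources) {
      if (!sources.includes(s)) sources.push(s);
    }
  }
  return { text: texts.join("; "), sources };
}

const memory = ["Slow queries cause latency", "Caching is one solution"];
const combined = answerRecursively("causes of latency AND solutions with caching", memory, 5);
console.log(combined.text);
console.log(combined.sources.length); // 2
```

The depth parameter bounds the recursion in the same spirit as `maxDepth`; with a depth of 0 the query is answered in a single retrieval pass.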
import { RlmController } from '@ruvector/ruvllm';
const rlm = new RlmController({
  maxDepth: 5,
  retrievalTopK: 10,
  enableCache: true,
});
// Add knowledge to memory
await rlm.addMemory('TypeScript adds static typing to JavaScript.');
await rlm.addMemory('React is a library for building user interfaces.');
// Query with recursive retrieval
const answer = await rlm.query('What are causes and solutions for type errors in React?');
console.log(answer.text); // Comprehensive synthesized answer
console.log(answer.sources); // Source attributions
console.log(answer.qualityScore); // 0.0-1.0
console.log(answer.confidence); // Routing confidence
for await (const event of rlm.queryStream('Explain machine learning')) {
  if (event.type === 'token') {
    process.stdout.write(event.text);
  } else {
    console.log('\n\nQuality:', event.answer.qualityScore);
  }
}
const rlm = new RlmController({
  enableReflection: true,
  maxReflectionIterations: 2,
  minQualityScore: 0.8,
});
// Answers are iteratively refined until quality >= 0.8 or the 2 reflection iterations are used up
const answer = await rlm.query('Complex multi-part technical question...');
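The reflection settings suggest a loop that stops on whichever comes first: the quality threshold or the iteration cap. A minimal sketch of such a loop follows; `Draft`, `refineUntilGoodEnough`, and the toy refiner are hypothetical and only illustrate the control flow, not how the SDK scores or rewrites answers.

```typescript
type Draft = { text: string; qualityScore: number };

// Hypothetical loop: refine until the score reaches the threshold
// or the iteration cap is hit, whichever happens first.
function refineUntilGoodEnough(
  initial: Draft,
  refine: (draft: Draft) => Draft,
  minQualityScore: number,
  maxReflectionIterations: number,
): { draft: Draft; iterations: number } {
  let draft = initial;
  let iterations = 0;
  while (draft.qualityScore < minQualityScore && iterations < maxReflectionIterations) {
    draft = refine(draft);
    iterations += 1;
  }
  return { draft, iterations };
}

// Toy refiner (assumption): each pass raises the score by 0.15, capped at 1.
const bump = (d: Draft): Draft => ({
  text: d.text + " (revised)",
  qualityScore: Math.min(1, d.qualityScore + 0.15),
});

const refined = refineUntilGoodEnough({ text: "first draft", qualityScore: 0.6 }, bump, 0.8, 2);
console.log(refined.iterations, refined.draft.qualityScore >= 0.8);
```

Note that with a cap the final answer can still fall below the threshold, so callers may want to check the returned score.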
interface RlmConfig {
  maxDepth?: number;                 // Max recursion depth (default: 3)
  maxSubQueries?: number;            // Max sub-queries per level (default: 5)
  tokenBudget?: number;              // Token budget (default: 4096)
  enableCache?: boolean;             // Enable caching (default: true)
  cacheTtl?: number;                 // Cache TTL in ms (default: 300000)
  retrievalTopK?: number;            // Memory spans to retrieve (default: 10)
  minQualityScore?: number;          // Min quality threshold (default: 0.7)
  enableReflection?: boolean;        // Enable self-reflection (default: false)
  maxReflectionIterations?: number;  // Max reflection loops (default: 2)
}
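To make the `cacheTtl` option concrete, here is one way a time-to-live cache can behave. This is an assumed design for illustration only; `TtlCache` is a hypothetical class, and the README does not describe how the SDK's cache is implemented. The clock is injectable so the expiry logic can be exercised without waiting.

```typescript
// Hypothetical TTL cache: entries expire `ttlMs` milliseconds after being set.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // Default TTL mirrors the documented cacheTtl default of 300000 ms (5 minutes).
  constructor(ttlMs: number = 300000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      // Expired: evict lazily on read.
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  get size(): number {
    return this.entries.size;
  }
}

const cache = new TtlCache<string>();
cache.set("Explain quantum computing", "cached answer");
console.log(cache.get("Explain quantum computing")); // cached answer
```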
Every successful routing is stored in HNSW-indexed memory for instant recall:
// First time: Full inference (~50ms)
route("implement OAuth2") → security-architect (97% confidence)
// Later: Memory hit (<25μs)
route("add OAuth2 flow") → security-architect (99% confidence, cached)
// Low confidence automatically escalates
Confidence > 0.9 → Use recommended agent
Confidence 0.7-0.9 → Use with human confirmation
Confidence < 0.7 → Escalate to higher tier
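The three confidence bands above map naturally onto a small decision function. The sketch below is hypothetical (`decide` and the `RoutingDecision` labels are not SDK names), and it assumes the boundary values 0.7 and 0.9 belong to the middle band, which the thresholds above leave unspecified.

```typescript
type RoutingDecision = "use-agent" | "confirm-with-human" | "escalate-tier";

// Hypothetical mapping from routing confidence to an action.
// Assumption: exactly 0.9 and exactly 0.7 fall in the middle band.
function decide(confidence: number): RoutingDecision {
  if (confidence > 0.9) return "use-agent";
  if (confidence >= 0.7) return "confirm-with-human";
  return "escalate-tier";
}

console.log(decide(0.98)); // use-agent
console.log(decide(0.85)); // confirm-with-human
console.log(decide(0.42)); // escalate-tier
```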
import { simd } from '@ruvector/ruvllm/simd';
// 4x faster vector operations with AVX2/NEON
const similarity = simd.batchCosineSimilarity(query, targets);
const attended = simd.flashAttention(q, k, v, scale);
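For reference, this is the computation a batched cosine similarity performs, written as a plain scalar loop. It is not the SDK's SIMD code, and the exact signature of `simd.batchCosineSimilarity` is not documented here; `cosineSimilarity` and `batchCosine` are hypothetical names used only for this sketch.

```typescript
// Scalar cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Convention chosen for this sketch: a zero vector has similarity 0.
  return denom === 0 ? 0 : dot / denom;
}

// One query against many targets, one score per target.
function batchCosine(query: Float32Array, targets: Float32Array[]): number[] {
  return targets.map((t) => cosineSimilarity(query, t));
}

const scores = batchCosine(Float32Array.from([1, 0]), [
  Float32Array.from([1, 0]),
  Float32Array.from([0, 1]),
  Float32Array.from([-1, 0]),
]);
console.log(scores); // [ 1, 0, -1 ]
```

A SIMD implementation computes the same three sums but processes several lanes per instruction, which is where a speedup over this loop would come from.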
Arc-based string interning for 100-1000x faster cache hits on large responses.
| Operation | Latency | Throughput |
|---|---|---|
| Query decomposition | 340 ns | 2.9M/s |
| Cache lookup | 23.5 ns | 42.5M/s |
| Embedding (384d) | 293 ns | 3.4M/s |
| Memory search (10k) | 0.4 ms | 2.5K/s |
| End-to-end routing | <1 ms | 1K+/s |
| Full RLM query | 50-200 ms | 5-20/s |
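The throughput column is consistent with taking the reciprocal of the latency column, that is, one operation at a time with no parallelism. The short check below reproduces that arithmetic; `throughputPerSecond` is a hypothetical helper and the latencies are copied from the table.

```typescript
// Sequential throughput implied by a per-operation latency.
function throughputPerSecond(latencySeconds: number): number {
  if (latencySeconds <= 0) throw new Error("latency must be positive");
  return 1 / latencySeconds;
}

// Latencies from the table above, converted to seconds.
const rows: Array<[string, number]> = [
  ["Query decomposition", 340e-9],
  ["Cache lookup", 23.5e-9],
  ["Embedding (384d)", 293e-9],
  ["Memory search (10k)", 0.4e-3],
];

for (const [name, latency] of rows) {
  const opsPerSecond = Math.round(throughputPerSecond(latency));
  console.log(`${name}: ${opsPerSecond.toLocaleString("en-US")} ops/s`);
}
```

For example, 340 ns per decomposition works out to roughly 2.9 million operations per second, matching the table.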
| Routing strategy (accuracy) | RuvLTRA | Qwen Base | OpenAI |
|---|---|---|---|
| Embedding Only | 45% | 40% | 52% |
| Keyword Only | 78% | 78% | N/A |
| Hybrid | 100% | 95% | N/A |
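The README does not spell out how the hybrid strategy combines its two signals. One common approach is a weighted sum of a keyword-match score and an embedding similarity, sketched below. Everything here is an assumption for illustration: the `Candidate` shape, the 0.5 weight, the keyword lists, and the precomputed `semanticScore` values are made up.

```typescript
type Candidate = { agent: string; keywords: string[]; semanticScore: number };

// Fraction of an agent's keywords that appear in the task text.
function keywordScore(task: string, keywords: string[]): number {
  if (keywords.length === 0) return 0;
  const text = task.toLowerCase();
  const hits = keywords.filter((k) => text.includes(k.toLowerCase())).length;
  return hits / keywords.length;
}

// Hypothetical hybrid router: weighted sum of keyword and semantic scores.
function hybridRoute(
  task: string,
  candidates: Candidate[],
  keywordWeight: number = 0.5,
): { agent: string; score: number } {
  if (candidates.length === 0) throw new Error("no candidates");
  let best = { agent: "", score: -Infinity };
  for (const c of candidates) {
    const score =
      keywordWeight * keywordScore(task, c.keywords) +
      (1 - keywordWeight) * c.semanticScore;
    if (score > best.score) best = { agent: c.agent, score };
  }
  return best;
}

// Made-up scores: the embedding alone slightly prefers security-architect.
const picked = hybridRoute("add unit tests for auth module", [
  { agent: "tester", keywords: ["test", "unit"], semanticScore: 0.55 },
  { agent: "security-architect", keywords: ["auth", "oauth"], semanticScore: 0.6 },
]);
console.log(picked.agent); // tester
```

In this example the keyword signal corrects a near-tie in the embedding scores, which is the kind of case where a hybrid can outperform either signal alone.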
145 tests passing
- RLM Controller: 24 tests
- Routing Accuracy: 18 tests
- Contrastive Training: 15 tests
- SIMD Operations: 22 tests
- SONA Learning: 19 tests
- Memory/HNSW: 21 tests
- Benchmarks: 26 tests
URL: https://huggingface.co/ruv/ruvltra
| Model | Size | Purpose | Accuracy |
|---|---|---|---|
| ruvltra-claude-code-0.5b-q4_k_m | 398 MB | Agent routing | 100% (hybrid) |
| ruvltra-small-0.5b-q4_k_m | ~400 MB | Embeddings | - |
| ruvltra-medium-1.1b-q4_k_m | ~1 GB | Full inference | - |
// Programmatic
import { downloadModel } from '@ruvector/ruvllm';
await downloadModel('ruv/ruvltra', { quantization: 'q4_k_m' });
// CLI
ruvllm download ruv/ruvltra
Models are automatically downloaded on first use:
const llm = new RuvLLM({ model: 'ruv/ruvltra' });
// Downloads to ~/.ruvllm/models/ if not present
node scripts/training/routing-dataset.js
# Output: 381 examples, 793 contrastive pairs, 156 hard negatives
import { ContrastiveTrainer } from '@ruvector/ruvllm';
const trainer = new ContrastiveTrainer({
  modelPath: './models/base.gguf',
  loraRank: 8,
  loraAlpha: 16,
  learningRate: 1e-4,
});
const pairs = [
  { anchor: 'Fix auth bug', positive: 'coder', negative: 'researcher' },
  // ... more pairs
];
await trainer.train(pairs, { epochs: 10 });
await trainer.save('./adapters/routing-lora');
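Each training pair has an anchor, a positive, and a negative, which is the shape used by triplet-style contrastive objectives. The README does not state which loss `ContrastiveTrainer` uses, so the sketch below shows a standard triplet margin loss purely as an illustration; the function names, the margin of 1, and the 2-D embeddings are assumptions.

```typescript
// Squared Euclidean distance between two embeddings.
function squaredDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

// Triplet margin loss: zero once the positive is closer to the anchor
// than the negative by at least `margin`.
function tripletLoss(
  anchor: number[],
  positive: number[],
  negative: number[],
  margin: number = 1,
): number {
  return Math.max(
    0,
    squaredDistance(anchor, positive) - squaredDistance(anchor, negative) + margin,
  );
}

// Made-up 2-D embeddings: the positive is near the anchor, the negative is far.
console.log(tripletLoss([0, 0], [1, 0], [3, 0])); // 0
// Swapping them puts the wrong agent closer, so the loss is positive.
console.log(tripletLoss([0, 0], [3, 0], [1, 0])); // 9
```

Minimizing a loss of this kind pulls a task's embedding toward the correct agent and pushes it away from the hard negative.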
| Script | Description |
|---|---|
| routing-dataset.js | Generate 381 routing examples |
| claude-code-synth.js | Synthetic data generation |
| contrastive-finetune.js | LoRA fine-tuning pipeline |
| rlm-dataset.js | RLM training data (500 examples) |
class RuvLLM {
  constructor(config?: RuvLLMConfig);
  query(prompt: string, params?: GenerateParams): Promise<Response>;
  stream(prompt: string, params?: GenerateParams): AsyncIterable<string>;
  route(task: string): Promise<RoutingResult>;
  routeComplex(task: string): Promise<AgentTeam[]>;
  loadModel(path: string): Promise<void>;
  addMemory(text: string, metadata?: object): number;
  searchMemory(query: string, topK?: number): MemoryResult[];
  sonaStats(): SonaStats | null;
  adapt(input: Float32Array, quality: number): void;
}
class RlmController {
  constructor(config?: RlmConfig, engine?: RuvLLM);
  query(input: string): Promise<RlmAnswer>;
  queryStream(input: string): AsyncGenerator<StreamToken>;
  addMemory(text: string, metadata?: object): Promise<string>;
  searchMemory(query: string, topK?: number): Promise<MemorySpan[]>;
  clearCache(): void;
  getCacheStats(): { size: number; entries: number };
  updateConfig(config: Partial<RlmConfig>): void;
  getConfig(): Required<RlmConfig>;
}
import {
  // Core
  RuvLLM, RuvLLMConfig,
  // RLM
  RlmController, RlmConfig, RlmAnswer, MemorySpan, StreamToken,
  // Training
  RlmTrainer, ContrastiveTrainer, createRlmTrainer,
  DEFAULT_RLM_CONFIG, FAST_RLM_CONFIG, THOROUGH_RLM_CONFIG,
  // SONA Learning
  SonaCoordinator, TrajectoryBuilder,
  // LoRA
  LoraAdapter, LoraManager,
  // Benchmarks
  ModelComparisonBenchmark, RoutingBenchmark, EmbeddingBenchmark,
} from '@ruvector/ruvllm';
# Route a task
ruvllm route "add unit tests for auth module"
# → Agent: tester | Confidence: 0.96 | Tier: 2
# Query with streaming
ruvllm query --stream "Explain machine learning"
# Download models
ruvllm download ruv/ruvltra
# Run benchmarks
ruvllm bench ./models/model.gguf
# Evaluate (SWE-Bench)
ruvllm eval --model ./models/model.gguf --subset lite
| Platform | Architecture | Status |
|---|---|---|
| macOS | arm64 (M1-M4) | Full support |
| macOS | x64 | Supported |
| Linux | x64 | Supported |
| Linux | arm64 | Supported |
| Windows | x64 | Supported |
| Resource | URL |
|---|---|
| npm | npmjs.com/package/@ruvector/ruvllm |
| HuggingFace | huggingface.co/ruv/ruvltra |
| Crate (Rust) | crates.io/crates/ruvllm |
| Documentation | docs.rs/ruvllm |
| GitHub | github.com/ruvnet/ruvector |
| Claude Flow | github.com/ruvnet/claude-flow |
MIT OR Apache-2.0
Built for Claude Code. Optimized for agents. Designed for speed.
FAQs
Self-learning LLM runtime — TurboQuant KV-cache (6-8x compression), SONA adaptive learning, FlashAttention, speculative decoding, GGUF inference
The npm package @ruvector/ruvllm receives a total of 126,085 weekly downloads, so its popularity is classified as popular.
We found that @ruvector/ruvllm demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.