
@ruvector/ruvllm
Self-learning LLM orchestration for Node.js with SONA adaptive learning, HNSW memory, RLM recursive retrieval, FastGRNN routing, and SIMD inference.
npm install @ruvector/ruvllm
import { RuvLLM, RuvLLMConfig } from '@ruvector/ruvllm';
// Initialize with default configuration
const defaultLlm = new RuvLLM();
// Or with custom configuration
const llm = new RuvLLM({
modelPath: './models/ruvltra-small-q4km.gguf',
sonaEnabled: true,
flashAttention: true,
maxTokens: 256,
});
// Generate text
const response = await llm.query('Explain quantum computing');
console.log(response.text);
// Stream generation
for await (const token of llm.stream('Write a haiku about Rust')) {
process.stdout.write(token);
}
| Feature | Description |
|---|---|
| RLM (Recursive Language Model) | Query decomposition with recursive retrieval and synthesis |
| 100% Routing Accuracy | Hybrid keyword-first strategy achieves 100% on Claude Code tasks |
| 145 Tests Passing | Comprehensive test coverage across all modules |
| Contrastive Fine-tuning | LoRA-based training with 793 contrastive pairs |
| Training Scripts | Generate routing datasets and fine-tune models |
| HuggingFace Models | Pre-trained RuvLTRA models available |
RLM provides recursive retrieval-augmented generation that breaks down complex queries into sub-queries and synthesizes answers from retrieved context.
import { RlmController } from '@ruvector/ruvllm';
const rlm = new RlmController({
maxDepth: 5,
retrievalTopK: 10,
enableCache: true,
});
// Add knowledge to memory
await rlm.addMemory('TypeScript adds static typing to JavaScript.');
await rlm.addMemory('React is a library for building user interfaces.');
// Query with recursive retrieval
const answer = await rlm.query('What are causes and solutions for type errors in React?');
console.log(answer.text);
console.log('Sources:', answer.sources);
console.log('Quality Score:', answer.qualityScore);
console.log('Confidence:', answer.confidence);
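The idea of breaking a query into sub-queries and synthesizing from retrieved context can be pictured with a small, self-contained sketch. This is not the package's implementation: `decompose`, `retrieve`, and `recursiveQuery` are hypothetical helpers, the split on " and " is a stand-in for whatever decomposition RLM performs, and keyword overlap is a stand-in for HNSW vector search.

```typescript
// Hypothetical sketch of recursive retrieval; not the @ruvector/ruvllm internals.
interface SketchAnswer {
  text: string;
  sources: string[];
}

// Assumption: split compound questions on " and ". A real system would
// likely ask the model itself to propose sub-queries.
function decompose(query: string): string[] {
  const parts = query
    .split(" and ")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
  return parts.length > 1 ? parts : [];
}

// Assumption: keyword overlap stands in for vector search over memory.
function retrieve(query: string, memory: string[], topK: number): string[] {
  const words = query
    .toLowerCase()
    .split(/\W+/)
    .filter((word) => word.length > 3);
  return memory
    .map((text) => ({
      text,
      score: words.filter((word) => text.toLowerCase().indexOf(word) !== -1).length,
    }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((entry) => entry.text);
}

// Recurse while the depth budget allows and the query still decomposes;
// otherwise answer directly from retrieved memory.
function recursiveQuery(
  query: string,
  memory: string[],
  maxDepth: number,
  topK: number,
  depth: number = 0
): SketchAnswer {
  const subQueries = depth < maxDepth ? decompose(query) : [];
  if (subQueries.length === 0) {
    const sources = retrieve(query, memory, topK);
    return { text: sources.join(" "), sources };
  }
  const partials = subQueries.map((sub) =>
    recursiveQuery(sub, memory, maxDepth, topK, depth + 1)
  );
  // "Synthesis" here is only de-duplicated concatenation of the sources.
  const sources: string[] = [];
  partials.forEach((partial) =>
    partial.sources.forEach((source) => {
      if (sources.indexOf(source) === -1) sources.push(source);
    })
  );
  return { text: sources.join(" "), sources };
}

const sketchMemory = [
  "TypeScript adds static typing to JavaScript.",
  "React is a library for building user interfaces.",
];
const sketchResult = recursiveQuery(
  "What is TypeScript and what is React?",
  sketchMemory,
  3,
  10
);
console.log(sketchResult.sources.length); // 2
```

The `maxDepth` and `topK` arguments mirror the roles of `maxDepth` and `retrievalTopK` in the configuration, under the assumption that they bound recursion and retrieval in roughly this way.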
import { RlmController } from '@ruvector/ruvllm';
const rlm = new RlmController();
for await (const event of rlm.queryStream('Explain machine learning')) {
if (event.type === 'token') {
process.stdout.write(event.text);
} else {
console.log('\n\nQuality:', event.answer.qualityScore);
}
}
import { RlmController } from '@ruvector/ruvllm';
const rlm = new RlmController({
enableReflection: true,
maxReflectionIterations: 2,
minQualityScore: 0.8,
});
// Answers will be iteratively refined until quality >= 0.8
const answer = await rlm.query('Complex multi-part technical question...');
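A reflection loop of this kind can be sketched as follows. The README does not describe how refinement works internally, so `ReflectionDraft`, `refineWithReflection`, and the `refine` callback are hypothetical; the sketch only assumes that refinement stops once the score reaches `minQualityScore` or once `maxReflectionIterations` passes have been used.

```typescript
// Hypothetical sketch of a bounded self-reflection loop.
interface ReflectionDraft {
  text: string;
  qualityScore: number;
}

function refineWithReflection(
  initial: ReflectionDraft,
  refine: (draft: ReflectionDraft) => ReflectionDraft,
  minQualityScore: number,
  maxReflectionIterations: number
): { answer: ReflectionDraft; iterations: number } {
  let answer = initial;
  let iterations = 0;
  while (
    answer.qualityScore < minQualityScore &&
    iterations < maxReflectionIterations
  ) {
    const candidate = refine(answer);
    iterations += 1;
    // Keep the candidate only if it actually scores higher.
    if (candidate.qualityScore > answer.qualityScore) {
      answer = candidate;
    }
  }
  return { answer, iterations };
}

// Usage with a toy refiner that raises the score by 0.25 per pass.
const reflected = refineWithReflection(
  { text: "draft", qualityScore: 0.5 },
  (draft) => ({ text: draft.text + " (revised)", qualityScore: draft.qualityScore + 0.25 }),
  0.8,
  2
);
console.log(reflected.iterations, reflected.answer.qualityScore); // 2 1
```

Under this assumption, an answer can still fall below the threshold if the iteration budget runs out first, so callers may want to check `qualityScore` on the result.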
interface RlmConfig {
maxDepth?: number; // Max recursion depth (default: 3)
maxSubQueries?: number; // Max sub-queries per level (default: 5)
tokenBudget?: number; // Token budget (default: 4096)
enableCache?: boolean; // Enable caching (default: true)
cacheTtl?: number; // Cache TTL in ms (default: 300000)
retrievalTopK?: number; // Memory spans to retrieve (default: 10)
minQualityScore?: number; // Min quality threshold (default: 0.7)
enableReflection?: boolean; // Enable self-reflection (default: false)
maxReflectionIterations?: number; // Max reflection loops (default: 2)
}
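Because every field is optional, a partial config has to be merged with the documented defaults at some point (the API reference lists `getConfig(): Required<RlmConfig>`). The sketch below shows one way that merge could look. `RlmConfigSketch` and `resolveRlmConfig` are hypothetical names; only the default values are taken from the comments above.

```typescript
// Local copy of the documented shape so the sketch is self-contained.
interface RlmConfigSketch {
  maxDepth?: number;
  maxSubQueries?: number;
  tokenBudget?: number;
  enableCache?: boolean;
  cacheTtl?: number;
  retrievalTopK?: number;
  minQualityScore?: number;
  enableReflection?: boolean;
  maxReflectionIterations?: number;
}

// Default values as stated in the interface comments.
const DOCUMENTED_DEFAULTS: Required<RlmConfigSketch> = {
  maxDepth: 3,
  maxSubQueries: 5,
  tokenBudget: 4096,
  enableCache: true,
  cacheTtl: 300000,
  retrievalTopK: 10,
  minQualityScore: 0.7,
  enableReflection: false,
  maxReflectionIterations: 2,
};

// `??` keeps explicit `false` and `0` overrides, and only falls back
// to the default when a field is missing or undefined.
function resolveRlmConfig(overrides: RlmConfigSketch = {}): Required<RlmConfigSketch> {
  const d = DOCUMENTED_DEFAULTS;
  return {
    maxDepth: overrides.maxDepth ?? d.maxDepth,
    maxSubQueries: overrides.maxSubQueries ?? d.maxSubQueries,
    tokenBudget: overrides.tokenBudget ?? d.tokenBudget,
    enableCache: overrides.enableCache ?? d.enableCache,
    cacheTtl: overrides.cacheTtl ?? d.cacheTtl,
    retrievalTopK: overrides.retrievalTopK ?? d.retrievalTopK,
    minQualityScore: overrides.minQualityScore ?? d.minQualityScore,
    enableReflection: overrides.enableReflection ?? d.enableReflection,
    maxReflectionIterations:
      overrides.maxReflectionIterations ?? d.maxReflectionIterations,
  };
}

const resolved = resolveRlmConfig({ maxDepth: 5, enableCache: false });
console.log(resolved.maxDepth, resolved.enableCache, resolved.cacheTtl); // 5 false 300000
```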
import {
// Core
RuvLLM,
RuvLLMConfig,
// RLM - Recursive Language Model
RlmController,
RlmConfig,
RlmAnswer,
MemorySpan,
StreamToken,
// RLM Training
RlmTrainer,
RlmTrainingConfig,
RlmTrainingExample,
createRlmTrainer,
DEFAULT_RLM_CONFIG,
FAST_RLM_CONFIG,
THOROUGH_RLM_CONFIG,
ROUTING_FOCUSED_CONFIG,
// SONA Learning
SonaCoordinator,
TrajectoryBuilder,
// Federated Learning
EphemeralAgent,
FederatedCoordinator,
// LoRA Adapters
LoraAdapter,
LoraManager,
// Sessions
SessionManager,
// Contrastive Training
ContrastiveTrainer,
// Benchmarks
ModelComparisonBenchmark,
RoutingBenchmark,
EmbeddingBenchmark,
} from '@ruvector/ruvllm';
145 tests passing
- RLM Controller: 24 tests
- Routing Accuracy: 18 tests
- Contrastive Training: 15 tests
- SIMD Operations: 22 tests
- SONA Learning: 19 tests
- Memory/HNSW: 21 tests
- Benchmarks: 26 tests
| Strategy | RuvLTRA | Qwen Base |
|---|---|---|
| Embedding Only | 45% | 40% |
| Keyword-First (Hybrid) | 100% | 95% |
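A keyword-first hybrid strategy can be sketched as: try explicit keyword rules first, and fall back to embedding similarity only when no rule matches. The sketch below is an illustration under stated assumptions, not the package's router: `routeKeywordFirst`, the rule list, and the two-dimensional vectors are all hypothetical, and embeddings are passed in rather than computed by a model.

```typescript
// Hypothetical sketch of keyword-first routing with an embedding fallback.
interface KeywordRule {
  agent: string;
  keywords: string[];
}

interface AgentEmbedding {
  agent: string;
  vector: number[];
}

interface RouteDecision {
  agent: string;
  via: "keyword" | "embedding";
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

function routeKeywordFirst(
  task: string,
  taskEmbedding: number[],
  rules: KeywordRule[],
  agents: AgentEmbedding[]
): RouteDecision {
  const lower = task.toLowerCase();
  // Step 1: the first rule with a matching keyword wins.
  for (const rule of rules) {
    for (const keyword of rule.keywords) {
      if (lower.indexOf(keyword.toLowerCase()) !== -1) {
        return { agent: rule.agent, via: "keyword" };
      }
    }
  }
  // Step 2: no keyword matched, so pick the most similar agent embedding.
  let best: AgentEmbedding | undefined;
  let bestScore = -Infinity;
  for (const candidate of agents) {
    const score = cosineSimilarity(taskEmbedding, candidate.vector);
    if (score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  if (best === undefined) {
    throw new Error("no agents to route to");
  }
  return { agent: best.agent, via: "embedding" };
}

const rules: KeywordRule[] = [
  { agent: "coder", keywords: ["fix", "bug", "implement"] },
  { agent: "researcher", keywords: ["research", "investigate"] },
];
const agents: AgentEmbedding[] = [
  { agent: "coder", vector: [1, 0] },
  { agent: "researcher", vector: [0, 1] },
];
console.log(routeKeywordFirst("Fix the authentication bug in login.ts", [0, 1], rules, agents).agent); // coder
```

The design choice is that cheap, deterministic rules handle the unambiguous cases, which leaves the embedding model to decide only the tasks that no rule covers.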
| Operation | Performance |
|---|---|
| Inference | 88-135 tok/s |
| Flash Attention | 320us (seq=2048) |
| HNSW Search | 17-62us |
| SONA Adapt | <1ms |
| RLM Query | 50-200ms |
# Query a model
ruvllm query "What is machine learning?"
# Stream output
ruvllm query --stream "Write a poem"
# Download a model
ruvllm download ruvector/ruvltra-small-q4km
# Benchmark
ruvllm bench ./models/model.gguf
# Run evaluation (SWE-Bench)
ruvllm eval --model ./models/model.gguf --subset lite --max-tasks 50
Generate training data for agent routing:
# Generate routing dataset
node scripts/training/routing-dataset.js
# Output: 381 examples, 793 contrastive pairs
Fine-tune models with LoRA adapters:
import { ContrastiveTrainer, ContrastivePair } from '@ruvector/ruvllm';
const trainer = new ContrastiveTrainer({
modelPath: './models/base.gguf',
loraRank: 8,
loraAlpha: 16,
learningRate: 1e-4,
});
// Training pairs from routing dataset
const pairs: ContrastivePair[] = [
{
anchor: 'Fix the authentication bug in login.ts',
positive: 'coder',
negative: 'researcher',
},
// ... more pairs
];
await trainer.train(pairs, { epochs: 10, batchSize: 32 });
await trainer.save('./adapters/routing-lora');
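The README does not say which loss `ContrastiveTrainer` optimizes. A common choice for anchor/positive/negative pairs is the triplet margin loss, sketched below on precomputed embeddings. `cosineDistance`, `tripletLoss`, and the default margin of 0.2 are assumptions for illustration only.

```typescript
// Hypothetical sketch: triplet margin loss over embedding vectors.
// loss = max(0, d(anchor, positive) - d(anchor, negative) + margin)
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Treat a zero vector as maximally dissimilar within [0, 1] for this sketch.
  return denom === 0 ? 1 : 1 - dot / denom;
}

function tripletLoss(
  anchor: number[],
  positive: number[],
  negative: number[],
  margin: number = 0.2
): number {
  const pull = cosineDistance(anchor, positive);
  const push = cosineDistance(anchor, negative);
  return Math.max(0, pull - push + margin);
}

// Anchor already matches the positive and is orthogonal to the negative,
// so the margin is satisfied and the loss is zero.
console.log(tripletLoss([1, 0], [1, 0], [0, 1])); // 0
```

In the routing setting, the anchor would be the embedding of a task description and the positive and negative would be embeddings associated with the correct and incorrect agent labels.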
Located in scripts/training/:
| Script | Description |
|---|---|
| routing-dataset.js | Generate 381 routing examples |
| claude-code-synth.js | Synthetic data generation |
| contrastive-finetune.js | LoRA fine-tuning pipeline |
URL: https://huggingface.co/ruv/ruvltra
| Model | File | Size | Purpose |
|---|---|---|---|
| RuvLTRA Claude Code 0.5B | ruvltra-claude-code-0.5b-q4_k_m.gguf | ~400MB | Agent routing (100% with hybrid) |
| RuvLTRA Small 0.5B | ruvltra-0.5b-q4_k_m.gguf | ~400MB | General embeddings |
| RuvLTRA Medium 3B | ruvltra-3b-q4_k_m.gguf | ~2GB | Full LLM inference |
# Using CLI
ruvllm download ruv/ruvltra
# Using HuggingFace CLI
huggingface-cli download ruv/ruvltra ruvltra-claude-code-0.5b-q4_k_m.gguf
// Programmatic download
import { downloadModel } from '@ruvector/ruvllm';
await downloadModel('ruv/ruvltra', { quantization: 'q4_k_m' });
# HuggingFace authentication (any of these)
HF_TOKEN=hf_xxx
HUGGING_FACE_HUB_TOKEN=hf_xxx
HUGGINGFACE_API_KEY=hf_xxx
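Since any of the three variables is accepted, a loader has to pick one when several are set. The sketch below checks them in the order listed above; that priority order is an assumption, and `resolveHfToken` is a hypothetical helper rather than part of the package.

```typescript
// Hypothetical sketch: pick the first non-empty HuggingFace token variable.
// The lookup order mirrors the list above and is an assumption.
const HF_TOKEN_VARS = ["HF_TOKEN", "HUGGING_FACE_HUB_TOKEN", "HUGGINGFACE_API_KEY"];

function resolveHfToken(
  env: Record<string, string | undefined>
): string | undefined {
  for (const name of HF_TOKEN_VARS) {
    const value = env[name];
    if (value !== undefined && value.trim().length > 0) {
      return value.trim();
    }
  }
  return undefined;
}

console.log(resolveHfToken({ HUGGINGFACE_API_KEY: "hf_xxx" })); // hf_xxx
```

In Node.js the function would typically be called with `process.env`; taking the environment as a parameter keeps the sketch easy to test.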
~/.ruvllm/models/ # Downloaded GGUF models
~/.ruvllm/training/ # Training data and configs
class RuvLLM {
constructor(config?: RuvLLMConfig);
// Generate text
query(prompt: string, params?: GenerateParams): Promise<Response>;
// Stream generation
stream(prompt: string, params?: GenerateParams): AsyncIterable<string>;
// Load a model
loadModel(path: string): Promise<void>;
// Memory operations
addMemory(text: string, metadata?: Record<string, unknown>): number;
searchMemory(query: string, topK?: number): MemoryResult[];
// Get SONA learning stats
sonaStats(): SonaStats | null;
// Adapt on feedback
adapt(input: Float32Array, quality: number): void;
}
class RlmController {
constructor(config?: RlmConfig, engine?: RuvLLM);
// Query with recursive retrieval
query(input: string): Promise<RlmAnswer>;
// Stream query
queryStream(input: string): AsyncGenerator<StreamToken>;
// Memory management
addMemory(text: string, metadata?: Record<string, unknown>): Promise<string>;
searchMemory(query: string, topK?: number): Promise<MemorySpan[]>;
// Cache management
clearCache(): void;
getCacheStats(): { size: number; entries: number };
// Configuration
updateConfig(config: Partial<RlmConfig>): void;
getConfig(): Required<RlmConfig>;
}
interface RuvLLMConfig {
modelPath?: string; // Path to GGUF model
sonaEnabled?: boolean; // Enable SONA learning (default: true)
flashAttention?: boolean; // Use Flash Attention 2 (default: true)
maxTokens?: number; // Max generation tokens (default: 256)
temperature?: number; // Sampling temperature (default: 0.7)
topP?: number; // Top-p sampling (default: 0.9)
}
interface GenerateParams {
maxTokens?: number;
temperature?: number;
topP?: number;
topK?: number;
repetitionPenalty?: number;
stopSequences?: string[];
}
For direct access to optimized SIMD kernels:
import { simd } from '@ruvector/ruvllm/simd';
// Dot product
const result = simd.dotProduct(vecA, vecB);
// Matrix multiplication
const output = simd.matmul(matrix, vector);
// Flash Attention
const attended = simd.flashAttention(query, key, value, scale);
// RMS Normalization
simd.rmsNorm(hidden, weights, epsilon);
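For reference, the math behind two of these kernels can be written as plain scalar loops. The functions below are hypothetical reference implementations for checking results, not the package's SIMD code. Note that `rmsNormRef` returns a new array, whereas the call above appears to normalize in place.

```typescript
// Scalar reference implementations (hypothetical helpers, no SIMD).
function dotProductRef(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// RMS normalization: out[i] = hidden[i] / sqrt(mean(hidden^2) + epsilon) * weights[i]
function rmsNormRef(
  hidden: Float32Array,
  weights: Float32Array,
  epsilon: number
): Float32Array {
  if (hidden.length === 0 || hidden.length !== weights.length) {
    throw new Error("hidden and weights must be non-empty and equal in length");
  }
  let sumSquares = 0;
  for (let i = 0; i < hidden.length; i++) {
    sumSquares += hidden[i] * hidden[i];
  }
  const inverseRms = 1 / Math.sqrt(sumSquares / hidden.length + epsilon);
  const out = new Float32Array(hidden.length);
  for (let i = 0; i < hidden.length; i++) {
    out[i] = hidden[i] * inverseRms * weights[i];
  }
  return out;
}

console.log(dotProductRef(new Float32Array([1, 2, 3]), new Float32Array([4, 5, 6]))); // 32
```

A scalar reference like this is mainly useful for comparing against the optimized kernels within a small floating-point tolerance.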
Run model evaluations with SWE-Bench integration:
import { RuvLLM, EvaluationHarness, AblationMode } from '@ruvector/ruvllm';
const harness = new EvaluationHarness({
modelPath: './models/model.gguf',
enableHnsw: true,
enableSona: true,
});
// Run single evaluation
const result = await harness.evaluate(
'Fix the null pointer exception',
'def process(data): return data.split()',
AblationMode.Full
);
console.log(`Success: ${result.success}, Quality: ${result.qualityScore}`);
// Run ablation study (Baseline, RetrievalOnly, AdaptersOnly, R+A, Full)
// `tasks` is assumed to be a list of evaluation tasks prepared elsewhere
const report = await harness.runAblationStudy(tasks);
for (const [mode, metrics] of Object.entries(report.modeMetrics)) {
console.log(`${mode}: ${(metrics.successRate * 100).toFixed(1)}% success`);
}
For production deployments with 10-100+ concurrent users, use the mistral-rs backend:
import { RuvLLM, MistralBackend, PagedAttentionConfig } from '@ruvector/ruvllm';
// Configure for production serving
const backend = new MistralBackend({
// PagedAttention: 5-10x more concurrent users
pagedAttention: {
blockSize: 16,
maxBlocks: 4096,
gpuMemoryFraction: 0.9,
prefixCaching: true,
},
// X-LoRA: Per-token adapter routing
xlora: {
adapters: ['./adapters/coder', './adapters/researcher'],
topK: 2,
},
// ISQ: Runtime quantization
isq: {
bits: 4,
method: 'awq',
},
});
const llm = new RuvLLM({ backend });
await llm.loadModel('mistralai/Mistral-7B-Instruct-v0.2');
// Serve multiple concurrent requests
const response = await llm.query('Write production code');
Note: mistral-rs features require the Rust backend with the mistral-rs feature enabled. Native bindings will use mistral-rs when available.
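To get a feel for the `pagedAttention` numbers in the example, the sketch below works through the block arithmetic. It assumes that `blockSize` counts tokens per KV-cache block and that `maxBlocks` caps the shared block pool, which is how paged KV caches are usually described; the README does not confirm these semantics, and all function names here are hypothetical.

```typescript
// Hypothetical sketch of paged KV-cache capacity arithmetic.
interface PagedKvSketch {
  blockSize: number; // assumed: tokens stored per block
  maxBlocks: number; // assumed: total blocks in the shared pool
}

// A sequence occupies whole blocks, so round up.
function blocksForSequence(tokens: number, blockSize: number): number {
  return Math.ceil(tokens / blockSize);
}

function totalTokenCapacity(config: PagedKvSketch): number {
  return config.blockSize * config.maxBlocks;
}

// How many sequences of a given length fit in the pool at once.
function maxConcurrentSequences(
  config: PagedKvSketch,
  tokensPerSequence: number
): number {
  const perSequence = blocksForSequence(tokensPerSequence, config.blockSize);
  return perSequence === 0 ? 0 : Math.floor(config.maxBlocks / perSequence);
}

const pool: PagedKvSketch = { blockSize: 16, maxBlocks: 4096 };
console.log(totalTokenCapacity(pool)); // 65536
console.log(maxConcurrentSequences(pool, 2048)); // 32
```

Under these assumptions, shorter requests consume proportionally fewer blocks, which is where the concurrency gain over fixed per-request allocations would come from.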
| Platform | Architecture | Status |
|---|---|---|
| macOS | arm64 (M1-M4) | Full support |
| macOS | x64 | Supported |
| Linux | x64 | Supported |
| Linux | arm64 | Supported |
| Windows | x64 | Supported |
MIT OR Apache-2.0