@ruvector/ruvllm
+1
-1
| { | ||
| "name": "@ruvector/ruvllm", | ||
| "version": "2.4.0", | ||
| "version": "2.4.1", | ||
| "description": "Self-learning LLM orchestration with SONA adaptive learning, HNSW memory, RLM recursive retrieval, FastGRNN routing, and SIMD inference", | ||
@@ -5,0 +5,0 @@ "main": "dist/cjs/index.js", |
+261
-297
@@ -1,5 +0,41 @@ | ||
| # @ruvector/ruvllm v2.4 | ||
| <div align="center"> | ||
| Self-learning LLM orchestration with SONA adaptive learning, HNSW memory, RLM recursive retrieval, and SIMD inference for Node.js. | ||
| # @ruvector/ruvllm | ||
| ### The First Purpose-Built LLM Runtime for Claude Code Agent Orchestration | ||
| **100% Routing Accuracy | Sub-Millisecond Inference | Self-Learning** | ||
| [](https://www.npmjs.com/package/@ruvector/ruvllm) | ||
| [](https://www.npmjs.com/package/@ruvector/ruvllm) | ||
| [](LICENSE) | ||
| [](./test) | ||
| [Quick Start](#quick-start) | [RLM](#rlm-recursive-language-model) | [Training](#training) | [Models](#models) | [API](#api-reference) | ||
| </div> | ||
| --- | ||
| ## What is @ruvector/ruvllm? | ||
| **@ruvector/ruvllm** is a TypeScript/JavaScript SDK for intelligent LLM orchestration, specifically designed for **Claude Code** and multi-agent systems. It provides: | ||
| - **RLM (Recursive Language Model)** - Break complex queries into sub-queries, synthesize coherent answers | ||
| - **100% Routing Accuracy** - Hybrid keyword + embedding strategy that scores 100% on the package's Claude Code routing tasks | ||
| - **SONA Self-Learning** - Model improves with every successful interaction | ||
| - **SIMD Acceleration** - AVX2/NEON optimized inference | ||
| ### Why @ruvector/ruvllm? | ||
| | Challenge | Traditional Approach | @ruvector/ruvllm Solution | | ||
| |-----------|---------------------|---------------------------| | ||
| | Agent selection | Manual or keyword-based | Semantic + keyword hybrid = **100%** | | ||
| | Complex queries | Single-shot RAG | Recursive decomposition + synthesis | | ||
| | Response latency | 2-5 seconds | **<1ms** cache, 50-200ms full | | ||
| | Learning | Static models | **Self-improving** (SONA) | | ||
| | Cost per route | $0.01+ (API call) | **$0** (local inference) | | ||
| --- | ||
| ## Installation | ||
@@ -14,40 +50,96 @@ | ||
| ```typescript | ||
| import { RuvLLM, RuvLLMConfig } from '@ruvector/ruvllm'; | ||
| import { RuvLLM, RlmController } from '@ruvector/ruvllm'; | ||
| // Initialize with default configuration | ||
| const llm = new RuvLLM(); | ||
| // Or with custom configuration | ||
| // Simple LLM inference | ||
| const llm = new RuvLLM({ | ||
| modelPath: './models/ruvltra-small-q4km.gguf', | ||
| modelPath: '~/.ruvllm/models/ruvltra-claude-code-0.5b-q4_k_m.gguf', | ||
| sonaEnabled: true, | ||
| flashAttention: true, | ||
| maxTokens: 256, | ||
| }); | ||
| // Generate text | ||
| const response = await llm.query('Explain quantum computing'); | ||
| console.log(response.text); | ||
| // Stream generation | ||
| for await (const token of llm.stream('Write a haiku about Rust')) { | ||
| process.stdout.write(token); | ||
| } | ||
| // Recursive Language Model for complex queries | ||
| const rlm = new RlmController({ maxDepth: 5 }); | ||
| const answer = await rlm.query('What are the causes AND solutions for slow API responses?'); | ||
| // Automatically decomposes into sub-queries, retrieves context, synthesizes answer | ||
| ``` | ||
| ## What's New in v2.4 | ||
| --- | ||
| | Feature | Description | | ||
| |---------|-------------| | ||
| | **RLM (Recursive Language Model)** | Query decomposition with recursive retrieval and synthesis | | ||
| | **100% Routing Accuracy** | Hybrid keyword-first strategy achieves 100% on Claude Code tasks | | ||
| | **145 Tests Passing** | Comprehensive test coverage across all modules | | ||
| | **Contrastive Fine-tuning** | LoRA-based training with 793 contrastive pairs | | ||
| | **Training Scripts** | Generate routing datasets and fine-tune models | | ||
| | **HuggingFace Models** | Pre-trained RuvLTRA models available | | ||
| ## Core Features | ||
| ### 1. Claude Code Native Routing | ||
| Built **by** Claude Code, **for** Claude Code. Routes tasks to 60+ agent types: | ||
| ```typescript | ||
| import { RuvLLM } from '@ruvector/ruvllm'; | ||
| const llm = new RuvLLM({ model: 'ruv/ruvltra' }); | ||
| // Intelligent routing | ||
| const route = await llm.route('implement OAuth2 authentication'); | ||
| console.log(route.agent); // 'security-architect' | ||
| console.log(route.confidence); // 0.98 | ||
| console.log(route.tier); // 2 (Haiku-level complexity) | ||
| // Multi-agent teams for complex tasks | ||
| const team = await llm.routeComplex('build full-stack app with auth'); | ||
| // Returns: [system-architect, backend-dev, coder, security-architect, tester] | ||
| ``` | ||
| ### 2. 3-Tier Intelligent Routing | ||
| ``` | ||
| ┌─────────────────────────────────────────────────────────┐ | ||
| │ User Request │ | ||
| └─────────────────────┬───────────────────────────────────┘ | ||
| ↓ | ||
| [RuvLTRA Routing] | ||
| ↓ | ||
| ┌─────────────┼─────────────┐ | ||
| ↓ ↓ ↓ | ||
| ┌───────────┐ ┌───────────┐ ┌───────────┐ | ||
| │ Tier 1 │ │ Tier 2 │ │ Tier 3 │ | ||
| │ Booster │ │ Haiku │ │ Opus │ | ||
| │ <1ms │ │ ~500ms │ │ 2-5s │ | ||
| │ $0 │ │ $0.0002 │ │ $0.015 │ | ||
| └───────────┘ └───────────┘ └───────────┘ | ||
| ``` | ||
| ### 3. Self-Learning (SONA) | ||
| Every successful interaction improves the model: | ||
| ```typescript | ||
| // First routing: full inference | ||
| await llm.route('implement OAuth2'); // → security-architect (97%) | ||
| // Later: pattern hit in <25μs (learned from success) | ||
| await llm.route('add OAuth2 flow'); // → security-architect (99%, cached pattern) | ||
| ``` | ||
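The learned-pattern shortcut described above can be pictured as a small similarity memory. The following is a minimal sketch, not the SONA implementation: `PatternMemory`, the token-overlap (Jaccard) similarity, and the 0.2 threshold are all assumptions made for illustration, standing in for the embedding-based recall the package describes.

```typescript
// Hypothetical sketch: remember successful routings and recall them for similar tasks.
// Jaccard token overlap is a stand-in for real embedding similarity.
type StoredPattern = { tokens: string[]; agent: string };

// Lowercase the text and return its unique alphanumeric tokens.
function uniqueTokens(text: string): string[] {
  const tokens: string[] = [];
  for (const raw of text.toLowerCase().split(/[^a-z0-9]+/)) {
    if (raw.length > 0 && tokens.indexOf(raw) === -1) tokens.push(raw);
  }
  return tokens;
}

// Jaccard similarity: shared tokens divided by the size of the union.
function jaccard(a: string[], b: string[]): number {
  let shared = 0;
  for (const t of a) if (b.indexOf(t) !== -1) shared++;
  const union = a.length + b.length - shared;
  return union === 0 ? 0 : shared / union;
}

class PatternMemory {
  private patterns: StoredPattern[] = [];
  private threshold: number;

  constructor(threshold: number = 0.2) {
    this.threshold = threshold;
  }

  // Record a routing that was confirmed successful.
  learn(task: string, agent: string): void {
    this.patterns.push({ tokens: uniqueTokens(task), agent: agent });
  }

  // Return the agent of the most similar stored task, or null on a miss.
  recall(task: string): string | null {
    const query = uniqueTokens(task);
    let bestAgent: string | null = null;
    let bestScore = 0;
    for (const p of this.patterns) {
      const score = jaccard(query, p.tokens);
      if (score >= this.threshold && score > bestScore) {
        bestAgent = p.agent;
        bestScore = score;
      }
    }
    return bestAgent;
  }
}

const memory = new PatternMemory();
memory.learn('implement OAuth2', 'security-architect');
console.log(memory.recall('add OAuth2 flow'));     // similar task: served from memory
console.log(memory.recall('write release notes')); // unrelated task: miss, needs full inference
```

A miss would fall through to full inference, and a confirmed result would then be passed to `learn`, which is how the memory grows over time in this sketch.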
| --- | ||
| ## RLM (Recursive Language Model) | ||
| RLM provides recursive retrieval-augmented generation that breaks down complex queries into sub-queries and synthesizes answers from retrieved context. | ||
| RLM provides **recursive query decomposition** - unlike traditional RAG that retrieves once, RLM breaks complex questions into sub-queries and synthesizes coherent answers. | ||
| ### How It Works | ||
| ``` | ||
| Query: "What are the causes AND solutions for slow API responses?" | ||
| ↓ | ||
| [Decomposition] | ||
| / \ | ||
| "Causes of slow API?" "Solutions for slow API?" | ||
| ↓ ↓ | ||
| [Sub-answers] [Sub-answers] | ||
| \ / | ||
| [Synthesis] | ||
| ↓ | ||
| Coherent combined answer with sources | ||
| ``` | ||
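The decompose, answer, synthesize flow in the diagram can be sketched in a few lines. This is an illustrative toy, not the `RlmController` internals: the regex-based `decompose`, the `Retriever` callback, and the stub retriever are assumptions, and a real system would use a model for decomposition and memory search for retrieval.

```typescript
// Hypothetical sketch of recursive decomposition with a depth budget.
type SubAnswer = { question: string; text: string; source: string };
type Retriever = (question: string) => SubAnswer;

// Split "What are the A AND B for X?" into one sub-query per conjunct.
// Any query that does not match this shape is treated as atomic.
function decompose(query: string): string[] {
  const match = /^What are the (.+?) for (.+?)\??$/i.exec(query);
  if (!match) return [query];
  const topic = match[2];
  return match[1].split(/\s+AND\s+/i).map((part) => {
    const head = part.charAt(0).toUpperCase() + part.slice(1);
    return head + ' for ' + topic + '?';
  });
}

// Answer a query by recursing into sub-queries until they are atomic
// or the depth budget (compare `maxDepth` in the Quick Start) is spent.
function answerRecursive(query: string, retrieve: Retriever, depth: number, maxDepth: number): SubAnswer[] {
  const parts = decompose(query);
  if (parts.length === 1 || depth >= maxDepth) return [retrieve(query)];
  let results: SubAnswer[] = [];
  for (const part of parts) {
    results = results.concat(answerRecursive(part, retrieve, depth + 1, maxDepth));
  }
  return results;
}

// Combine sub-answers into one response that keeps its sources.
function synthesize(subAnswers: SubAnswer[]): { text: string; sources: string[] } {
  return {
    text: subAnswers.map((a) => a.question + ' ' + a.text).join('\n'),
    sources: subAnswers.map((a) => a.source),
  };
}

// Stub retriever: a real one would search memory and call a model.
const stubRetriever: Retriever = (question) => ({
  question: question,
  text: '(retrieved context would be summarized here)',
  source: 'stub',
});

const combined = synthesize(
  answerRecursive('What are the causes AND solutions for slow API responses?', stubRetriever, 0, 5)
);
console.log(combined.text);
console.log(combined.sources.length);
```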
| ### Basic Usage | ||
@@ -70,6 +162,6 @@ | ||
| const answer = await rlm.query('What are causes and solutions for type errors in React?'); | ||
| console.log(answer.text); | ||
| console.log('Sources:', answer.sources); | ||
| console.log('Quality Score:', answer.qualityScore); | ||
| console.log('Confidence:', answer.confidence); | ||
| console.log(answer.text); // Comprehensive synthesized answer | ||
| console.log(answer.sources); // Source attributions | ||
| console.log(answer.qualityScore); // 0.0-1.0 | ||
| console.log(answer.confidence); // Routing confidence | ||
| ``` | ||
@@ -80,6 +172,2 @@ | ||
| ```typescript | ||
| import { RlmController } from '@ruvector/ruvllm'; | ||
| const rlm = new RlmController(); | ||
| for await (const event of rlm.queryStream('Explain machine learning')) { | ||
@@ -94,7 +182,5 @@ if (event.type === 'token') { | ||
| ### With Reflection | ||
| ### With Self-Reflection | ||
| ```typescript | ||
| import { RlmController } from '@ruvector/ruvllm'; | ||
| const rlm = new RlmController({ | ||
@@ -106,3 +192,3 @@ enableReflection: true, | ||
| // Answers will be iteratively refined until quality >= 0.8 | ||
| // Answers are iteratively refined until quality >= 0.8 | ||
| const answer = await rlm.query('Complex multi-part technical question...'); | ||
@@ -127,54 +213,64 @@ ``` | ||
| ## Exports | ||
| --- | ||
| ```typescript | ||
| import { | ||
| // Core | ||
| RuvLLM, | ||
| RuvLLMConfig, | ||
| ## Unique Capabilities | ||
| // RLM - Recursive Language Model | ||
| RlmController, | ||
| RlmConfig, | ||
| RlmAnswer, | ||
| MemorySpan, | ||
| StreamToken, | ||
| ### 1. Memory-Augmented Routing | ||
| // RLM Training | ||
| RlmTrainer, | ||
| RlmTrainingConfig, | ||
| RlmTrainingExample, | ||
| createRlmTrainer, | ||
| DEFAULT_RLM_CONFIG, | ||
| FAST_RLM_CONFIG, | ||
| THOROUGH_RLM_CONFIG, | ||
| ROUTING_FOCUSED_CONFIG, | ||
| Every successful routing is stored in HNSW-indexed memory for instant recall: | ||
| // SONA Learning | ||
| SonaCoordinator, | ||
| TrajectoryBuilder, | ||
| ```typescript | ||
| // First time: Full inference (~50ms) | ||
| await llm.route('implement OAuth2'); // → security-architect (97% confidence) | ||
| // Federated Learning | ||
| EphemeralAgent, | ||
| FederatedCoordinator, | ||
| // Later: Memory hit (<25μs) | ||
| await llm.route('add OAuth2 flow'); // → security-architect (99% confidence, cached) | ||
| ``` | ||
| // LoRA Adapters | ||
| LoraAdapter, | ||
| LoraManager, | ||
| ### 2. Confidence-Aware Escalation | ||
| // Sessions | ||
| SessionManager, | ||
| ```typescript | ||
| // Low confidence automatically escalates: | ||
| // confidence > 0.9 → use recommended agent | ||
| // confidence 0.7-0.9 → use with human confirmation | ||
| // confidence < 0.7 → escalate to higher tier | ||
| ``` | ||
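The three confidence bands can be written as a small decision function. This is a sketch under stated assumptions: `decideAction` and the action names are hypothetical, and treating exactly 0.9 and exactly 0.7 as part of the middle band is one reading of the thresholds listed above.

```typescript
// Hypothetical mapping from routing confidence to the next step.
type EscalationAction = 'use-agent' | 'confirm-with-human' | 'escalate-tier';

function decideAction(confidence: number): EscalationAction {
  if (confidence < 0 || confidence > 1) {
    throw new Error('confidence must be between 0 and 1');
  }
  if (confidence > 0.9) return 'use-agent';           // high confidence: use the recommended agent
  if (confidence >= 0.7) return 'confirm-with-human'; // middle band: ask before proceeding
  return 'escalate-tier';                             // low confidence: hand off to a higher tier
}

console.log(decideAction(0.98));
console.log(decideAction(0.8));
console.log(decideAction(0.5));
```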
| // Contrastive Training | ||
| ContrastiveTrainer, | ||
| ### 3. Batch SIMD Operations | ||
| // Benchmarks | ||
| ModelComparisonBenchmark, | ||
| RoutingBenchmark, | ||
| EmbeddingBenchmark, | ||
| } from '@ruvector/ruvllm'; | ||
| ```typescript | ||
| import { simd } from '@ruvector/ruvllm/simd'; | ||
| // 4x faster vector operations with AVX2/NEON | ||
| const similarity = simd.batchCosineSimilarity(query, targets); | ||
| const attended = simd.flashAttention(q, k, v, scale); | ||
| ``` | ||
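For readers who want to see what a batched cosine similarity computes, here is a scalar reference in plain TypeScript. It is not the package's SIMD kernel: `batchCosineSimilarityRef` is a hypothetical name, and the assumption that the batch call returns one score per target vector is not confirmed by the README.

```typescript
// Scalar reference for cosine similarity; a SIMD kernel would compute the same values faster.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // define similarity with a zero vector as 0
}

// One score per target, in the same order as `targets`.
function batchCosineSimilarityRef(query: number[], targets: number[][]): number[] {
  return targets.map((target) => cosineSimilarity(query, target));
}

const scores = batchCosineSimilarityRef([1, 0], [[1, 0], [0, 1], [-1, 0]]);
console.log(scores); // same direction, orthogonal, opposite direction
```

A reference like this is mainly useful for checking an optimized implementation against known inputs.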
| ### 4. Zero-Copy Caching | ||
| Arc-based string interning for 100-1000x faster cache hits on large responses. | ||
| --- | ||
| ## Performance | ||
| ### Benchmarks (M4 Pro) | ||
| | Operation | Latency | Throughput | | ||
| |-----------|---------|------------| | ||
| | Query decomposition | 340 ns | 2.9M/s | | ||
| | Cache lookup | 23.5 ns | 42.5M/s | | ||
| | Embedding (384d) | 293 ns | 3.4M/s | | ||
| | Memory search (10k) | 0.4 ms | 2.5K/s | | ||
| | End-to-end routing | <1 ms | 1K+/s | | ||
| | Full RLM query | 50-200 ms | 5-20/s | | ||
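The throughput column appears to be the reciprocal of the latency column, which assumes operations run one after another on a single thread. A short sketch of that conversion, using latencies copied from the table (the helper name `throughputPerSecond` is made up for this example):

```typescript
// Convert a per-operation latency in nanoseconds to sequential operations per second.
function throughputPerSecond(latencyNs: number): number {
  if (latencyNs <= 0) throw new Error('latency must be positive');
  return 1e9 / latencyNs;
}

const benchmarkRows = [
  { operation: 'Query decomposition', latencyNs: 340 },
  { operation: 'Cache lookup', latencyNs: 23.5 },
  { operation: 'Embedding (384d)', latencyNs: 293 },
  { operation: 'Memory search (10k)', latencyNs: 400000 }, // 0.4 ms
];

for (const row of benchmarkRows) {
  const millionsPerSecond = throughputPerSecond(row.latencyNs) / 1e6;
  console.log(row.operation + ': ' + millionsPerSecond.toFixed(4) + 'M ops/s');
}
```

The results agree with the table up to rounding; batching or parallelism would break this simple relationship.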
| ### Routing Accuracy | ||
| | Strategy | RuvLTRA | Qwen Base | OpenAI | | ||
| |----------|---------|-----------|--------| | ||
| | Embedding Only | 45% | 40% | 52% | | ||
| | Keyword Only | 78% | 78% | N/A | | ||
| | **Hybrid** | **100%** | 95% | N/A | | ||
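The hybrid strategy is described as keyword-first, with a similarity model behind it. A minimal sketch of that control flow follows. Everything specific here is an assumption: the keyword rules, the agent descriptions, and the word-overlap fallback, which stands in for embedding similarity. The agent names are borrowed from examples elsewhere in this README.

```typescript
// Hypothetical keyword-first router with a similarity fallback.
type HybridRoute = { agent: string; strategy: 'keyword' | 'similarity' };

// Illustrative keyword rules; a real table would be far larger.
const KEYWORD_RULES: { keywords: string[]; agent: string }[] = [
  { keywords: ['test', 'tests', 'coverage'], agent: 'tester' },
  { keywords: ['oauth2', 'authentication', 'encryption'], agent: 'security-architect' },
];

// Illustrative agent descriptions used only by the fallback.
const AGENT_DESCRIPTIONS: { agent: string; description: string }[] = [
  { agent: 'coder', description: 'fix bug implement feature refactor code' },
  { agent: 'researcher', description: 'investigate compare survey summarize options' },
];

function splitWords(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
}

// Count how many task words also appear in the description.
function overlapCount(taskWords: string[], descriptionWords: string[]): number {
  let count = 0;
  for (const w of taskWords) if (descriptionWords.indexOf(w) !== -1) count++;
  return count;
}

function routeHybrid(task: string): HybridRoute {
  const taskWords = splitWords(task);

  // Step 1: exact keyword match wins immediately.
  for (const rule of KEYWORD_RULES) {
    for (const keyword of rule.keywords) {
      if (taskWords.indexOf(keyword) !== -1) {
        return { agent: rule.agent, strategy: 'keyword' };
      }
    }
  }

  // Step 2: otherwise pick the most similar agent description.
  let bestAgent = '';
  let bestScore = -1;
  for (const candidate of AGENT_DESCRIPTIONS) {
    const score = overlapCount(taskWords, splitWords(candidate.description));
    if (score > bestScore) {
      bestAgent = candidate.agent;
      bestScore = score;
    }
  }
  return { agent: bestAgent, strategy: 'similarity' };
}

console.log(routeHybrid('add unit tests for auth module'));
console.log(routeHybrid('compare caching options'));
```

Checking keywords first keeps unambiguous tasks cheap and deterministic, while the fallback handles phrasing the rules do not cover.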
| ### Test Results | ||
@@ -193,55 +289,47 @@ | ||
| ### Routing Accuracy (Claude Code Tasks) | ||
| --- | ||
| | Strategy | RuvLTRA | Qwen Base | | ||
| |----------|---------|-----------| | ||
| | Embedding Only | 45% | 40% | | ||
| | Keyword-First (Hybrid) | **100%** | 95% | | ||
| ## Models | ||
| ### Inference Performance (M4 Pro) | ||
| ### HuggingFace Repository | ||
| | Operation | Performance | | ||
| |-----------|-------------| | ||
| | Inference | 88-135 tok/s | | ||
| | Flash Attention | 320us (seq=2048) | | ||
| | HNSW Search | 17-62us | | ||
| | SONA Adapt | <1ms | | ||
| | RLM Query | 50-200ms | | ||
| **URL**: [https://huggingface.co/ruv/ruvltra](https://huggingface.co/ruv/ruvltra) | ||
| ### SIMD Optimizations | ||
| ### Available Models | ||
| - AVX2/AVX-512 on x86_64 | ||
| - NEON on ARM64 | ||
| - 4-8x speedup on vector operations | ||
| | Model | Size | Purpose | Accuracy | | ||
| |-------|------|---------|----------| | ||
| | **ruvltra-claude-code-0.5b-q4_k_m** | 398 MB | Agent routing | **100%** (hybrid) | | ||
| | ruvltra-small-0.5b-q4_k_m | ~400 MB | Embeddings | - | | ||
| | ruvltra-medium-1.1b-q4_k_m | ~1 GB | Full inference | - | | ||
| ## CLI Usage | ||
| ### Download Models | ||
| ```bash | ||
| # Query a model | ||
| ruvllm query "What is machine learning?" | ||
| ```typescript | ||
| // Programmatic | ||
| import { downloadModel } from '@ruvector/ruvllm'; | ||
| await downloadModel('ruv/ruvltra', { quantization: 'q4_k_m' }); | ||
| # Stream output | ||
| ruvllm query --stream "Write a poem" | ||
| // CLI equivalent (run in a shell): ruvllm download ruv/ruvltra | ||
| ``` | ||
| # Download a model | ||
| ruvllm download ruvector/ruvltra-small-q4km | ||
| ### Auto-Download | ||
| # Benchmark | ||
| ruvllm bench ./models/model.gguf | ||
| Models are automatically downloaded on first use: | ||
| # Run evaluation (SWE-Bench) | ||
| ruvllm eval --model ./models/model.gguf --subset lite --max-tasks 50 | ||
| ```typescript | ||
| const llm = new RuvLLM({ model: 'ruv/ruvltra' }); | ||
| // Downloads to ~/.ruvllm/models/ if not present | ||
| ``` | ||
| --- | ||
| ## Training | ||
| ### Routing Dataset Generation | ||
| ### Generate Routing Dataset | ||
| Generate training data for agent routing: | ||
| ```bash | ||
| # Generate routing dataset | ||
| node scripts/training/routing-dataset.js | ||
| # Output: 381 examples, 793 contrastive pairs | ||
| # Output: 381 examples, 793 contrastive pairs, 156 hard negatives | ||
| ``` | ||
@@ -251,6 +339,4 @@ | ||
| Fine-tune models with LoRA adapters: | ||
| ```typescript | ||
| import { ContrastiveTrainer, ContrastivePair } from '@ruvector/ruvllm'; | ||
| import { ContrastiveTrainer } from '@ruvector/ruvllm'; | ||
@@ -264,13 +350,8 @@ const trainer = new ContrastiveTrainer({ | ||
| // Training pairs from routing dataset | ||
| const pairs: ContrastivePair[] = [ | ||
| { | ||
| anchor: 'Fix the authentication bug in login.ts', | ||
| positive: 'coder', | ||
| negative: 'researcher', | ||
| }, | ||
| const pairs = [ | ||
| { anchor: 'Fix auth bug', positive: 'coder', negative: 'researcher' }, | ||
| // ... more pairs | ||
| ]; | ||
| await trainer.train(pairs, { epochs: 10, batchSize: 32 }); | ||
| await trainer.train(pairs, { epochs: 10 }); | ||
| await trainer.save('./adapters/routing-lora'); | ||
@@ -281,4 +362,2 @@ ``` | ||
| Located in `scripts/training/`: | ||
| | Script | Description | | ||
@@ -289,47 +368,6 @@ |--------|-------------| | ||
| | `contrastive-finetune.js` | LoRA fine-tuning pipeline | | ||
| | `rlm-dataset.js` | RLM training data (500 examples) | | ||
| ## Model Links | ||
| --- | ||
| ### HuggingFace Repository | ||
| **URL**: [https://huggingface.co/ruv/ruvltra](https://huggingface.co/ruv/ruvltra) | ||
| ### Available Models | ||
| | Model | File | Size | Purpose | | ||
| |-------|------|------|---------| | ||
| | RuvLTRA Claude Code 0.5B | `ruvltra-claude-code-0.5b-q4_k_m.gguf` | ~400MB | Agent routing (100% with hybrid) | | ||
| | RuvLTRA Small 0.5B | `ruvltra-0.5b-q4_k_m.gguf` | ~400MB | General embeddings | | ||
| | RuvLTRA Medium 3B | `ruvltra-3b-q4_k_m.gguf` | ~2GB | Full LLM inference | | ||
| ### Download Models | ||
| ```bash | ||
| # Using CLI | ||
| ruvllm download ruv/ruvltra | ||
| # Using HuggingFace CLI | ||
| huggingface-cli download ruv/ruvltra ruvltra-claude-code-0.5b-q4_k_m.gguf | ||
| # Programmatic download | ||
| import { downloadModel } from '@ruvector/ruvllm'; | ||
| await downloadModel('ruv/ruvltra', { quantization: 'q4_k_m' }); | ||
| ``` | ||
| ### Environment Variables | ||
| ```bash | ||
| # HuggingFace authentication (any of these) | ||
| HF_TOKEN=hf_xxx | ||
| HUGGING_FACE_HUB_TOKEN=hf_xxx | ||
| HUGGINGFACE_API_KEY=hf_xxx | ||
| ``` | ||
| ### Local Model Storage | ||
| ```bash | ||
| ~/.ruvllm/models/ # Downloaded GGUF models | ||
| ~/.ruvllm/training/ # Training data and configs | ||
| ``` | ||
| ## API Reference | ||
@@ -343,19 +381,12 @@ | ||
| // Generate text | ||
| query(prompt: string, params?: GenerateParams): Promise<Response>; | ||
| // Stream generation | ||
| stream(prompt: string, params?: GenerateParams): AsyncIterable<string>; | ||
| route(task: string): Promise<RoutingResult>; | ||
| routeComplex(task: string): Promise<AgentTeam[]>; | ||
| // Load a model | ||
| loadModel(path: string): Promise<void>; | ||
| // Memory operations | ||
| addMemory(text: string, metadata?: Record<string, unknown>): number; | ||
| addMemory(text: string, metadata?: object): number; | ||
| searchMemory(query: string, topK?: number): MemoryResult[]; | ||
| // Get SONA learning stats | ||
| sonaStats(): SonaStats | null; | ||
| // Adapt on feedback | ||
| adapt(input: Float32Array, quality: number): void; | ||
@@ -371,17 +402,11 @@ } | ||
| // Query with recursive retrieval | ||
| query(input: string): Promise<RlmAnswer>; | ||
| // Stream query | ||
| queryStream(input: string): AsyncGenerator<StreamToken>; | ||
| // Memory management | ||
| addMemory(text: string, metadata?: Record<string, unknown>): Promise<string>; | ||
| addMemory(text: string, metadata?: object): Promise<string>; | ||
| searchMemory(query: string, topK?: number): Promise<MemorySpan[]>; | ||
| // Cache management | ||
| clearCache(): void; | ||
| getCacheStats(): { size: number; entries: number }; | ||
| // Configuration | ||
| updateConfig(config: Partial<RlmConfig>): void; | ||
@@ -392,124 +417,51 @@ getConfig(): Required<RlmConfig>; | ||
| ### Configuration | ||
| ### All Exports | ||
| ```typescript | ||
| interface RuvLLMConfig { | ||
| modelPath?: string; // Path to GGUF model | ||
| sonaEnabled?: boolean; // Enable SONA learning (default: true) | ||
| flashAttention?: boolean; // Use Flash Attention 2 (default: true) | ||
| maxTokens?: number; // Max generation tokens (default: 256) | ||
| temperature?: number; // Sampling temperature (default: 0.7) | ||
| topP?: number; // Top-p sampling (default: 0.9) | ||
| } | ||
| ``` | ||
| import { | ||
| // Core | ||
| RuvLLM, RuvLLMConfig, | ||
| ### Generate Parameters | ||
| // RLM | ||
| RlmController, RlmConfig, RlmAnswer, MemorySpan, StreamToken, | ||
| ```typescript | ||
| interface GenerateParams { | ||
| maxTokens?: number; | ||
| temperature?: number; | ||
| topP?: number; | ||
| topK?: number; | ||
| repetitionPenalty?: number; | ||
| stopSequences?: string[]; | ||
| } | ||
| ``` | ||
| // Training | ||
| RlmTrainer, ContrastiveTrainer, createRlmTrainer, | ||
| DEFAULT_RLM_CONFIG, FAST_RLM_CONFIG, THOROUGH_RLM_CONFIG, | ||
| ## SIMD Module | ||
| // SONA Learning | ||
| SonaCoordinator, TrajectoryBuilder, | ||
| For direct access to optimized SIMD kernels: | ||
| // LoRA | ||
| LoraAdapter, LoraManager, | ||
| ```typescript | ||
| import { simd } from '@ruvector/ruvllm/simd'; | ||
| // Dot product | ||
| const result = simd.dotProduct(vecA, vecB); | ||
| // Matrix multiplication | ||
| const output = simd.matmul(matrix, vector); | ||
| // Flash Attention | ||
| const attended = simd.flashAttention(query, key, value, scale); | ||
| // RMS Normalization | ||
| simd.rmsNorm(hidden, weights, epsilon); | ||
| // Benchmarks | ||
| ModelComparisonBenchmark, RoutingBenchmark, EmbeddingBenchmark, | ||
| } from '@ruvector/ruvllm'; | ||
| ``` | ||
| ## Evaluation Harness | ||
| --- | ||
| Run model evaluations with SWE-Bench integration: | ||
| ## CLI | ||
| ```typescript | ||
| import { RuvLLM, EvaluationHarness, AblationMode } from '@ruvector/ruvllm'; | ||
| ```bash | ||
| # Route a task | ||
| ruvllm route "add unit tests for auth module" | ||
| # → Agent: tester | Confidence: 0.96 | Tier: 2 | ||
| const harness = new EvaluationHarness({ | ||
| modelPath: './models/model.gguf', | ||
| enableHnsw: true, | ||
| enableSona: true, | ||
| }); | ||
| # Query with streaming | ||
| ruvllm query --stream "Explain machine learning" | ||
| // Run single evaluation | ||
| const result = await harness.evaluate( | ||
| 'Fix the null pointer exception', | ||
| 'def process(data): return data.split()', | ||
| AblationMode.Full | ||
| ); | ||
| # Download models | ||
| ruvllm download ruv/ruvltra | ||
| console.log(`Success: ${result.success}, Quality: ${result.qualityScore}`); | ||
| # Run benchmarks | ||
| ruvllm bench ./models/model.gguf | ||
| // Run ablation study (Baseline, RetrievalOnly, AdaptersOnly, R+A, Full) | ||
| const report = await harness.runAblationStudy(tasks); | ||
| for (const [mode, metrics] of Object.entries(report.modeMetrics)) { | ||
| console.log(`${mode}: ${metrics.successRate * 100}% success`); | ||
| } | ||
| # Evaluate (SWE-Bench) | ||
| ruvllm eval --model ./models/model.gguf --subset lite | ||
| ``` | ||
| ## mistral-rs Backend (Production Serving) | ||
| --- | ||
| For production deployments with 10-100+ concurrent users, use the mistral-rs backend: | ||
| ```typescript | ||
| import { RuvLLM, MistralBackend, PagedAttentionConfig } from '@ruvector/ruvllm'; | ||
| // Configure for production serving | ||
| const backend = new MistralBackend({ | ||
| // PagedAttention: 5-10x more concurrent users | ||
| pagedAttention: { | ||
| blockSize: 16, | ||
| maxBlocks: 4096, | ||
| gpuMemoryFraction: 0.9, | ||
| prefixCaching: true, | ||
| }, | ||
| // X-LoRA: Per-token adapter routing | ||
| xlora: { | ||
| adapters: ['./adapters/coder', './adapters/researcher'], | ||
| topK: 2, | ||
| }, | ||
| // ISQ: Runtime quantization | ||
| isq: { | ||
| bits: 4, | ||
| method: 'awq', | ||
| }, | ||
| }); | ||
| const llm = new RuvLLM({ backend }); | ||
| await llm.loadModel('mistralai/Mistral-7B-Instruct-v0.2'); | ||
| // Serve multiple concurrent requests | ||
| const response = await llm.query('Write production code'); | ||
| ``` | ||
| > **Note**: mistral-rs features require the Rust backend with `mistral-rs` feature enabled. Native bindings will use mistral-rs when available. | ||
| ## Supported Models | ||
| - **RuvLTRA-Small** (494M) - Q4K, Q5K, Q8 | ||
| - **RuvLTRA-Medium** (3B) - Q4K, Q5K, Q8 | ||
| - **Qwen 2.5** (0.5B-72B) | ||
| - **Llama 3.x** (8B-70B) | ||
| - **Mistral** (7B-22B) | ||
| - **Phi-3** (3.8B-14B) | ||
| - **Gemma-2** (2B-27B) | ||
| ## Platform Support | ||
@@ -525,17 +477,29 @@ | ||
| ## Related Packages | ||
| --- | ||
| - [@ruvector/core](https://www.npmjs.com/package/@ruvector/core) - Vector operations | ||
| - [@ruvector/sona](https://www.npmjs.com/package/@ruvector/sona) - SONA learning engine | ||
| - [@ruvector/ruvector](https://www.npmjs.com/package/@ruvector/ruvector) - Full Ruvector SDK | ||
| ## Links | ||
| - [GitHub Repository](https://github.com/ruvnet/ruvector) | ||
| - [HuggingFace Models](https://huggingface.co/ruv/ruvltra) | ||
| - [API Documentation](https://docs.rs/ruvllm) | ||
| - [Crate (Rust)](https://crates.io/crates/ruvllm) | ||
| | Resource | URL | | ||
| |----------|-----| | ||
| | **npm** | [npmjs.com/package/@ruvector/ruvllm](https://www.npmjs.com/package/@ruvector/ruvllm) | | ||
| | **HuggingFace** | [huggingface.co/ruv/ruvltra](https://huggingface.co/ruv/ruvltra) | | ||
| | **Crate (Rust)** | [crates.io/crates/ruvllm](https://crates.io/crates/ruvllm) | | ||
| | **Documentation** | [docs.rs/ruvllm](https://docs.rs/ruvllm) | | ||
| | **GitHub** | [github.com/ruvnet/ruvector](https://github.com/ruvnet/ruvector) | | ||
| | **Claude Flow** | [github.com/ruvnet/claude-flow](https://github.com/ruvnet/claude-flow) | | ||
| --- | ||
| ## License | ||
| MIT OR Apache-2.0 | ||
| --- | ||
| <div align="center"> | ||
| **Built for Claude Code. Optimized for agents. Designed for speed.** | ||
| [Get Started](#quick-start) | [View on GitHub](https://github.com/ruvnet/ruvector) | ||
| </div> |
URL strings
Supply chain risk: Package contains fragments of external URLs or IP addresses, which the package may be accessing at runtime.
Found 1 instance in 1 package