@ruvector/ruvllm
+14
-3
  {
  "name": "@ruvector/ruvllm",
- "version": "2.5.3",
- "description": "Self-learning LLM orchestration with SONA adaptive learning, HNSW memory, FastGRNN routing, and SIMD inference",
+ "version": "2.5.4",
+ "description": "Self-learning LLM runtime — TurboQuant KV-cache (6-8x compression), SONA adaptive learning, FlashAttention, speculative decoding, GGUF inference",
  "main": "dist/cjs/index.js",
@@ -95,3 +95,14 @@ "module": "dist/esm/index.js",
  "rust",
- "ruvector"
+ "ruvector",
+ "turboquant",
+ "kv-cache",
+ "quantization",
+ "flash-attention",
+ "speculative-decoding",
+ "gguf",
+ "mamba",
+ "transformer",
+ "edge-ai",
+ "local-llm",
+ "model-compression"
  ],
@@ -98,0 +109,0 @@ "author": "rUv Team <team@ruv.io>",
+37
-5
@@ -1,5 +0,12 @@
- # @ruvector/ruvllm v2.3
+ # @ruvector/ruvllm
- Self-learning LLM orchestration with SONA adaptive learning, HNSW memory, and SIMD inference for Node.js.
+ [](https://www.npmjs.com/package/@ruvector/ruvllm)
+ [](https://www.npmjs.com/package/@ruvector/ruvllm)
+ [](https://opensource.org/licenses/MIT)
+ [](https://github.com/ruvnet/ruvector)
+ **Self-learning LLM runtime for Node.js** — GGUF inference, TurboQuant KV-cache compression (6-8x memory savings), SONA adaptive learning, FlashAttention, speculative decoding, and SIMD-optimized kernels. Built in Rust, runs everywhere.
+ > Inference at **88-135 tok/s** on M4 Pro | **<1ms** SONA adaptation | **6-8x** KV-cache compression via TurboQuant
  ## Installation
@@ -37,6 +44,10 @@
- ## What's New in v2.3
+ ## What's New in v2.5
  | Feature | Description |
  |---------|-------------|
+ | **TurboQuant KV-Cache** | 2-4 bit asymmetric quantization with per-channel scale/zero-point — 6-8x memory reduction, <0.5% perplexity loss |
+ | **TurboQuant Embedding Store** | Quantized vector storage with compressed search — 10-30x memory savings |
+ | **H2O / PyramidKV Eviction** | Intelligent cache eviction policies for long-context inference |
+ | **Optimized Inner Product** | Asymmetric distance on quantized data — skip decompression for 2-4x faster search |
  | **RuvLTRA Models** | Purpose-built 0.5B & 3B models for Claude Flow |
@@ -48,5 +59,26 @@ | **Task-Specific LoRA** | 5 pre-trained adapters (coder, researcher, security, architect, reviewer) |
  | **Evaluation Harness** | SWE-Bench testing with 5 ablation modes |
  | **Auto-Dimension** | HNSW auto-detects model embedding size |
- | **mistral-rs Backend** | Production serving with PagedAttention, X-LoRA, ISQ (5-10x concurrent users) |
+ | **mistral-rs Backend** | Production serving with PagedAttention, X-LoRA, ISQ |
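The **Optimized Inner Product** row describes scoring quantized vectors without decompressing them first. The snippet below is a minimal sketch of that idea, not the ruvllm implementation. It assumes a hypothetical per-vector layout in which each stored value is reconstructed as `offset + scale * code` (an asymmetric scheme where `offset` plays the role of the zero-point, `zeroPoint = -offset / scale`). Because `scale` and `offset` are constant per vector, `q · x ≈ scale * Σ q[i] * code[i] + offset * Σ q[i]`, so the float query is multiplied directly against the integer codes and `Σ q[i]` is computed once per query. `QuantizedVector`, `quantize`, `dotQuantized`, and `sum` are illustrative names.

```typescript
// Hypothetical per-vector asymmetric layout: value ≈ offset + scale * code.
interface QuantizedVector {
  codes: Uint8Array; // integer codes in [0, 2^bits - 1]
  scale: number;     // step between adjacent codes
  offset: number;    // value represented by code 0 (the vector minimum)
}

// Min/max asymmetric quantization of one vector to `bits` bits (1..8).
function quantize(x: Float32Array, bits: number): QuantizedVector {
  if (bits < 1 || bits > 8) throw new Error("bits must be in 1..8");
  const levels = (1 << bits) - 1;
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < x.length; i++) {
    if (x[i] < min) min = x[i];
    if (x[i] > max) max = x[i];
  }
  const scale = max > min ? (max - min) / levels : 1;
  const codes = new Uint8Array(x.length);
  for (let i = 0; i < x.length; i++) {
    const c = Math.round((x[i] - min) / scale);
    codes[i] = Math.min(levels, Math.max(0, c));
  }
  return { codes, scale, offset: min };
}

// Inner product of a float query with a quantized vector, never materializing
// the dequantized floats: q·x ≈ scale * Σ q[i]*code[i] + offset * Σ q[i].
function dotQuantized(query: Float32Array, qv: QuantizedVector, querySum: number): number {
  let codeDot = 0;
  for (let i = 0; i < query.length; i++) {
    codeDot += query[i] * qv.codes[i];
  }
  return qv.scale * codeDot + qv.offset * querySum;
}

// Σ q[i] depends only on the query, so it is computed once and reused for every stored vector.
function sum(v: Float32Array): number {
  let s = 0;
  for (let i = 0; i < v.length; i++) s += v[i];
  return s;
}

// Usage: score one query against a small set of 8-bit quantized vectors.
const stored = [
  Float32Array.from([0.1, -0.5, 0.9, 0.3]),
  Float32Array.from([0.7, 0.2, -0.4, 0.0]),
].map((v) => quantize(v, 8));
const query = Float32Array.from([1, 2, -1, 0.5]);
const qSum = sum(query);
const scores = stored.map((qv) => dotQuantized(query, qv, qSum));
```

The result is algebraically identical to dequantizing and then taking a float dot product; the saving is that no temporary float vector is built per stored entry.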
+ ## TurboQuant — KV-Cache Compression
+ Reduce KV-cache memory by 6-8x with <0.5% quality loss:
+ ```typescript
+ import { simd } from '@ruvector/ruvllm/simd';
+ // TurboQuant compresses KV-cache entries at 2-4 bit precision
+ // with per-channel asymmetric quantization (scale + zero-point).
+ // Eviction policies (H2O, Sliding Window, PyramidKV) keep the
+ // most important tokens in cache during long-context generation.
+ // Supported bit widths: 2-bit (16x), 3-bit (10.7x), 4-bit (8x), 8-bit (4x)
+ ```
+ | Bits | Compression (vs FP32) | Perplexity Loss | Use Case |
+ |------|-----------------------|-----------------|----------|
+ | 2-bit | 16x | ~2% | Maximum compression, edge devices |
+ | 3-bit | 10.7x | <1% | Balanced — recommended for most uses |
+ | 4-bit | 8x | <0.5% | High quality, long-context inference |
+ | 8-bit | 4x | ~0% | Baseline quantization |
  ## CLI Usage
@@ -53,0 +85,0 @@
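The comments above name H2O as one of the eviction policies. The following is a minimal sketch of the heavy-hitter idea behind H2O, written independently of ruvllm: every cached token accumulates the attention weight it receives at each decoding step, the most recent `recentWindow` tokens are always kept, and once the cache exceeds `budget` the older token with the lowest accumulated score is evicted. `H2OCache`, `CacheEntry`, `budget`, and `recentWindow` are hypothetical names. The sketch tracks token positions only; a real cache would also drop the matching key/value rows. PyramidKV's per-layer budgets and the sliding-window policy are not shown.

```typescript
// One cached token: its sequence position and the attention mass it has received so far.
interface CacheEntry {
  position: number;
  score: number;
}

// Minimal heavy-hitter (H2O-style) eviction over token positions only (hypothetical API).
class H2OCache {
  private readonly entries: CacheEntry[] = [];
  private readonly budget: number;
  private readonly recentWindow: number;

  constructor(budget: number, recentWindow: number) {
    if (recentWindow < 1 || recentWindow >= budget) {
      throw new Error("require 1 <= recentWindow < budget");
    }
    this.budget = budget;
    this.recentWindow = recentWindow;
  }

  // Append the newest token; returns the evicted position, if any.
  append(position: number): number | undefined {
    this.entries.push({ position, score: 0 });
    if (this.entries.length <= this.budget) return undefined;
    // Only tokens outside the protected recent window are eviction candidates.
    const candidates = this.entries.length - this.recentWindow;
    let victim = 0;
    for (let i = 1; i < candidates; i++) {
      if (this.entries[i].score < this.entries[victim].score) victim = i;
    }
    return this.entries.splice(victim, 1)[0].position;
  }

  // Add one decoding step's attention weights (one per cached token, in cache order).
  accumulate(weights: number[]): void {
    if (weights.length !== this.entries.length) {
      throw new Error("expected one weight per cached token");
    }
    for (let i = 0; i < weights.length; i++) this.entries[i].score += weights[i];
  }

  positions(): number[] {
    return this.entries.map((e) => e.position);
  }
}

// Usage: budget of 3 tokens, always keeping the single most recent one.
const cache = new H2OCache(3, 1);
cache.append(0);
cache.accumulate([1.0]);
cache.append(1);
cache.accumulate([0.9, 0.1]);
cache.append(2);
cache.accumulate([0.8, 0.02, 0.18]);
const evicted = cache.append(3); // token 1 has the lowest accumulated score among candidates
```

Protecting a recent window matters because a newly appended token has not yet accumulated any score and would otherwise be the first thing evicted.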
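The compression figures correspond to storing each value in `bits` bits instead of a 32-bit float, a raw ratio of 32 / bits (16x at 2 bits, 10.7x at 3, 8x at 4, 4x at 8). Per-channel scale and zero-point metadata lowers the effective ratio slightly. The sketch below illustrates per-channel asymmetric quantization of a `[tokens x channels]` slice of a K or V cache and computes that effective ratio; it is not TurboQuant's actual storage format. The function and type names, the row-major layout, the unpacked one-byte codes, and the assumption of 4 bytes each for a channel's scale and zero-point are all hypothetical.

```typescript
// Hypothetical per-channel quantized tensor (row-major [tokens x channels]).
interface ChannelQuantized {
  tokens: number;
  channels: number;
  codes: Uint8Array;      // one code per value, left unpacked here for clarity
  scales: Float32Array;   // per-channel step size
  zeroPoints: Int32Array; // per-channel code that represents 0.0
}

function quantizePerChannel(data: Float32Array, tokens: number, channels: number, bits: number): ChannelQuantized {
  if (bits < 1 || bits > 8) throw new Error("bits must be in 1..8");
  if (data.length !== tokens * channels) throw new Error("data length must equal tokens * channels");
  const qmax = (1 << bits) - 1;
  const codes = new Uint8Array(data.length);
  const scales = new Float32Array(channels);
  const zeroPoints = new Int32Array(channels);
  for (let c = 0; c < channels; c++) {
    // Channel range, widened to include 0 so that 0.0 maps exactly to the zero-point.
    let min = 0;
    let max = 0;
    for (let t = 0; t < tokens; t++) {
      const v = data[t * channels + c];
      if (v < min) min = v;
      if (v > max) max = v;
    }
    scales[c] = max > min ? (max - min) / qmax : 1;
    const scale = scales[c];
    zeroPoints[c] = Math.min(qmax, Math.max(0, Math.round(-min / scale)));
    for (let t = 0; t < tokens; t++) {
      const i = t * channels + c;
      const q = Math.round(data[i] / scale) + zeroPoints[c];
      codes[i] = Math.min(qmax, Math.max(0, q));
    }
  }
  return { tokens, channels, codes, scales, zeroPoints };
}

function dequantizePerChannel(q: ChannelQuantized): Float32Array {
  const out = new Float32Array(q.codes.length);
  for (let t = 0; t < q.tokens; t++) {
    for (let c = 0; c < q.channels; c++) {
      const i = t * q.channels + c;
      out[i] = q.scales[c] * (q.codes[i] - q.zeroPoints[c]);
    }
  }
  return out;
}

// Effective compression vs FP32, assuming codes are bit-packed and each channel
// stores a 4-byte scale plus a 4-byte zero-point.
function compressionRatio(tokens: number, channels: number, bits: number): number {
  const fp32Bytes = tokens * channels * 4;
  const packedBytes = Math.ceil((tokens * channels * bits) / 8);
  const metadataBytes = channels * 8;
  return fp32Bytes / (packedBytes + metadataBytes);
}

// Usage: 4 tokens x 2 channels at 4 bits, then ratios for a 4096-token, 128-channel slice.
const kv = Float32Array.from([0.5, -1.0, -0.25, 2.0, 1.0, 0.0, 0.75, -0.5]);
const q4 = quantizePerChannel(kv, 4, 2, 4);
const restored = dequantizePerChannel(q4);
const ratio4 = compressionRatio(4096, 128, 4); // ≈ 7.97 with metadata
const ratio2 = compressionRatio(4096, 128, 2); // ≈ 15.88 with metadata
```

Giving each channel its own scale and zero-point keeps a single large-magnitude channel from forcing a coarse step size onto every other channel, at the cost of a small, fixed amount of metadata per channel.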
URL strings
Supply chain risk: Package contains fragments of external URLs or IP addresses, which the package may be accessing at runtime.
Found 1 instance in 1 package