@ruvector/ruvllm - npm Package Compare versions

Comparing version 2.5.3 to 2.5.4

package.json (+14 -3)
{
"name": "@ruvector/ruvllm",
"version": "2.5.3",
"description": "Self-learning LLM orchestration with SONA adaptive learning, HNSW memory, FastGRNN routing, and SIMD inference",
"version": "2.5.4",
"description": "Self-learning LLM runtime — TurboQuant KV-cache (6-8x compression), SONA adaptive learning, FlashAttention, speculative decoding, GGUF inference",
"main": "dist/cjs/index.js",

@@ -95,3 +95,14 @@ "module": "dist/esm/index.js",

"rust",
"ruvector"
"ruvector",
"turboquant",
"kv-cache",
"quantization",
"flash-attention",
"speculative-decoding",
"gguf",
"mamba",
"transformer",
"edge-ai",
"local-llm",
"model-compression"
],

@@ -98,0 +109,0 @@ "author": "rUv Team <team@ruv.io>",

@@ -1,5 +0,12 @@

# @ruvector/ruvllm v2.3
# @ruvector/ruvllm
Self-learning LLM orchestration with SONA adaptive learning, HNSW memory, and SIMD inference for Node.js.
[![npm version](https://img.shields.io/npm/v/@ruvector/ruvllm.svg)](https://www.npmjs.com/package/@ruvector/ruvllm)
[![Downloads](https://img.shields.io/npm/dm/@ruvector/ruvllm)](https://www.npmjs.com/package/@ruvector/ruvllm)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![GitHub Stars](https://img.shields.io/github/stars/ruvnet/ruvector?style=social)](https://github.com/ruvnet/ruvector)
**Self-learning LLM runtime for Node.js** — GGUF inference, TurboQuant KV-cache compression (6-8x memory savings), SONA adaptive learning, FlashAttention, speculative decoding, and SIMD-optimized kernels. Built in Rust, runs everywhere.
> Inference at **88-135 tok/s** on M4 Pro | **<1ms** SONA adaptation | **6-8x** KV-cache compression via TurboQuant
## Installation

@@ -37,6 +44,10 @@

## What's New in v2.3
## What's New in v2.5
| Feature | Description |
|---------|-------------|
| **TurboQuant KV-Cache** | 2-4 bit asymmetric quantization with per-channel scale/zero-point — 6-8x memory reduction, <0.5% perplexity loss |
| **TurboQuant Embedding Store** | Quantized vector storage with compressed search — 10-30x memory savings |
| **H2O / PyramidKV Eviction** | Intelligent cache eviction policies for long-context inference |
| **Optimized Inner Product** | Asymmetric distance on quantized data — skip decompression for 2-4x faster search |
| **RuvLTRA Models** | Purpose-built 0.5B & 3B models for Claude Flow |

@@ -48,5 +59,26 @@ | **Task-Specific LoRA** | 5 pre-trained adapters (coder, researcher, security, architect, reviewer) |

| **Evaluation Harness** | SWE-Bench testing with 5 ablation modes |
| **Auto-Dimension** | HNSW auto-detects model embedding size |
| **mistral-rs Backend** | Production serving with PagedAttention, X-LoRA, ISQ (5-10x concurrent users) |
| **mistral-rs Backend** | Production serving with PagedAttention, X-LoRA, ISQ |
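
The "Optimized Inner Product" row describes scoring a full-precision query against quantized vectors without decompressing them. The README does not show how this is done, so the following is only a minimal sketch of the general idea. It assumes each stored vector carries one `scale` and one float `offset` (standing in for the zero-point), so that a stored value decodes as `offset + scale * code`. The function name and signature are hypothetical and are not part of the `@ruvector/ruvllm` API.

```typescript
// Hypothetical helper, not an export of @ruvector/ruvllm.
// Assumes each quantized value decodes as: x[i] = offset + scale * codes[i].
// Then  q . x = offset * sum(q[i]) + scale * sum(q[i] * codes[i]),
// so the dot product can be computed directly on the integer codes.
function asymmetricDot(
  query: Float32Array,
  codes: Uint8Array,
  scale: number,
  offset: number,
): number {
  if (query.length !== codes.length) {
    throw new Error("query and codes must have the same length");
  }
  let querySum = 0; // sum of q[i]
  let weightedCodes = 0; // sum of q[i] * codes[i]
  for (let i = 0; i < query.length; i++) {
    querySum += query[i];
    weightedCodes += query[i] * codes[i];
  }
  return offset * querySum + scale * weightedCodes;
}
```

Under these assumptions the result equals the dot product against the decoded vector, but no decoded copy is ever materialized. `querySum` depends only on the query, so it could be computed once and reused across every stored vector.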
## TurboQuant — KV-Cache Compression
Reduce inference memory by 6-8x with <0.5% quality loss:
```typescript
import { simd } from '@ruvector/ruvllm/simd';
// TurboQuant compresses KV-cache entries at 2-4 bit precision
// with per-channel asymmetric quantization (scale + zero-point).
// Eviction policies (H2O, Sliding Window, PyramidKV) keep the
// most important tokens in cache during long-context generation.
// Supported bit widths: 2-bit (32x), 3-bit (10.7x), 4-bit (8x), 8-bit (4x)
```
| Bits | Compression | Perplexity Loss | Use Case |
|------|-------------|-----------------|----------|
| 2-bit | 32x | ~2% | Maximum compression, edge devices |
| 3-bit | 10.7x | <1% | Balanced — recommended for most uses |
| 4-bit | 8x | <0.5% | High quality, long-context inference |
| 8-bit | 4x | ~0% | Baseline quantization |
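
To make the table concrete, here is a minimal sketch of asymmetric quantization for a single channel, plus the nominal compression arithmetic. It is not the package's implementation: `quantizeChannel`, `dequantize`, and `nominalCompression` are hypothetical names. The sketch also assumes a 32-bit float baseline, a float `offset` in place of an integer zero-point, and one unpacked byte per code for readability.

```typescript
// Hypothetical helpers for illustration; not exports of @ruvector/ruvllm.
interface QuantizedChannel {
  codes: Uint8Array; // one code per value (left unpacked for clarity)
  scale: number; // step size between adjacent codes
  offset: number; // real value represented by code 0 (zero-point role)
}

// Asymmetric quantization: map [min, max] of the channel onto [0, 2^bits - 1].
function quantizeChannel(values: Float32Array, bits: number): QuantizedChannel {
  const levels = (1 << bits) - 1;
  if (values.length === 0) {
    return { codes: new Uint8Array(0), scale: 1, offset: 0 };
  }
  let min = values[0];
  let max = values[0];
  for (let i = 1; i < values.length; i++) {
    if (values[i] < min) min = values[i];
    if (values[i] > max) max = values[i];
  }
  const scale = max > min ? (max - min) / levels : 1;
  const codes = new Uint8Array(values.length);
  for (let i = 0; i < values.length; i++) {
    const code = Math.round((values[i] - min) / scale);
    codes[i] = Math.min(levels, Math.max(0, code));
  }
  return { codes, scale, offset: min };
}

// Reconstruct approximate values; the error per value is at most scale / 2.
function dequantize(q: QuantizedChannel): Float32Array {
  const out = new Float32Array(q.codes.length);
  for (let i = 0; i < out.length; i++) {
    out[i] = q.offset + q.scale * q.codes[i];
  }
  return out;
}

// Nominal ratio against an assumed baseline width, ignoring the per-channel
// scale/offset overhead.
function nominalCompression(bits: number, baselineBits = 32): number {
  return baselineBits / bits;
}
```

With a 32-bit baseline, `nominalCompression` reproduces the 10.7x, 8x, and 4x figures for 3, 4, and 8 bits. For 2 bits the same formula gives 16x rather than the listed 32x, so the 2-bit row presumably rests on a different baseline or accounting that the README does not state.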
## CLI Usage

@@ -53,0 +85,0 @@