Big News: Socket raises $60M Series C at a $1B valuation to secure software supply chains for AI-driven development.Announcement
Sign In

@ruvector/ruvllm

Package Overview
Dependencies
Maintainers
1
Versions
13
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

@ruvector/ruvllm

Self-learning LLM orchestration with SONA adaptive learning, HNSW memory, FastGRNN routing, and SIMD inference

Source
npmnpm
Version
2.5.1
Version published
Weekly downloads
113K
-22.78%
Maintainers
1
Weekly downloads
 
Created
Source

@ruvector/ruvllm v2.3

Self-learning LLM orchestration with SONA adaptive learning, HNSW memory, and SIMD inference for Node.js.

Installation

npm install @ruvector/ruvllm

Quick Start

import { RuvLLM, RuvLLMConfig } from '@ruvector/ruvllm';

// Initialize with default configuration
const llm = new RuvLLM();

// Or with custom configuration
const llm = new RuvLLM({
  modelPath: './models/ruvltra-small-q4km.gguf',
  sonaEnabled: true,
  flashAttention: true,
  maxTokens: 256,
});

// Generate text
const response = await llm.query('Explain quantum computing');
console.log(response.text);

// Stream generation
for await (const token of llm.stream('Write a haiku about Rust')) {
  process.stdout.write(token);
}

What's New in v2.3

FeatureDescription
RuvLTRA ModelsPurpose-built 0.5B & 3B models for Claude Flow
Task-Specific LoRA5 pre-trained adapters (coder, researcher, security, architect, reviewer)
HuggingFace HubDownload/upload models directly
Adapter MergingTIES, DARE, SLERP strategies
HNSW Routing150x faster semantic matching
Evaluation HarnessSWE-Bench testing with 5 ablation modes
Auto-DimensionHNSW auto-detects model embedding size
mistral-rs BackendProduction serving with PagedAttention, X-LoRA, ISQ (5-10x concurrent users)

CLI Usage

# Query a model
ruvllm query "What is machine learning?"

# Stream output
ruvllm query --stream "Write a poem"

# Download a model
ruvllm download ruvector/ruvltra-small-q4km

# Benchmark
ruvllm bench ./models/model.gguf

# Run evaluation (SWE-Bench)
ruvllm eval --model ./models/model.gguf --subset lite --max-tasks 50

API Reference

RuvLLM Class

class RuvLLM {
  constructor(config?: RuvLLMConfig);

  // Generate text
  query(prompt: string, params?: GenerateParams): Promise<Response>;

  // Stream generation
  stream(prompt: string, params?: GenerateParams): AsyncIterable<string>;

  // Load a model
  loadModel(path: string): Promise<void>;

  // Get SONA learning stats
  sonaStats(): SonaStats | null;

  // Adapt on feedback
  adapt(input: Float32Array, quality: number): void;
}

Configuration

interface RuvLLMConfig {
  modelPath?: string;       // Path to GGUF model
  sonaEnabled?: boolean;    // Enable SONA learning (default: true)
  flashAttention?: boolean; // Use Flash Attention 2 (default: true)
  maxTokens?: number;       // Max generation tokens (default: 256)
  temperature?: number;     // Sampling temperature (default: 0.7)
  topP?: number;            // Top-p sampling (default: 0.9)
}

Generate Parameters

interface GenerateParams {
  maxTokens?: number;
  temperature?: number;
  topP?: number;
  topK?: number;
  repetitionPenalty?: number;
  stopSequences?: string[];
}

SIMD Module

For direct access to optimized SIMD kernels:

import { simd } from '@ruvector/ruvllm/simd';

// Dot product
const result = simd.dotProduct(vecA, vecB);

// Matrix multiplication
const output = simd.matmul(matrix, vector);

// Flash Attention
const attended = simd.flashAttention(query, key, value, scale);

// RMS Normalization
simd.rmsNorm(hidden, weights, epsilon);

Performance (M4 Pro)

OperationPerformance
Inference88-135 tok/s
Flash Attention320µs (seq=2048)
HNSW Search17-62µs
SONA Adapt<1ms
Evaluation5 ablation modes

Evaluation Harness

Run model evaluations with SWE-Bench integration:

import { RuvLLM, EvaluationHarness, AblationMode } from '@ruvector/ruvllm';

const harness = new EvaluationHarness({
  modelPath: './models/model.gguf',
  enableHnsw: true,
  enableSona: true,
});

// Run single evaluation
const result = await harness.evaluate(
  'Fix the null pointer exception',
  'def process(data): return data.split()',
  AblationMode.Full
);

console.log(`Success: ${result.success}, Quality: ${result.qualityScore}`);

// Run ablation study (Baseline, RetrievalOnly, AdaptersOnly, R+A, Full)
const report = await harness.runAblationStudy(tasks);
for (const [mode, metrics] of Object.entries(report.modeMetrics)) {
  console.log(`${mode}: ${metrics.successRate * 100}% success`);
}

mistral-rs Backend (Production Serving)

For production deployments with 10-100+ concurrent users, use the mistral-rs backend:

import { RuvLLM, MistralBackend, PagedAttentionConfig } from '@ruvector/ruvllm';

// Configure for production serving
const backend = new MistralBackend({
  // PagedAttention: 5-10x more concurrent users
  pagedAttention: {
    blockSize: 16,
    maxBlocks: 4096,
    gpuMemoryFraction: 0.9,
    prefixCaching: true,
  },
  // X-LoRA: Per-token adapter routing
  xlora: {
    adapters: ['./adapters/coder', './adapters/researcher'],
    topK: 2,
  },
  // ISQ: Runtime quantization
  isq: {
    bits: 4,
    method: 'awq',
  },
});

const llm = new RuvLLM({ backend });
await llm.loadModel('mistralai/Mistral-7B-Instruct-v0.2');

// Serve multiple concurrent requests
const response = await llm.query('Write production code');

Note: mistral-rs features require the Rust backend with mistral-rs feature enabled. Native bindings will use mistral-rs when available.

Supported Models

  • RuvLTRA-Small (494M) - Q4K, Q5K, Q8
  • RuvLTRA-Medium (3B) - Q4K, Q5K, Q8
  • Qwen 2.5 (0.5B-72B)
  • Llama 3.x (8B-70B)
  • Mistral (7B-22B)
  • Phi-3 (3.8B-14B)
  • Gemma-2 (2B-27B)

Platform Support

PlatformArchitectureStatus
macOSarm64 (M1-M4)✅ Full support
macOSx64✅ Supported
Linuxx64✅ Supported
Linuxarm64✅ Supported
Windowsx64✅ Supported
  • @ruvector/core - Vector operations
  • @ruvector/sona - SONA learning engine
  • @ruvector/ruvector - Full Ruvector SDK

License

MIT OR Apache-2.0

Keywords

ruvllm

FAQs

Package last updated on 21 Feb 2026

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts