
A complete Ruby implementation of a Retrieval-Augmented Generation (RAG) pipeline using native Ruby ML/NLP gems.
Ragnar provides a production-ready RAG pipeline for Ruby applications, integrating:
graph TB
    subgraph "Indexing Pipeline"
        A[Documents] --> B[Chunker<br/>baran]
        B --> C[Embedder<br/>red-candle]
        C --> D[Vector DB<br/>lancelot]
        D --> E[UMAP Training<br/>annembed]
        E --> F[Reduced Embeddings]
    end
    subgraph "Query Pipeline"
        LLMCache[LLM Manager<br/>Cached Instance]
        Q[User Query] --> QR[Query Rewriter<br/>red-candle LLM]
        QR --> QE[Query Embedder<br/>red-candle]
        QE --> VS[Vector Search<br/>lancelot]
        VS --> RRF[RRF Fusion]
        RRF --> RR[Reranker<br/>red-candle]
        RR --> RP[Context Repacker<br/>Deduplication & Organization]
        RP --> LLM[Response Generation<br/>red-candle LLM]
        LLM --> R[Answer]
        LLMCache -.-> QR
        LLMCache -.-> LLM
    end
    D -.-> VS
    F -.-> VS
sequenceDiagram
    participant User
    participant CLI
    participant Indexer
    participant Chunker
    participant Embedder
    participant Database
    User->>CLI: ragnar index ./documents
    CLI->>Indexer: index_path(path)
    loop For each file
        Indexer->>Indexer: Read file
        Indexer->>Chunker: split_text(content)
        Chunker-->>Indexer: chunks[]
        loop For each chunk
            Indexer->>Embedder: embed(text)
            Embedder-->>Indexer: embedding[768]
            Indexer->>Database: add_document(chunk, embedding)
        end
    end
    Database-->>CLI: stats
    CLI-->>User: Indexed N documents
flowchart LR
    A[High-Dim Embeddings<br/>768D] --> B[UMAP Training]
    B --> C[Model]
    C --> D[Low-Dim Embeddings<br/>2-50D]
    B --> E[Parameters]
    E --> F[n_neighbors]
    E --> G[n_components]
    E --> H[min_dist]
    D --> I[Benefits]
    I --> J[Faster Search]
    I --> K[Less Memory]
    I --> L[Visualization]
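To put a number on the "Less Memory" benefit, here is a back-of-the-envelope sketch comparing raw storage for 768-dimensional and 50-dimensional embeddings. It assumes each dimension is stored as a 32-bit float and ignores index and metadata overhead; the corpus size and the `embedding_bytes` helper are made up for illustration and are not part of Ragnar.

```ruby
# Rough storage estimate for embeddings.
# Assumption: each dimension is stored as a 32-bit (4-byte) float.
BYTES_PER_FLOAT = 4

def embedding_bytes(num_vectors, dimensions)
  num_vectors * dimensions * BYTES_PER_FLOAT
end

num_chunks = 100_000 # hypothetical corpus size

full    = embedding_bytes(num_chunks, 768)
reduced = embedding_bytes(num_chunks, 50)

puts format("768D: %.1f MB", full / 1_000_000.0)
puts format("50D:  %.1f MB", reduced / 1_000_000.0)
puts format("Reduction: %.1fx", full.to_f / reduced)
```

Under these assumptions, 100,000 chunks take roughly 307 MB at 768 dimensions and 20 MB at 50 dimensions.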
flowchart TB
    Q[User Query] --> QA[Query Analysis<br/>w/ Cached LLM]
    QA --> CI[Clarified Intent]
    QA --> SQ[Sub-queries]
    QA --> KT[Key Terms]
    SQ --> EMB[Embed Each Query]
    EMB --> VS[Vector Search]
    VS --> RRF[RRF Fusion]
    RRF --> RANK[Reranking]
    RANK --> TOP[Top-K Documents]
    TOP --> CTX[Context Preparation]
    CTX --> REPACK[Context Repacking<br/>Deduplication<br/>Summarization<br/>Organization]
    REPACK --> GEN[LLM Generation<br/>w/ Same Cached LLM]
    CI --> GEN
    GEN --> ANS[Final Answer]
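The "RRF Fusion" step merges the ranked results returned for each sub-query into a single list. This README does not show how Ragnar implements it, so the sketch below uses the standard Reciprocal Rank Fusion formula with the conventional constant k = 60. The `rrf_fuse` method and the document ids are hypothetical.

```ruby
# Reciprocal Rank Fusion (RRF): merge several ranked lists into one.
# Each document scores sum(1 / (k + rank)) over the lists it appears in.
# k = 60 is the commonly used constant; the value Ragnar uses is not
# stated in this README.
def rrf_fuse(ranked_lists, k: 60)
  scores = Hash.new(0.0)
  ranked_lists.each do |list|
    list.each_with_index do |doc_id, index|
      scores[doc_id] += 1.0 / (k + index + 1) # ranks are 1-based
    end
  end
  # Highest fused score first; ties are broken by document id
  scores.sort_by { |doc_id, score| [-score, doc_id] }.map(&:first)
end

# Hypothetical results for two sub-queries
results_a = %w[doc1 doc2 doc3]
results_b = %w[doc3 doc1 doc4]

p rrf_fuse([results_a, results_b])
# => ["doc1", "doc3", "doc2", "doc4"]
```

Documents that rank well in several lists (here `doc1` and `doc3`) rise above documents that appear in only one, without needing the raw similarity scores to be comparable across queries.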
gem install ragnar
git clone https://github.com/yourusername/ragnar.git
cd ragnar
bundle install
gem build ragnar.gemspec
gem install ./ragnar-*.gem
# Index a directory of text files
ragnar index ./documents
# Index with custom settings
ragnar index ./documents \
  --chunk-size 1000 \
  --chunk-overlap 100
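To illustrate how chunk size and chunk overlap interact, here is a minimal sliding-window sketch over characters. It is a simplification for intuition only: Ragnar delegates chunking to the baran gem, whose behavior may differ, and `chunk_with_overlap` is a hypothetical helper.

```ruby
# Fixed-size character chunking with overlap (illustrative only).
def chunk_with_overlap(text, chunk_size:, chunk_overlap:)
  if chunk_overlap >= chunk_size
    raise ArgumentError, "chunk_overlap must be smaller than chunk_size"
  end

  step = chunk_size - chunk_overlap # how far the window advances each time
  chunks = []
  start = 0
  while start < text.length
    chunks << text[start, chunk_size]
    break if start + chunk_size >= text.length # last window reached the end
    start += step
  end
  chunks
end

text = ("a".."z").to_a.join # 26 characters
chunk_with_overlap(text, chunk_size: 10, chunk_overlap: 3).each { |c| puts c }
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

Each chunk repeats the last three characters of the previous one, so text that straddles a boundary still appears intact in at least one chunk.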
Reduce embedding dimensions for faster search:
# Train UMAP model (auto-adjusts parameters based on data)
ragnar train-umap \
  --n-components 50 \
  --n-neighbors 15
# Apply to all embeddings
ragnar apply-umap
# Basic query
ragnar query "What is the main purpose of this project?"
# Verbose mode shows all intermediate processing steps
ragnar query "How does the chunking process work?" --verbose
# Or use short form
ragnar query "How does the chunking process work?" -v
# JSON output for programmatic use
ragnar query "Explain the embedding model" --json
# Adjust number of retrieved documents
ragnar query "What are the key features?" --top-k 5
# Combine options for detailed analysis
ragnar query "Compare Ruby with Python" -v --top-k 5
When using --verbose or -v, you'll see:
ragnar stats
DEFAULT_DB_PATH = "ragnar_database"
DEFAULT_CHUNK_SIZE = 512
DEFAULT_CHUNK_OVERLAP = 50
DEFAULT_EMBEDDING_MODEL = "jinaai/jina-embeddings-v2-base-en"
Embedding Models (via red-candle):
LLM Models (via red-candle):
Reranker Models (via red-candle):
require 'ragnar'

# Initialize components
indexer = Ragnar::Indexer.new(
  db_path: "my_database",
  chunk_size: 1000
)

# Index documents
stats = indexer.index_path("./documents")

# Query the system
processor = Ragnar::QueryProcessor.new(db_path: "my_database")
result = processor.query(
  "What is Ruby?",
  top_k: 5,
  verbose: true
)

puts result[:answer]
puts "Confidence: #{result[:confidence]}%"
chunker = Ragnar::Chunker.new(
  chunk_size: 1000,
  chunk_overlap: 200,
  separators: ["\n\n", "\n", ". ", " "]
)
chunks = chunker.chunk_text(document_text)
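The ordered `separators` list suggests a recursive splitting strategy: split on the coarsest separator first and fall back to finer ones only for pieces that are still too long. The sketch below shows that general idea under stated simplifications (no overlap, separators dropped from the output). It is not Ragnar's or baran's actual algorithm, and `recursive_split` is a hypothetical name.

```ruby
# Recursive separator splitting (illustrative only).
# Simplifications: no overlap, and separators are not kept in the chunks.
def recursive_split(text, separators, chunk_size)
  return [text] if text.length <= chunk_size
  # No separators left: hard-cut into fixed-size slices
  return text.scan(/.{1,#{chunk_size}}/m) if separators.empty?

  separator, *finer = separators
  chunks = []
  buffer = +""

  text.split(separator).reject(&:empty?).each do |piece|
    if piece.length > chunk_size
      # Flush what we have, then split the oversized piece more finely
      chunks << buffer unless buffer.empty?
      buffer = +""
      chunks.concat(recursive_split(piece, finer, chunk_size))
    elsif buffer.empty?
      buffer = piece.dup
    elsif buffer.length + separator.length + piece.length <= chunk_size
      buffer << separator << piece # small neighbours are merged back
    else
      chunks << buffer
      buffer = piece.dup
    end
  end

  chunks << buffer unless buffer.empty?
  chunks
end

sample = "Ruby is a language.\n\nIt is dynamic. It is expressive.\n\n" \
         "RAG combines retrieval with generation."
recursive_split(sample, ["\n\n", "\n", ". ", " "], 30).each { |c| puts c.inspect }
```

With a 30-character limit, the first paragraph fits as-is, the second is split at the sentence boundary, and the third falls all the way back to splitting on spaces.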
# For small datasets (<100 documents)
processor = Ragnar::UmapProcessor.new
processor.train(
  n_components: 10, # Fewer components
  n_neighbors: 5,   # Fewer neighbors
  min_dist: 0.05    # Tighter clusters
)

# For large datasets (>10,000 documents)
processor.train(
  n_components: 50, # More components
  n_neighbors: 30,  # More neighbors
  min_dist: 0.1     # Standard distance
)
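The two presets above could be wrapped in a small helper that picks parameters from the dataset size. This is only a sketch: the small and large presets come from the examples above, while the mid-range values, the clamping rules, and the `umap_params_for` name are assumptions for illustration. The clamping reflects the general UMAP constraint that a point cannot have more neighbors than there are other points.

```ruby
# Choose UMAP parameters from the number of embeddings (illustrative only).
def umap_params_for(num_embeddings, embedding_dim: 768)
  raise ArgumentError, "need at least 2 embeddings" if num_embeddings < 2

  params =
    if num_embeddings < 100
      { n_components: 10, n_neighbors: 5, min_dist: 0.05 }
    elsif num_embeddings > 10_000
      { n_components: 50, n_neighbors: 30, min_dist: 0.1 }
    else
      # Assumed mid-range values, borrowed from the CLI example flags
      { n_components: 50, n_neighbors: 15, min_dist: 0.1 }
    end

  # A point cannot have more neighbors than there are other points
  params[:n_neighbors] = [params[:n_neighbors], num_embeddings - 1].min
  # Keep the output dimension below both the input dimension and point count
  params[:n_components] =
    [params[:n_components], embedding_dim, num_embeddings - 1].min
  params
end

p umap_params_for(4)
p umap_params_for(50_000)
```

The resulting hash can be splatted into a call such as `processor.train(**umap_params_for(count))`, assuming `train` accepts exactly these keyword arguments as in the examples above.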
UMAP fails with "index out of bounds"
Slow indexing performance
Poor query results
# Install dependencies
bundle install
# Run tests
bundle exec rspec
# Build gem
gem build ragnar.gemspec
Component | Purpose | Key Methods |
---|---|---|
Chunker | Split text into semantic chunks | chunk_text() |
Embedder | Generate vector embeddings | embed_text(), embed_batch() |
Database | Store and search vectors | add_document(), search_similar() |
LLMManager | Cache and manage LLM instances | get_llm(), default_llm() |
ContextRepacker | Optimize retrieved context | repack(), repack_with_summary() |
QueryRewriter | Analyze and expand queries | rewrite() |
QueryProcessor | Orchestrate query pipeline | query() |
UmapProcessor | Reduce embedding dimensions | train(), apply() |
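The LLMManager row describes caching LLM instances so that query rewriting and response generation can share one loaded model. A minimal sketch of that caching pattern follows. `ModelCache`, its `fetch` method, and the model id are hypothetical stand-ins, not Ragnar's actual class or API.

```ruby
# Minimal instance cache in the spirit of LLMManager (not Ragnar's code).
class ModelCache
  def initialize(&loader)
    @loader = loader    # block that builds a model from its id
    @models = {}
    @mutex  = Mutex.new # guard against concurrent loads
  end

  # Return the cached model for model_id, loading it on first use
  def fetch(model_id)
    @mutex.synchronize { @models[model_id] ||= @loader.call(model_id) }
  end
end

loads = 0
cache = ModelCache.new do |model_id|
  loads += 1 # stands in for an expensive model load
  "model<#{model_id}>"
end

rewriter_llm  = cache.fetch("example-llm") # hypothetical model id
generator_llm = cache.fetch("example-llm")

puts rewriter_llm.equal?(generator_llm) # => true (same object reused)
puts loads                              # => 1
```

Loading happens once per model id, which matters when the load step reads large weight files from disk.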
Contributions are welcome! Please:
MIT License - see LICENSE file for details
This project integrates several excellent Ruby gems:
FAQs
Unknown package
We found that ragnar-cli demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.