raggrep

Local filesystem-based RAG system for codebases - semantic search using local embeddings

Source: npm
Version: 0.18.0 (latest)
Weekly downloads: 24 (-46.67%)
Maintainers: 1

RAGgrep

Local semantic search for codebases — find code using natural language queries.

RAGgrep indexes your code and lets you search it using natural language. Everything runs locally — no external API calls required.

Features

  • Zero-config search — Just run raggrep query and it works. Index is created and updated automatically.
  • Multi-language support — Deep understanding of TypeScript, JavaScript, Python, Go, and Rust with AST-aware parsing.
  • Vocabulary-based search — Search user to find getUserById, fetchUserData, UserService, etc. Natural language queries like "where is user session validated" find validateUserSession().
  • Local-first — All indexing and search happens on your machine. No cloud dependencies.
  • Incremental — Only re-indexes files that have changed. Instant search when nothing changed.
  • Watch mode — Keep the index fresh in real-time as you code.
  • Hybrid search — Combines semantic similarity, keyword matching, and exact text matching for best results.
  • Structured vs semantic — Each hit shows Structured and Semantic match strength. Result order defaults to structured-first; use raggrep query --rank-by semantic (or combined for fused score only) to change ordering.
  • Exact match track — Finds identifiers in ANY file type (YAML, .env, config, not just code) with grep-like precision.
  • Fusion boosting — Semantic results containing exact matches get boosted (1.5x) for better ranking.
  • Literal boosting — Exact identifier matches get priority. Use backticks for precise matching: `AuthService`.
  • Phrase matching — Exact phrases in documentation are found even when semantic similarity is low.
  • Semantic expansion — Domain-specific synonyms improve recall (function ↔ method, auth ↔ authentication).
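
The vocabulary-based search bullet above depends on breaking identifiers into their component words, so that a query for user can reach getUserById. The following is a minimal sketch of that idea, not RAGgrep's actual implementation; splitIdentifier and matchesVocabulary are hypothetical names chosen for illustration.

```typescript
// Hypothetical helpers: names and behavior are illustrative assumptions,
// not RAGgrep's internal API.

/** Split camelCase, PascalCase, snake_case, and kebab-case names into lowercase words. */
function splitIdentifier(identifier: string): string[] {
  return identifier
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")    // getUser -> get User
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2") // HTTPServer -> HTTP Server
    .split(/[^A-Za-z0-9]+/)
    .filter((word) => word.length > 0)
    .map((word) => word.toLowerCase());
}

/** True when every word of the query appears in the identifier's vocabulary. */
function matchesVocabulary(query: string, identifier: string): boolean {
  const vocabulary = new Set(splitIdentifier(identifier));
  return splitIdentifier(query).every((word) => vocabulary.has(word));
}

console.log(splitIdentifier("getUserById")); // ["get", "user", "by", "id"]
console.log(matchesVocabulary("user", "UserService")); // true
```

A real index would store the split words per symbol once at index time rather than recomputing them per query.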

Installation

# Install globally
npm install -g raggrep

# Or use without installing
npx raggrep query "your search"

Usage

Search Your Code

cd your-project
raggrep query "user authentication"

That's it. The first query creates the index automatically. Subsequent queries are instant if files haven't changed. Modified files are re-indexed on the fly.

Example Output

Natural Language Query:

Index updated: 42 indexed

RAGgrep Search
=============

Searching for: "user authentication"

Found 3 results:

1. src/auth/authService.ts:24-55 (login)
   Score: 34.4% | Structured: 42.0% | Semantic: 31.0% | Type: function | via TypeScript | exported
      export async function login(credentials: LoginCredentials): Promise<AuthResult> {
        const { email, password } = credentials;

2. src/auth/session.ts:10-25 (createSession)
   Score: 28.2% | Structured: 35.0% | Semantic: 22.0% | Type: function | via TypeScript | exported
      export function createSession(user: User): Session {

3. src/users/types.ts:3-12 (User)
   Score: 26.0% | Structured: 30.0% | Semantic: 23.0% | Type: interface | via TypeScript | exported
      export interface User {
        id: string;

Exact Identifier Query (shows both tracks):

Index updated: 42 indexed

Searching for: "AUTH_SERVICE_URL"

┌─ Exact Matches (4 files, 6 matches) ─┐
│  Query: "AUTH_SERVICE_URL"
└─────────────────────────────────────────────────────────────────────┘

  1. config.yaml (2 matches)
     8 │   auth:
     9 │     url: AUTH_SERVICE_URL
  ► 10 │     grpc_url: AUTH_SERVICE_GRPC_URL
     11 │     timeout: 5000

  2. .env.example (1 match)
     2 │ AUTH_SERVICE_URL=https://auth.example.com
  ►  3 │ AUTH_SERVICE_GRPC_URL=grpc://auth.example.com:9000

┌─ Semantic Results (boosted by exact matches) ─┐
└─────────────────────────────────────────────────────────────────────┘

1. src/auth/authService.ts:2-10 (AuthService)
   Score: 45.2% | Structured: 48.0% | Semantic: 43.0% | Type: class | via TypeScript | exported | exact match
      export class AuthService {
        private baseUrl = AUTH_SERVICE_URL;

Watch Mode

Keep your index fresh in real-time while you code:

raggrep index --watch

This monitors file changes and re-indexes automatically. Useful during active development when you want instant search results.

┌─────────────────────────────────────────┐
│  Watching for changes... (Ctrl+C to stop) │
└─────────────────────────────────────────┘

[Watch] language/typescript: 2 indexed, 0 errors

CLI Reference

Commands

raggrep query <query>    # Search the codebase
raggrep index            # Build/update the index
raggrep status           # Show index status
raggrep reset            # Clear the index

Query Options

raggrep query "user login"                    # Natural language query
raggrep query -C ~/projects/my-app "login"    # Search a project without cd
raggrep query "AUTH_SERVICE_URL"             # Exact identifier (auto-triggers exact match)
raggrep query "\`AuthService\`"              # Backticks force exact match
raggrep query "error handling" --top 5        # Limit results
raggrep query "database" --min-score 0.2      # Set minimum score threshold
raggrep query "login flow" --rank-by semantic  # Order by semantic similarity first
raggrep query "auth" --rank-by combined       # Order by fused score only
raggrep query "debug" --timing                # Print timing breakdown
raggrep query "interface" --type ts           # Filter by file extension
raggrep query "auth" --filter src/auth        # Filter by path
raggrep query "api" -f src/api -f src/routes  # Multiple path filters

Flag | Short | Description
--dir <path> | -C | Project directory to search (default: current directory)
--top <n> | -k | Number of results to return (default: 10)
--min-score <n> | -s | Minimum similarity score 0–1 (default: 0.15)
--rank-by <mode> | | Sort order: structured (default), semantic, or combined
--timing | -T | Print timing breakdown for profiling
--type <ext> | -t | Filter by file extension (e.g., ts, tsx, js)
--filter <path> | -f | Filter by path or glob pattern (can be used multiple times)
--help | -h | Show help message
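
To make the interaction of --rank-by, --min-score, and --top concrete, here is a small sketch of how a result list could be filtered, ordered, and truncated. The SearchHit shape, the selectHits name, and the choice to apply the threshold to the fused score are all assumptions; only the flag names and defaults come from the table above. The sample scores are taken from the example output earlier in this document.

```typescript
// Illustrative types and names: RAGgrep's real result shape may differ.
type RankBy = "structured" | "semantic" | "combined";

interface SearchHit {
  file: string;
  score: number;      // fused score, 0-1
  structured: number; // structured match strength, 0-1
  semantic: number;   // semantic match strength, 0-1
}

interface QueryOptions {
  rankBy?: RankBy;   // --rank-by (default: structured)
  minScore?: number; // --min-score (default: 0.15)
  topK?: number;     // --top (default: 10)
}

function selectHits(hits: SearchHit[], options: QueryOptions = {}): SearchHit[] {
  const { rankBy = "structured", minScore = 0.15, topK = 10 } = options;
  const primary = (hit: SearchHit): number =>
    rankBy === "structured" ? hit.structured
    : rankBy === "semantic" ? hit.semantic
    : hit.score;

  return hits
    .filter((hit) => hit.score >= minScore) // assumption: threshold applies to the fused score
    .sort((a, b) => primary(b) - primary(a) || b.score - a.score) // ties fall back to the fused score
    .slice(0, topK);
}

const sampleHits: SearchHit[] = [
  { file: "src/auth/authService.ts", score: 0.344, structured: 0.42, semantic: 0.31 },
  { file: "src/auth/session.ts", score: 0.282, structured: 0.35, semantic: 0.22 },
  { file: "src/users/types.ts", score: 0.26, structured: 0.3, semantic: 0.23 },
];

// With --rank-by semantic, types.ts (23.0%) moves ahead of session.ts (22.0%).
console.log(selectHits(sampleHits, { rankBy: "semantic" }).map((hit) => hit.file));
```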

Filtering by File Type

Use glob patterns with --filter to search specific file types:

# Search only source code files
raggrep query "service controller" --filter "*.ts"
raggrep query "component state" --filter "*.tsx"

# Search only documentation
raggrep query "deployment workflow" --filter "*.md"

# Search test files
raggrep query "mock setup" --filter "*.test.ts"

# Combine with path prefix
raggrep query "api handler" --filter "src/**/*.ts"

Multiple Filters (OR Logic)

Pass --filter more than once to include files that match any of the patterns:

# Search TypeScript OR TSX files
raggrep query "component" --filter "*.ts" --filter "*.tsx"

# Search in multiple directories
raggrep query "api" --filter src/api --filter src/routes

# Mix glob patterns and path prefixes
raggrep query "config" --filter "*.json" --filter "*.yaml" --filter config/

Filtering by file type is useful when you already know whether you're looking for code or documentation.
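
The OR logic can be sketched as "a file passes if any one filter accepts it". The matcher below is a hypothetical illustration: the rules it uses (plain paths act as prefixes, slash-free globs match the file name only, `**/` spans directories) are assumptions chosen to fit the examples above, not a description of RAGgrep's actual glob engine.

```typescript
// Hypothetical matcher: the pattern rules are assumptions, not RAGgrep's real behavior.

function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*" && glob.charAt(i + 1) === "*") {
      // "**/" matches zero or more directories; a bare "**" matches anything
      if (glob.charAt(i + 2) === "/") { source += "(?:.*/)?"; i += 2; }
      else { source += ".*"; i += 1; }
    } else if (ch === "*") {
      source += "[^/]*"; // "*" stays within one path segment
    } else {
      source += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp("^" + source + "$");
}

function matchesFilter(filePath: string, filter: string): boolean {
  if (!filter.includes("*")) {
    // Plain path: treat it as a directory (or exact file) prefix
    const prefix = filter.replace(/\/+$/, "");
    return filePath === prefix || filePath.startsWith(prefix + "/");
  }
  // Globs without a slash are matched against the file name only
  const target = filter.includes("/") ? filePath : (filePath.split("/").pop() ?? filePath);
  return globToRegExp(filter).test(target);
}

/** OR logic: a file passes when at least one filter matches (or no filters are given). */
function matchesAnyFilter(filePath: string, filters: string[]): boolean {
  return filters.length === 0 || filters.some((filter) => matchesFilter(filePath, filter));
}

console.log(matchesAnyFilter("src/app/page.tsx", ["*.ts", "*.tsx"])); // true
```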

For identifier-like queries (SCREAMING_SNAKE_CASE, camelCase, PascalCase), RAGgrep automatically runs exact match search:

# Finds AUTH_SERVICE_URL in ALL file types (YAML, .env, config, etc.)
raggrep query "AUTH_SERVICE_URL"

# Finds the function by exact name
raggrep query "getUserById"

# Use backticks for explicit exact matching (even natural words)
raggrep query "\`configuration\`"
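
One way to picture the automatic switch to exact matching is a small classifier over the query string. The sketch below is an assumption about how such detection could work; the regular expressions and the classifyQuery name are hypothetical and are not taken from RAGgrep's source.

```typescript
// Illustrative classifier: the patterns are assumptions, not RAGgrep's actual rules.
type QueryKind = "explicit-exact" | "identifier" | "natural-language";

const SCREAMING_SNAKE = /^[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+$/;     // AUTH_SERVICE_URL
const CAMEL_CASE = /^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+$/;      // getUserById
const PASCAL_CASE = /^(?:[A-Z][a-z0-9]+){2,}$/;                // AuthService

function classifyQuery(query: string): QueryKind {
  const trimmed = query.trim();
  if (/^`[^`]+`$/.test(trimmed)) return "explicit-exact"; // wrapped in backticks
  if (SCREAMING_SNAKE.test(trimmed) || CAMEL_CASE.test(trimmed) || PASCAL_CASE.test(trimmed)) {
    return "identifier";
  }
  return "natural-language";
}

console.log(classifyQuery("AUTH_SERVICE_URL"));    // "identifier"
console.log(classifyQuery("`configuration`"));     // "explicit-exact"
console.log(classifyQuery("user authentication")); // "natural-language"
```

Under these assumed rules a plain word such as configuration is treated as natural language, which is why the backtick form exists.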

What Gets Searched:

  • Source code: .ts, .js, .py, .go, .rs
  • Config files: .yaml, .yml, .json, .toml, .env
  • Documentation: .md, .txt

Ignored: node_modules, .git, dist, build, .cache, etc.

Exact matches are shown in a separate section with line numbers and context. Semantic results containing the same identifier get boosted (1.5x score multiplier).
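
The boost can be illustrated with a few lines of code. Only the 1.5x multiplier comes from the text above; the SemanticHit shape, the applyFusionBoost name, the substring test, and the cap at 1.0 are assumptions made for this sketch.

```typescript
// Sketch of the boost described above. Only the 1.5x factor is from the README;
// everything else (types, names, cap at 1.0) is an assumption.
interface SemanticHit {
  file: string;
  snippet: string;
  score: number; // 0-1
}

const EXACT_MATCH_BOOST = 1.5;

function applyFusionBoost(hits: SemanticHit[], identifier: string): SemanticHit[] {
  return hits
    .map((hit) =>
      hit.snippet.includes(identifier)
        ? { ...hit, score: Math.min(1, hit.score * EXACT_MATCH_BOOST) } // boosted copy
        : hit)
    .sort((a, b) => b.score - a.score); // re-rank after boosting
}

const boosted = applyFusionBoost(
  [
    { file: "src/util/log.ts", snippet: "export function log() {}", score: 0.4 },
    { file: "src/auth/authService.ts", snippet: "private baseUrl = AUTH_SERVICE_URL;", score: 0.3 },
  ],
  "AUTH_SERVICE_URL",
);
console.log(boosted.map((hit) => hit.file)); // authService.ts now ranks first
```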

Index Options

raggrep index                        # Index current directory
raggrep index --dir ../other-repo    # Index another path without cd
raggrep index --watch                # Watch mode - re-index on file changes
raggrep index --verbose              # Show detailed progress
raggrep index --concurrency 8        # Set parallel workers (default: auto)
raggrep index --model bge-small-en-v1.5  # Use specific embedding model

Flag | Short | Description
--dir <path> | -C | Project directory to index (default: current directory)
--watch | -w | Watch for file changes and re-index automatically
--verbose | -v | Show detailed progress
--concurrency <n> | -c | Number of parallel workers (default: auto based on CPU)
--model <name> | -m | Override TypeScript module embedding model (saved config otherwise)
--help | -h | Show help message

Other Commands

raggrep status                    # Show index status and statistics
raggrep status --dir ./packages/api
raggrep reset                     # Clear the index for the current directory
raggrep reset -C ~/projects/my-app
raggrep --version                 # Show version

How It Works

  • First query — Creates the index (takes 1-2 min for ~1000 files)
  • Subsequent queries — Uses cached index (instant if no changes)
  • Files changed — Re-indexes only modified files automatically
  • Files deleted — Stale entries cleaned up automatically

The index is stored under .raggrep/ in the project directory you index or pass with --dir / -C (by default, the current working directory). Add .raggrep/ to .gitignore if you do not want index files in version control.
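
The incremental behavior in the list above amounts to diffing what was indexed last time against what is on disk now. The sketch below shows that diff over two in-memory maps; the fingerprint representation and all names are assumptions, since the layout of .raggrep/ is not described here.

```typescript
// Hypothetical manifest diff: the fingerprint map is an assumption for illustration,
// not RAGgrep's on-disk index format.
type Fingerprints = Map<string, string>; // file path -> content hash or mtime

interface IndexPlan {
  toIndex: string[];  // new or modified files
  toRemove: string[]; // stale entries for deleted files
  unchanged: number;
}

function planIncrementalUpdate(indexed: Fingerprints, current: Fingerprints): IndexPlan {
  const plan: IndexPlan = { toIndex: [], toRemove: [], unchanged: 0 };
  current.forEach((fingerprint, filePath) => {
    if (indexed.get(filePath) === fingerprint) plan.unchanged += 1;
    else plan.toIndex.push(filePath); // new file or changed fingerprint
  });
  indexed.forEach((_fingerprint, filePath) => {
    if (!current.has(filePath)) plan.toRemove.push(filePath); // file was deleted
  });
  return plan;
}

const previousRun: Fingerprints = new Map([["a.ts", "h1"], ["b.ts", "h1"], ["c.ts", "h1"]]);
const currentRun: Fingerprints = new Map([["a.ts", "h1"], ["b.ts", "h2"], ["d.ts", "h1"]]);
console.log(planIncrementalUpdate(previousRun, currentRun));
// { toIndex: ["b.ts", "d.ts"], toRemove: ["c.ts"], unchanged: 1 }
```

When toIndex and toRemove are both empty, nothing needs re-embedding, which is consistent with the "instant if no changes" behavior described above.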

Embeddings and benchmarks

Indexing uses Transformers.js–style local ONNX models. Unless you change .raggrep/config.json or pass raggrep index --model, a fresh install uses this stack:

Setting | Default
Runtime | huggingface (@huggingface/transformers). Set embeddingRuntime to "xenova" on a module in .raggrep/config.json to use @xenova/transformers instead.
Model | bge-small-en-v1.5 on each embedding-backed module (TypeScript, Python, Go, Rust, JSON, markdown).

Benchmarks (clone next-convex-starter-app at a pinned commit; see each script for options):

Command | What it measures | Source
bun run bench:embeddings | Embedding throughput (runtime × model matrix; nomic omitted from the harness for now) | research/bench/benchmark-embedding-runtimes.ts
bun run bench:retrieval | Index + hybrid search time and accuracy vs golden queries | research/bench/benchmark-retrieval-quality.ts
bun run eval:golden | Accuracy-only golden eval against a checkout | research/eval/run-golden-queries.ts
bun run bench:golden-hillclimb | Parameter tuning sweep vs golden set | research/bench/benchmark-raggrep-hillclimb.ts
bun run bench:golden-convex | Wave-style benchmark vs Convex starter (--fresh, --passes, etc.) | research/bench/benchmark-raggrep-golden-queries.ts

Golden query sets: research/eval/golden-queries-next-convex.json (10 queries), research/eval/golden-queries-next-convex-50.json (50 queries). Benchmark scripts write research/results/<name>.result.md (versioned in git for reference) and resumable research/results/*.cache.json (ignored).

What Gets Indexed

Supported Languages

TypeScript/JavaScript (.ts, .tsx, .js, .jsx, .mjs, .cjs)

  • AST-parsed for functions, classes, interfaces, types, enums
  • Full file chunks for broad context
  • JSDoc and comment association

Python (.py)

  • AST-parsed for functions, classes, decorators
  • Docstring extraction and association
  • Fallback regex parsing for robustness

Go (.go)

  • AST-parsed for functions, methods, structs, interfaces
  • Doc comment extraction (// style)
  • Exported symbol detection

Rust (.rs)

  • AST-parsed for functions, structs, traits, impls, enums
  • Doc comment extraction (/// and //! style)
  • Visibility detection (pub)

Markdown (.md)

  • Hierarchical chunking at multiple heading levels (H1-H5)
  • Each heading level creates separate searchable chunks
  • Nested content included for context
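
Hierarchical chunking can be sketched as follows: every H1-H5 heading opens a chunk that runs until the next heading of the same or a higher level, so a parent section's chunk contains its subsections. This is an illustrative approximation; the MarkdownChunk fields and the exact boundary rule are assumptions rather than RAGgrep's documented behavior.

```typescript
// Illustrative chunker: chunk boundaries and field names are assumptions.
interface MarkdownChunk {
  heading: string;
  level: number;     // 1-5
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
  text: string;
}

function chunkMarkdown(markdown: string): MarkdownChunk[] {
  const lines = markdown.split("\n");
  const headings: { level: number; title: string; index: number }[] = [];
  let inFence = false;
  lines.forEach((line, index) => {
    if (/^(`{3}|~{3})/.test(line.trim())) inFence = !inFence; // skip "#" inside code fences
    const match = inFence ? null : /^(#{1,5})\s+(.+)$/.exec(line);
    if (match) headings.push({ level: match[1].length, title: match[2].trim(), index });
  });

  return headings.map((heading, i) => {
    // A section ends at the next heading of the same or a higher level,
    // so nested subsections stay inside their parent's chunk.
    let end = lines.length;
    for (let j = i + 1; j < headings.length; j++) {
      if (headings[j].level <= heading.level) { end = headings[j].index; break; }
    }
    return {
      heading: heading.title,
      level: heading.level,
      startLine: heading.index + 1,
      endLine: end,
      text: lines.slice(heading.index, end).join("\n"),
    };
  });
}

const sampleDoc = "# Guide\nintro\n## Install\nsteps\n### Linux\napt\n## Usage\nrun";
console.log(chunkMarkdown(sampleDoc).map((c) => `${c.heading} (H${c.level}): ${c.startLine}-${c.endLine}`));
// ["Guide (H1): 1-8", "Install (H2): 3-6", "Linux (H3): 5-6", "Usage (H2): 7-8"]
```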

JSON (.json)

  • Structure-aware with key/value extraction
  • Path-based indexing
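
Path-based indexing of JSON can be pictured as flattening a document into (path, value) pairs that are then searchable by key or by value. The dotted-path notation and the flattenJson helper below are assumptions for illustration only.

```typescript
// Illustrative flattening: the dotted-path format is an assumption.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface JsonEntry {
  path: string;  // e.g. "scripts.build" or "keywords[0]"
  value: string; // leaf value rendered as text
}

function flattenJson(value: Json, path = ""): JsonEntry[] {
  const entries: JsonEntry[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, i) => entries.push(...flattenJson(item, `${path}[${i}]`)));
  } else if (value !== null && typeof value === "object") {
    const record = value; // keep the narrowed object type inside the callback
    Object.keys(record).forEach((key) =>
      entries.push(...flattenJson(record[key], path ? `${path}.${key}` : key)));
  } else {
    entries.push({ path, value: String(value) }); // leaf: string, number, boolean, or null
  }
  return entries;
}

const flattened = flattenJson({
  name: "raggrep",
  scripts: { build: "tsc" },
  keywords: ["rag", "search"],
  private: false,
});
console.log(flattened.map((entry) => `${entry.path} = ${entry.value}`));
// ["name = raggrep", "scripts.build = tsc", "keywords[0] = rag", "keywords[1] = search", "private = false"]
```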

Other formats: .yaml, .yml, .toml, .sql, .txt — Keyword search and full-text indexing

Automatically Ignored

node_modules, dist, build, .git, .next, .cache, __pycache__, target, and other common build/dependency directories

OpenCode Integration

RAGgrep can be integrated with OpenCode to provide semantic code search capabilities within the AI coding assistant.

Installation

RAGgrep supports two installation types for OpenCode: tool and skill. Running the installer without a flag picks the default:

raggrep opencode install

This installs RAGgrep as a tool by default, which provides the best activation rates in OpenCode.

Explicit Installation Types

Force Tool Installation:

raggrep opencode install --tool
  • Installs to: ~/.config/opencode/tool/raggrep.ts
  • Direct tool execution with full raggrep functionality

Force Skill Installation:

raggrep opencode install --skill
  • Installs to: ~/.config/opencode/skill/raggrep/SKILL.md
  • Skill-based integration for modern OpenCode versions

Mutual Exclusivity

The two types are mutually exclusive. Installing one prompts you to remove the other (default: yes):

# Installing skill will prompt to remove existing tool
raggrep opencode install --skill

# Installing tool will prompt to remove existing skill  
raggrep opencode install --tool

Usage in OpenCode

Tool Usage

Once installed as a tool, RAGgrep provides direct search functionality:

  • Natural language queries: "user authentication flow"
  • All CLI options: --top, --min-score, --rank-by, --type, --filter, --timing
  • Context-aware results with scores and file locations

Skill Usage

Load the skill in your OpenCode conversation:

skill({ name: "raggrep" })

Then follow the skill's guidance to:

  • Install RAGgrep: npm install -g raggrep
  • Index your codebase: raggrep index
  • Use semantic search: raggrep query "your search term"

Why Tool by Default?

The tool installation is the default because it:

  • Has higher activation rates in OpenCode agents
  • Provides immediate search capabilities
  • Works consistently across all OpenCode versions
  • Offers direct integration without skill loading steps

The skill installation is available for users who prefer the modern skill-based approach or need specific skill integration features.

Requirements

  • Node.js 18+ or Bun 1.0+
  • ~50MB disk space for models (cached at ~/.cache/raggrep/models/)

License

MIT

Keywords

rag

Package last updated on 30 Apr 2026
