raggrep

Local filesystem-based RAG system for codebases - semantic search using local embeddings

Version: 0.18.0

Version published: last month

Weekly downloads: 24

Maintainers: 1

Created: 7 months ago

RAGgrep

Local semantic search for codebases — find code using natural language queries.

RAGgrep indexes your code and lets you search it using natural language. Everything runs locally — no external API calls required.

Features

Zero-config search — Just run raggrep query and it works. Index is created and updated automatically.
Multi-language support — Deep understanding of TypeScript, JavaScript, Python, Go, and Rust with AST-aware parsing.
Vocabulary-based search — Searching for "user" finds getUserById, fetchUserData, UserService, and similar names. Natural language queries such as "where is user session validated" find validateUserSession().
Local-first — All indexing and search happens on your machine. No cloud dependencies.
Incremental — Only re-indexes files that have changed. Instant search when nothing changed.
Watch mode — Keep the index fresh in real-time as you code.
Hybrid search — Combines semantic similarity, keyword matching, and exact text matching for best results.
Structured vs semantic — Each hit shows its Structured and Semantic match strength. Results are ordered structured-first by default; pass --rank-by semantic to order by semantic similarity, or --rank-by combined to order by the fused score only.
Exact match track — Finds identifiers in ANY file type (YAML, .env, config, not just code) with grep-like precision.
Fusion boosting — Semantic results containing exact matches get boosted (1.5x) for better ranking.
Literal boosting — Exact identifier matches get priority. Use backticks for precise matching: `AuthService`.
Phrase matching — Exact phrases in documentation are found even when semantic similarity is low.
Semantic expansion — Domain-specific synonyms improve recall (function ↔ method, auth ↔ authentication).
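
Vocabulary-based search depends on breaking identifiers into their component words. The sketch below shows one way this could be done. The names splitIdentifier and sharesVocabulary are hypothetical and are not part of RAGgrep's API, and RAGgrep's own tokenizer may work differently.

```typescript
// Hypothetical helper (not RAGgrep's API): split an identifier into lowercase words.
function splitIdentifier(identifier: string): string[] {
  return identifier
    // Lower-to-upper boundary: getUser -> get User
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    // Acronym followed by a word: HTTPServer -> HTTP Server
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2")
    // Underscores, hyphens, dots, and whitespace all act as separators.
    .split(/[\s_\-.]+/)
    .filter((word) => word.length > 0)
    .map((word) => word.toLowerCase());
}

// True when the query word is one of the identifier's vocabulary words.
function sharesVocabulary(query: string, identifier: string): boolean {
  return splitIdentifier(identifier).includes(query.toLowerCase());
}

const names = ["getUserById", "fetchUserData", "UserService", "AUTH_SERVICE_URL"];
// Keeps the three identifiers that contain the word "user".
console.log(names.filter((name) => sharesVocabulary("user", name)));
```

Matching on whole words rather than substrings keeps a query like "user" from matching unrelated names that merely contain those letters.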

Installation

# Install globally
npm install -g raggrep

# Or use without installing
npx raggrep query "your search"

Usage

Search Your Code

cd your-project
raggrep query "user authentication"

That's it. The first query creates the index automatically. Subsequent queries are instant if files haven't changed. Modified files are re-indexed on the fly.

Example Output

Natural Language Query:

Index updated: 42 indexed

RAGgrep Search
=============

Searching for: "user authentication"

Found 3 results:

1. src/auth/authService.ts:24-55 (login)
   Score: 34.4% | Structured: 42.0% | Semantic: 31.0% | Type: function | via TypeScript | exported
      export async function login(credentials: LoginCredentials): Promise<AuthResult> {
        const { email, password } = credentials;

2. src/auth/session.ts:10-25 (createSession)
   Score: 28.2% | Structured: 35.0% | Semantic: 22.0% | Type: function | via TypeScript | exported
      export function createSession(user: User): Session {

3. src/users/types.ts:3-12 (User)
   Score: 26.0% | Structured: 30.0% | Semantic: 23.0% | Type: interface | via TypeScript | exported
      export interface User {
        id: string;

Exact Identifier Query (shows both tracks):

Index updated: 42 indexed

Searching for: "AUTH_SERVICE_URL"

┌─ Exact Matches (4 files, 6 matches) ─┐
│  Query: "AUTH_SERVICE_URL"
└─────────────────────────────────────────────────────────────────────┘

  1. config.yaml (2 matches)
     8 │   auth:
     9 │     url: AUTH_SERVICE_URL
  ► 10 │     grpc_url: AUTH_SERVICE_GRPC_URL
     11 │     timeout: 5000

  2. .env.example (1 match)
     2 │ AUTH_SERVICE_URL=https://auth.example.com
  ►  3 │ AUTH_SERVICE_GRPC_URL=grpc://auth.example.com:9000

┌─ Semantic Results (boosted by exact matches) ─┐
└─────────────────────────────────────────────────────────────────────┘

1. src/auth/authService.ts:2-10 (AuthService)
   Score: 45.2% | Structured: 48.0% | Semantic: 43.0% | Type: class | via TypeScript | exported | exact match
      export class AuthService {
        private baseUrl = AUTH_SERVICE_URL;

Watch Mode

Keep your index fresh in real-time while you code:

raggrep index --watch

This monitors file changes and re-indexes automatically. Useful during active development when you want instant search results.

┌─────────────────────────────────────────┐
│  Watching for changes... (Ctrl+C to stop) │
└─────────────────────────────────────────┘

[Watch] language/typescript: 2 indexed, 0 errors

CLI Reference

Commands

raggrep query <query>    # Search the codebase
raggrep index            # Build/update the index
raggrep status           # Show index status
raggrep reset            # Clear the index

Query Options

raggrep query "user login"                    # Natural language query
raggrep query -C ~/projects/my-app "login"    # Search a project without cd
raggrep query "AUTH_SERVICE_URL"             # Exact identifier (auto-triggers exact match)
raggrep query "\`AuthService\`"              # Backticks force exact match
raggrep query "error handling" --top 5        # Limit results
raggrep query "database" --min-score 0.2      # Set minimum score threshold
raggrep query "login flow" --rank-by semantic  # Order by semantic similarity first
raggrep query "auth" --rank-by combined       # Order by fused score only
raggrep query "debug" --timing                # Print timing breakdown
raggrep query "interface" --type ts           # Filter by file extension
raggrep query "auth" --filter src/auth        # Filter by path
raggrep query "api" -f src/api -f src/routes  # Multiple path filters

Flag	Short	Description
`--dir <path>`	`-C`	Project directory to search (default: current directory)
`--top <n>`	`-k`	Number of results to return (default: 10)
`--min-score <n>`	`-s`	Minimum similarity score 0–1 (default: 0.15)
`--rank-by <mode>`		Sort order: `structured` (default), `semantic`, or `combined`
`--timing`	`-T`	Print timing breakdown for profiling
`--type <ext>`	`-t`	Filter by file extension (e.g., ts, tsx, js)
`--filter <path>`	`-f`	Filter by path or glob pattern (can be used multiple times)
`--help`	`-h`	Show help message
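
To make the interaction of --rank-by, --top, and --min-score concrete, here is a minimal sketch of the ordering logic. It assumes the threshold applies to the fused score and that ties fall back to the fused score; neither detail is confirmed by this README, and rankHits is a hypothetical name. The sample values come from the example output above, expressed on a 0 to 1 scale.

```typescript
type RankBy = "structured" | "semantic" | "combined";

interface Hit {
  name: string;
  score: number;      // fused score, 0..1
  structured: number; // structured match strength, 0..1
  semantic: number;   // semantic match strength, 0..1
}

// Hypothetical sketch: drop weak hits, order by the chosen signal, keep the top k.
function rankHits(hits: Hit[], rankBy: RankBy = "structured", top = 10, minScore = 0.15): Hit[] {
  const key = (hit: Hit): number => (rankBy === "combined" ? hit.score : hit[rankBy]);
  return hits
    .filter((hit) => hit.score >= minScore) // assumption: threshold uses the fused score
    .sort((a, b) => key(b) - key(a) || b.score - a.score)
    .slice(0, top);
}

const sample: Hit[] = [
  { name: "login", score: 0.344, structured: 0.42, semantic: 0.31 },
  { name: "createSession", score: 0.282, structured: 0.35, semantic: 0.22 },
  { name: "User", score: 0.26, structured: 0.3, semantic: 0.23 },
];

console.log(rankHits(sample).map((hit) => hit.name));             // login, createSession, User
console.log(rankHits(sample, "semantic").map((hit) => hit.name)); // login, User, createSession
```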

Filtering by File Type

Use glob patterns with --filter to search specific file types:

# Search only source code files
raggrep query "service controller" --filter "*.ts"
raggrep query "component state" --filter "*.tsx"

# Search only documentation
raggrep query "deployment workflow" --filter "*.md"

# Search test files
raggrep query "mock setup" --filter "*.test.ts"

# Combine with path prefix
raggrep query "api handler" --filter "src/**/*.ts"

Multiple Filters (OR Logic)

Use multiple --filter flags to match files that match any of the patterns:

# Search TypeScript OR TSX files
raggrep query "component" --filter "*.ts" --filter "*.tsx"

# Search in multiple directories
raggrep query "api" --filter src/api --filter src/routes

# Mix glob patterns and path prefixes
raggrep query "config" --filter "*.json" --filter "*.yaml" --filter config/

Filtering is most useful when you already know whether you are looking for code or documentation.
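
The OR semantics can be sketched as follows. This is an illustration only: the function names are hypothetical, the glob support is deliberately minimal, and the rule that a glob without a slash is tested against the file name alone is an assumption, not documented RAGgrep behavior.

```typescript
// Hypothetical sketch: convert a simple glob into a regular expression.
// "**/" spans any number of directories; "*" matches anything except "/".
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*\//g, "\u0000") // temporary placeholder for "**/"
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, "(?:.*/)?");
  return new RegExp(`^${pattern}$`);
}

function matchesFilter(filePath: string, filter: string): boolean {
  if (filter.includes("*")) {
    // Assumption: a glob without "/" is tested against the file name only.
    const fileName = filePath.split("/").pop() ?? filePath;
    const target = filter.includes("/") ? filePath : fileName;
    return globToRegExp(filter).test(target);
  }
  // Otherwise treat the filter as a path prefix such as "src/api" or "config/".
  const prefix = filter.endsWith("/") ? filter : `${filter}/`;
  return filePath === filter || filePath.startsWith(prefix);
}

// OR logic: a file is kept when it matches at least one filter.
function matchesAnyFilter(filePath: string, filters: string[]): boolean {
  return filters.length === 0 || filters.some((filter) => matchesFilter(filePath, filter));
}

console.log(matchesAnyFilter("src/components/Button.tsx", ["*.ts", "*.tsx"])); // true
console.log(matchesAnyFilter("docs/guide.md", ["*.ts", "*.tsx"]));             // false
```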

Exact Match Search

For identifier-like queries (SCREAMING_SNAKE_CASE, camelCase, PascalCase), RAGgrep automatically runs exact match search:

# Finds AUTH_SERVICE_URL in ALL file types (YAML, .env, config, etc.)
raggrep query "AUTH_SERVICE_URL"

# Finds the function by exact name
raggrep query "getUserById"

# Use backticks for explicit exact matching (even natural words)
raggrep query "\`configuration\`"
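
How a query might be recognized as identifier-like can be sketched with a few regular expressions. The patterns and the classifyQuery name below are assumptions for illustration; RAGgrep's actual detection rules are not shown in this README and may be broader or stricter.

```typescript
type QueryKind = "backtick" | "SCREAMING_SNAKE_CASE" | "camelCase" | "PascalCase" | "natural";

// Hypothetical classifier: decide whether a query looks like an identifier.
function classifyQuery(query: string): QueryKind {
  const q = query.trim();
  if (/^`[^`]+`$/.test(q)) return "backtick";
  if (/^[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+$/.test(q)) return "SCREAMING_SNAKE_CASE";
  if (/^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+$/.test(q)) return "camelCase";
  // Assumption: at least two capitalized words, so a plain "Auth" stays natural.
  if (/^(?:[A-Z][a-z0-9]+){2,}$/.test(q)) return "PascalCase";
  return "natural";
}

// Anything other than a natural-language query would trigger the exact match track.
function triggersExactMatch(query: string): boolean {
  return classifyQuery(query) !== "natural";
}

for (const q of ["AUTH_SERVICE_URL", "getUserById", "AuthService", "`configuration`", "user authentication"]) {
  console.log(q, "->", classifyQuery(q));
}
```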

What Gets Searched:

Source code: .ts, .js, .py, .go, .rs
Config files: .yaml, .yml, .json, .toml, .env
Documentation: .md, .txt

Ignored: node_modules, .git, dist, build, .cache, etc.

Exact matches are shown in a separate section with line numbers and context. Semantic results containing the same identifier get boosted (1.5x score multiplier).
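
The 1.5x boost can be illustrated with a short sketch. Only the multiplier comes from the text above; the SemanticHit shape, the applyFusionBoost name, and the choice not to cap the boosted score are assumptions made for the example.

```typescript
interface SemanticHit {
  file: string;
  score: number; // semantic score before boosting
  text: string;  // chunk contents
}

const EXACT_MATCH_BOOST = 1.5; // multiplier stated in the README

// Hypothetical sketch: boost hits whose text contains the identifier, then re-sort.
function applyFusionBoost(hits: SemanticHit[], identifier: string): SemanticHit[] {
  return hits
    .map((hit) =>
      hit.text.includes(identifier) ? { ...hit, score: hit.score * EXACT_MATCH_BOOST } : hit,
    )
    .sort((a, b) => b.score - a.score);
}

const boosted = applyFusionBoost(
  [
    { file: "src/http/client.ts", score: 0.4, text: "export class HttpClient {}" },
    { file: "src/auth/authService.ts", score: 0.3, text: "private baseUrl = AUTH_SERVICE_URL;" },
  ],
  "AUTH_SERVICE_URL",
);
// The authService.ts hit now ranks first because 0.3 * 1.5 exceeds 0.4.
console.log(boosted.map((hit) => hit.file));
```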

Index Options

raggrep index                        # Index current directory
raggrep index --dir ../other-repo    # Index another path without cd
raggrep index --watch                # Watch mode - re-index on file changes
raggrep index --verbose              # Show detailed progress
raggrep index --concurrency 8        # Set parallel workers (default: auto)
raggrep index --model bge-small-en-v1.5  # Use specific embedding model

Flag	Short	Description
`--dir <path>`	`-C`	Project directory to index (default: current directory)
`--watch`	`-w`	Watch for file changes and re-index automatically
`--verbose`	`-v`	Show detailed progress
`--concurrency <n>`	`-c`	Number of parallel workers (default: auto based on CPU)
`--model <name>`	`-m`	Override the embedding model used by the TypeScript module (otherwise the model saved in config is used)
`--help`	`-h`	Show help message

Other Commands

raggrep status                    # Show index status and statistics
raggrep status --dir ./packages/api
raggrep reset                     # Clear the index for the current directory
raggrep reset -C ~/projects/my-app
raggrep --version                 # Show version

How It Works

First query — Creates the index (takes 1-2 min for ~1000 files)
Subsequent queries — Uses cached index (instant if no changes)
Files changed — Re-indexes only modified files automatically
Files deleted — Stale entries cleaned up automatically

The index is stored under .raggrep/ in the project directory you index or pass with --dir / -C (by default, the current working directory). Add .raggrep/ to .gitignore if you do not want index files in version control.
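
The incremental behavior described above amounts to comparing the previous state of the file tree with the current one. The sketch below uses modification time and size as the change signal; that choice, along with the FileStamp and planReindex names, is an assumption for illustration, and RAGgrep may track changes differently (for example with content hashes).

```typescript
interface FileStamp {
  mtimeMs: number; // last modification time in milliseconds
  size: number;    // file size in bytes
}

interface ReindexPlan {
  toIndex: string[];   // new or modified files
  toRemove: string[];  // files that no longer exist (stale entries)
  unchanged: string[]; // files whose cached entries can be reused
}

// Hypothetical sketch: decide what work an incremental index update needs.
function planReindex(previous: Map<string, FileStamp>, current: Map<string, FileStamp>): ReindexPlan {
  const plan: ReindexPlan = { toIndex: [], toRemove: [], unchanged: [] };
  for (const [path, stamp] of current) {
    const old = previous.get(path);
    if (old !== undefined && old.mtimeMs === stamp.mtimeMs && old.size === stamp.size) {
      plan.unchanged.push(path);
    } else {
      plan.toIndex.push(path);
    }
  }
  for (const path of previous.keys()) {
    if (!current.has(path)) plan.toRemove.push(path);
  }
  return plan;
}

const before = new Map<string, FileStamp>([
  ["src/a.ts", { mtimeMs: 100, size: 10 }],
  ["src/b.ts", { mtimeMs: 100, size: 20 }],
]);
const after = new Map<string, FileStamp>([
  ["src/a.ts", { mtimeMs: 100, size: 10 }], // untouched
  ["src/c.ts", { mtimeMs: 200, size: 30 }], // newly added
]);
console.log(planReindex(before, after));
```

When toIndex and toRemove are both empty, a query can use the cached index directly, which corresponds to the "instant if no changes" case.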

Embeddings and benchmarks

Indexing uses Transformers.js–style local ONNX models. Unless you change .raggrep/config.json or pass raggrep index --model, a fresh install uses this stack:

Setting	Default
Runtime	`huggingface` (`@huggingface/transformers`). Set `embeddingRuntime` to `"xenova"` on a module in `.raggrep/config.json` to use `@xenova/transformers` instead.
Model	`bge-small-en-v1.5` on each embedding-backed module (TypeScript, Python, Go, Rust, JSON, markdown).

Benchmarks (clone next-convex-starter-app at a pinned commit; see each script for options):

Command	What it measures	Source
`bun run bench:embeddings`	Embedding throughput (runtime × model matrix; nomic omitted from the harness for now)	`research/bench/benchmark-embedding-runtimes.ts`
`bun run bench:retrieval`	Index + hybrid search time and accuracy vs golden queries	`research/bench/benchmark-retrieval-quality.ts`
`bun run eval:golden`	Accuracy-only golden eval against a checkout	`research/eval/run-golden-queries.ts`
`bun run bench:golden-hillclimb`	Parameter tuning sweep vs golden set	`research/bench/benchmark-raggrep-hillclimb.ts`
`bun run bench:golden-convex`	Wave-style benchmark vs Convex starter (`--fresh`, `--passes`, etc.)	`research/bench/benchmark-raggrep-golden-queries.ts`

Golden query sets: research/eval/golden-queries-next-convex.json (10 queries), research/eval/golden-queries-next-convex-50.json (50 queries). Benchmark scripts write research/results/<name>.result.md (versioned in git for reference) and resumable research/results/*.cache.json (ignored).

What Gets Indexed

Supported Languages

TypeScript/JavaScript (.ts, .tsx, .js, .jsx, .mjs, .cjs)

AST-parsed for functions, classes, interfaces, types, enums
Full file chunks for broad context
JSDoc and comment association

Python (.py)

AST-parsed for functions, classes, decorators
Docstring extraction and association
Fallback regex parsing for robustness

Go (.go)

AST-parsed for functions, methods, structs, interfaces
Doc comment extraction (// style)
Exported symbol detection

Rust (.rs)

AST-parsed for functions, structs, traits, impls, enums
Doc comment extraction (/// and //! style)
Visibility detection (pub)

Markdown (.md)

Hierarchical chunking at multiple heading levels (H1-H5)
Each heading level creates separate searchable chunks
Nested content included for context
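
Hierarchical chunking can be sketched as follows: every heading starts a chunk that runs until the next heading of the same or a higher level, so nested subsections are included in their parent's chunk. This is a simplified illustration under that reading of the description; chunkMarkdown and MarkdownChunk are hypothetical names, and only ATX-style (#) headings are handled.

```typescript
interface MarkdownChunk {
  heading: string;
  level: number;
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
  text: string;
}

// Hypothetical sketch: one chunk per heading, with nested sections included.
function chunkMarkdown(source: string, maxLevel = 5): MarkdownChunk[] {
  const lines = source.split("\n");
  const fence = "`".repeat(3); // fenced code blocks may contain "#" lines
  const headings: { level: number; title: string; line: number }[] = [];
  let inFence = false;
  lines.forEach((line, index) => {
    if (line.trim().startsWith(fence)) {
      inFence = !inFence;
      return;
    }
    const match = inFence ? null : /^(#{1,6})\s+(.*)$/.exec(line);
    if (match !== null && match[1].length <= maxLevel) {
      headings.push({ level: match[1].length, title: match[2].trim(), line: index });
    }
  });
  return headings.map((heading, i) => {
    // The chunk ends just before the next heading of the same or a higher level.
    const next = headings.slice(i + 1).find((h) => h.level <= heading.level);
    const end = next === undefined ? lines.length : next.line;
    return {
      heading: heading.title,
      level: heading.level,
      startLine: heading.line + 1,
      endLine: end,
      text: lines.slice(heading.line, end).join("\n"),
    };
  });
}

const doc = "# Guide\nintro\n## Install\nnpm\n### Global\n-g\n## Usage\nrun";
for (const chunk of chunkMarkdown(doc)) {
  console.log(`H${chunk.level} ${chunk.heading}: lines ${chunk.startLine}-${chunk.endLine}`);
}
```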

JSON (.json)

Structure-aware with key/value extraction
Path-based indexing
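
Path-based indexing of JSON can be pictured as flattening a document into path and value pairs. The sketch below is one plausible interpretation; the flattenJson name and the dotted path syntax are assumptions, not RAGgrep's documented format.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface JsonEntry {
  path: string;  // e.g. "scripts.build" or "files[0]"
  value: string; // leaf value rendered as text
}

// Hypothetical sketch: flatten a JSON value into searchable path/value pairs.
function flattenJson(value: Json, path = ""): JsonEntry[] {
  const entries: JsonEntry[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, index) => {
      entries.push(...flattenJson(item, `${path}[${index}]`));
    });
  } else if (value !== null && typeof value === "object") {
    for (const key of Object.keys(value)) {
      entries.push(...flattenJson(value[key], path === "" ? key : `${path}.${key}`));
    }
  } else {
    entries.push({ path, value: String(value) });
  }
  return entries;
}

const manifest: Json = {
  name: "raggrep",
  scripts: { build: "tsc" },
  files: ["dist", "bin"],
};
for (const entry of flattenJson(manifest)) {
  console.log(`${entry.path} = ${entry.value}`);
}
```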

Other formats: .yaml, .yml, .toml, .sql, .txt — Keyword search and full-text indexing

Automatically Ignored

node_modules, dist, build, .git, .next, .cache, __pycache__, target, and other common build/dependency directories

OpenCode Integration

RAGgrep can be integrated with OpenCode to provide semantic code search capabilities within the AI coding assistant.

Installation

RAGgrep supports two installation types for OpenCode:

Default Installation (Recommended)

raggrep opencode install

This installs RAGgrep as a tool by default, which provides the best activation rates in OpenCode.

Explicit Installation Types

Force Tool Installation:

raggrep opencode install --tool

Installs to: ~/.config/opencode/tool/raggrep.ts
Direct tool execution with full raggrep functionality

Force Skill Installation:

raggrep opencode install --skill

Installs to: ~/.config/opencode/skill/raggrep/SKILL.md
Skill-based integration for modern OpenCode versions

Mutual Exclusivity

Installing one type will prompt to remove the other (default: yes):

# Installing skill will prompt to remove existing tool
raggrep opencode install --skill

# Installing tool will prompt to remove existing skill
raggrep opencode install --tool

Usage in OpenCode

Tool Usage

Once installed as a tool, RAGgrep provides direct search functionality:

Natural language queries: "user authentication flow"
All CLI options: --top, --min-score, --rank-by, --type, --filter, --timing
Context-aware results with scores and file locations

Skill Usage

Load the skill in your OpenCode conversation:

skill({ name: "raggrep" })

Then follow the skill's guidance to:

Install RAGgrep: npm install -g raggrep
Index your codebase: raggrep index
Use semantic search: raggrep query "your search term"

Why Tool by Default?

The tool installation is the default because it:

Has higher activation rates in OpenCode agents
Provides immediate search capabilities
Works consistently across all OpenCode versions
Offers direct integration without skill loading steps

The skill installation is available for users who prefer the modern skill-based approach or need specific skill integration features.

Documentation

Getting Started — Installation options and first steps
CLI Reference — All commands and options
SDK Reference — Programmatic API for Node.js/Bun
Advanced — Configuration, maintenance commands
Architecture — How RAGgrep works internally

Requirements

Node.js 18+ or Bun 1.0+
~50MB disk space for models (cached at ~/.cache/raggrep/models/)

License

MIT

Package last updated on 30 Apr 2026