@devpuccino/mcp-git-codebase

MCP server providing semantic code search and indexing for git repositories

latest

npm

Version: 1.0.1

Version published: 3 months ago

Maintainers: 1

Created: 3 months ago

@devpuccino/mcp-git-codebase

An MCP (Model Context Protocol) server that provides semantic code search and intelligent indexing for git repositories. It uses vector embeddings so that AI clients can find relevant code snippets by intent, not just by keyword.

Features

✨ Semantic Search - Find code by meaning, not just keywords
🔍 Multi-Language Support - TypeScript, JavaScript, Python, Go, Java, Rust, and more
📊 Multiple Vector Databases - Qdrant, Pinecone, Chroma, Milvus, PostgreSQL with pgvector
🚀 Scalable Indexing - Handle repositories with 1M+ files and 100GB+ of code
⚙️ Background Processing - Queue indexing jobs via Redis/Bull
🌿 Branch-Aware - Search across specific branches or track changes over time
🎯 Precise Code Retrieval - Get exact code snippets with line-level precision

Installation

Prerequisites

Node.js ≥ 18.0.0
Git (for repository operations)
One of the supported vector databases (Qdrant, Pinecone, Chroma, Milvus, or PostgreSQL with pgvector)

Install Package

npm install @devpuccino/mcp-git-codebase

Quick Start

1. Configure Vector Database

Set your preferred vector database and its connection details:

# Qdrant (recommended for local development)
export VECTOR_DB_PROVIDER=qdrant
export QDRANT_URL=http://localhost:6333

# Or Pinecone
export VECTOR_DB_PROVIDER=pinecone
export PINECONE_API_KEY=your-api-key
export PINECONE_ENVIRONMENT=your-environment
export PINECONE_INDEX=your-index

# Or PostgreSQL with pgvector
export VECTOR_DB_PROVIDER=postgres
export DATABASE_URL=postgresql://user:password@localhost:5432/codebase

# Or Chroma
export VECTOR_DB_PROVIDER=chroma
export CHROMA_URL=http://localhost
export CHROMA_PORT=8000

# Or Milvus
export VECTOR_DB_PROVIDER=milvus
export MILVUS_HOST=localhost
export MILVUS_PORT=19530

2. Configure Embedding Model

# Ollama (default, local)
export EMBEDDING_PROVIDER=ollama
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_EMBEDDING_MODEL=bge-base-en-v1.5

# Or OpenAI (cloud)
export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export OPENAI_EMBEDDING_MODEL=text-embedding-3-small

3. Use with Claude Code

Add to your Claude Code configuration (settings.json or settings.local.json):

Minimal Configuration (Qdrant + Ollama):

{
  "mcpServers": {
    "mcp-git-codebase": {
      "command": "npx",
      "args": ["@devpuccino/mcp-git-codebase"],
      "env": {
        "VECTOR_DB_PROVIDER": "qdrant",
        "QDRANT_URL": "http://localhost:6333",
        "EMBEDDING_PROVIDER": "ollama",
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_EMBEDDING_MODEL": "nomic-embed-text"
      }
    }
  }
}

Full Configuration Example:

{
  "mcpServers": {
    "git-codebase": {
      "command": "npx",
      "args": ["--legacy-peer-deps", "@devpuccino/mcp-git-codebase"],
      "env": {
        "VECTOR_DB_PROVIDER": "qdrant",
        "QDRANT_URL": "http://your-qdrant-host:6333",
        "QDRANT_API_KEY": "your-api-key-if-needed",
        "VECTOR_DB_COLLECTION_PREFIX": "codebase_",
        
        "EMBEDDING_PROVIDER": "ollama",
        "OLLAMA_BASE_URL": "http://your-ollama-host:11434",
        "OLLAMA_EMBEDDING_MODEL": "bge-base-en-v1.5",
        "EMBEDDING_TIMEOUT": "30000",
        
        "LLM_PROVIDER": "ollama",
        "OLLAMA_MODEL": "qwen2.5-coder:7b",
        "OLLAMA_TIMEOUT": "30000",
        "OLLAMA_MAX_RETRIES": "3",
        "INDEXING_LLM_ENABLED": "true",
        
        "REDIS_HOST": "your-redis-host",
        "REDIS_PORT": "6379",
        "REDIS_PASSWORD": "your-redis-password",
        "REDIS_DB": "0",
        
        "ENABLE_RERANKING": "true",
        "RERANKER_TYPE": "bm25",
        
        "CONSUMER_CONCURRENCY": "2",
        "STARTUP_BATCH_ENABLED": "true",
        "STARTUP_BATCH_LIMIT": "50",
        
        "LOG_LEVEL": "info"
      }
    }
  }
}

Production Configuration (Pinecone + OpenAI):

{
  "mcpServers": {
    "mcp-git-codebase": {
      "command": "npx",
      "args": ["@devpuccino/mcp-git-codebase"],
      "env": {
        "VECTOR_DB_PROVIDER": "pinecone",
        "PINECONE_API_KEY": "your-pinecone-api-key",
        "PINECONE_ENVIRONMENT": "us-east-1",
        "PINECONE_INDEX": "your-index-name",
        
        "EMBEDDING_PROVIDER": "openai",
        "OPENAI_API_KEY": "your-openai-api-key",
        "OPENAI_EMBEDDING_MODEL": "text-embedding-3-small",
        
        "LLM_PROVIDER": "openai",
        "OPENAI_LLM_MODEL": "gpt-4o-mini",
        
        "LOG_LEVEL": "warn"
      }
    }
  }
}

Tools

`query_codebase`

Perform semantic search across a git repository to find relevant code snippets by meaning.

Parameters:

query_sentence (required): Natural language search query or code snippet
project_path (required): Root directory of the git repository
branch (optional): Specific branch to search (default: current branch)
limit (optional): Max results to return, 1-20 (default: 5)
similarity_threshold (optional): Minimum similarity score, 0-1 (default: 0.6)
file_extensions (optional): Filter by file extensions (e.g., [".ts", ".tsx"])

Example:

{
  "query_sentence": "function to authenticate users with JWT tokens",
  "project_path": "/workspace/myapp",
  "limit": 5,
  "file_extensions": [".ts", ".tsx"]
}
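The documented defaults and ranges can be checked on the client side before a call is made. The sketch below is only an illustration: `buildQueryArgs` is a hypothetical helper, not part of the package, and the server may validate differently.

```typescript
// Argument shape taken from the parameter list above.
interface QueryCodebaseArgs {
  query_sentence: string;
  project_path: string;
  branch?: string;
  limit?: number;
  similarity_threshold?: number;
  file_extensions?: string[];
}

// Hypothetical helper: fills the documented defaults and rejects
// values outside the documented ranges.
function buildQueryArgs(input: QueryCodebaseArgs): QueryCodebaseArgs {
  if (input.query_sentence.trim() === "") {
    throw new Error("query_sentence is required");
  }
  if (input.project_path.trim() === "") {
    throw new Error("project_path is required");
  }
  const limit = input.limit ?? 5;
  if (!Number.isInteger(limit) || limit < 1 || limit > 20) {
    throw new RangeError("limit must be an integer from 1 to 20");
  }
  const threshold = input.similarity_threshold ?? 0.6;
  if (!(threshold >= 0 && threshold <= 1)) {
    throw new RangeError("similarity_threshold must be between 0 and 1");
  }
  return { ...input, limit, similarity_threshold: threshold };
}

const queryArgs = buildQueryArgs({
  query_sentence: "function to authenticate users with JWT tokens",
  project_path: "/workspace/myapp",
  file_extensions: [".ts", ".tsx"],
});
console.log(JSON.stringify(queryArgs, null, 2));
```

Leaving `branch` unset keeps the documented behavior of searching the current branch.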

`get_code_snippet`

Retrieve a specific code snippet from a file with line-level precision.

Parameters:

project_path (required): Root directory of the git repository
filepath (required): Relative path to the file
start_line (optional): Starting line number (1-indexed)
end_line (optional): Ending line number
include_line_numbers (optional): Show line numbers (default: true)

Example:

{
  "project_path": "/workspace/myapp",
  "filepath": "src/auth/index.ts",
  "start_line": 10,
  "end_line": 45,
  "include_line_numbers": true
}
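To make the line-range semantics concrete, here is a small sketch of extracting a 1-indexed range from file contents. It assumes `end_line` is inclusive and that line numbers are rendered as a `N: ` prefix; neither detail is confirmed by this README, and `sliceLines` is a hypothetical name.

```typescript
// Hypothetical helper: returns lines startLine..endLine (1-indexed,
// inclusive), optionally prefixed with their line numbers.
function sliceLines(
  source: string,
  startLine = 1,
  endLine?: number,
  includeLineNumbers = true,
): string {
  const lines = source.split("\n");
  // Clamp the end of the range to the last line of the file.
  const last = Math.min(endLine ?? lines.length, lines.length);
  if (startLine < 1 || startLine > last) {
    throw new RangeError(`invalid line range ${startLine}-${last}`);
  }
  const width = String(last).length;
  return lines
    .slice(startLine - 1, last)
    .map((text, i) =>
      includeLineNumbers
        ? `${String(startLine + i).padStart(width)}: ${text}`
        : text,
    )
    .join("\n");
}

const sampleFile = [
  "import jwt from 'jsonwebtoken';",
  "",
  "export function verify(token: string) {",
  "  return jwt.verify(token, 'secret');",
  "}",
].join("\n");
console.log(sliceLines(sampleFile, 3, 5));
```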

`sync_codebase`

Index or re-index a git repository into the vector database.

Parameters:

project_path (required): Root directory of the git repository
branch (optional): Branch to sync (default: current branch)
file_extensions (optional): Only sync specific file types
background (optional): Queue as background job (default: false)
force (optional): Force full re-index from scratch (default: false)

Example:

{
  "project_path": "/workspace/myapp",
  "force": false,
  "background": true
}

`update_codebase`

Trigger indexing after code changes. By default the changes are also committed to git; set skip_git_commit to true to index without committing.

Parameters:

project_path (required): Root directory of the git repository
commit_message (required): Message summarizing changes
changed_files (required): Array of changed files with change type
trigger_type (required): One of manual, post_generation, post_merge
skip_git_commit (optional): Skip git commit (default: false)
background (optional): Queue as background job (default: false)

Example:

{
  "project_path": "/workspace/myapp",
  "commit_message": "Update authentication module",
  "changed_files": [
    { "path": "src/auth/index.ts", "change_type": "modified" },
    { "path": "src/auth/jwt.ts", "change_type": "added" }
  ],
  "trigger_type": "manual",
  "background": false
}
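The `changed_files` array can be derived from git itself. The sketch below parses the text printed by `git diff --name-status` into that shape. The `added` and `modified` values come from the example above; `deleted` is an assumption, and `parseNameStatus` is a hypothetical helper.

```typescript
type ChangeType = "added" | "modified" | "deleted";

interface ChangedFile {
  path: string;
  change_type: ChangeType;
}

// Status letters printed by `git diff --name-status`.
// "deleted" is an assumed value; this README only shows "added" and "modified".
const STATUS_TO_CHANGE: Record<string, ChangeType> = {
  A: "added",
  M: "modified",
  D: "deleted",
};

function parseNameStatus(output: string): ChangedFile[] {
  const files: ChangedFile[] = [];
  for (const line of output.split("\n")) {
    const [status, path] = line.split("\t");
    if (!status || !path) continue; // blank or malformed line
    const changeType: ChangeType | undefined = STATUS_TO_CHANGE[status];
    // Renames (R...), copies (C...) and other statuses are skipped in this sketch.
    if (changeType !== undefined) {
      files.push({ path, change_type: changeType });
    }
  }
  return files;
}

const updateArgs = {
  project_path: "/workspace/myapp",
  commit_message: "Update authentication module",
  changed_files: parseNameStatus("M\tsrc/auth/index.ts\nA\tsrc/auth/jwt.ts\n"),
  trigger_type: "manual",
  background: false,
};
console.log(JSON.stringify(updateArgs, null, 2));
```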

Environment Variables

General Vector Database Configuration

Variable	Default	Description
`VECTOR_DB_PROVIDER`	`qdrant`	Vector database type: `qdrant`, `pinecone`, `chroma`, `milvus`, `postgres`
`EMBEDDING_DIMENSION`	`1536`	Dimension of embedding vectors (auto-detected from model, rarely needed)
`VECTOR_DB_COLLECTION_PREFIX`	-	Optional prefix for collection names (useful for multi-tenant setups)

Qdrant Configuration

Variable	Default	Description
`QDRANT_URL`	`http://localhost:6333`	Qdrant server URL
`QDRANT_API_KEY`	-	Qdrant API key (for cloud/managed instances)
`QDRANT_COLLECTION`	`code_snippets`	Collection name for storing embeddings

Pinecone Configuration

Variable	Default	Description
`PINECONE_API_KEY`	-	Pinecone API key (required)
`PINECONE_ENVIRONMENT`	-	Pinecone environment/region (required)
`PINECONE_INDEX`	`code-snippets`	Pinecone index name

Chroma Configuration

Variable	Default	Description
`CHROMA_URL`	`http://localhost`	Chroma server URL
`CHROMA_PORT`	`8000`	Chroma server port
`CHROMA_COLLECTION`	`code_snippets`	Collection name for storing embeddings

Milvus Configuration

Variable	Default	Description
`MILVUS_HOST`	`localhost`	Milvus server host
`MILVUS_PORT`	`19530`	Milvus server port
`MILVUS_COLLECTION`	`code_snippets`	Collection name for storing embeddings

PostgreSQL (pgvector) Configuration

Variable	Default	Description
`DATABASE_URL`	-	PostgreSQL connection string (required)
`POSTGRES_VECTOR_TABLE`	`code_snippets_vectors`	Table name for storing vectors
`POSTGRES_EMBEDDING_COLUMN`	`embedding`	Column name for embedding vectors

Embedding Model Configuration

Variable	Default	Description
`EMBEDDING_PROVIDER`	`ollama`	Embedding provider: `openai`, `ollama`
`EMBEDDING_DIMENSION`	`1536`	Dimension of embedding vectors (auto-detected from model if not set)
`EMBEDDING_TIMEOUT`	`30000`	Timeout for embedding API requests (milliseconds)
`EMBEDDING_BATCH_SIZE`	`10`	Number of items to embed per batch
`EMBEDDING_MAX_RETRIES`	`3`	Maximum retry attempts for failed embedding requests

OpenAI Embedding Configuration

Variable	Default	Description
`OPENAI_API_KEY`	-	OpenAI API key (required for OpenAI provider)
`OPENAI_EMBEDDING_MODEL`	`text-embedding-3-small`	OpenAI embedding model to use
`OPENAI_BASE_URL`	`https://api.openai.com`	OpenAI API base URL (for custom endpoints)

Ollama Embedding Configuration

Variable	Default	Description
`OLLAMA_BASE_URL`	`http://localhost:11434`	Ollama server URL
`OLLAMA_EMBEDDING_MODEL`	`bge-base-en-v1.5`	Ollama embedding model to use

Common Ollama embedding models:

bge-base-en-v1.5 (768 dimensions) - default, good balance
bge-large-en-v1.5 (1024 dimensions) - higher quality
nomic-embed-text (768 dimensions) - fast and efficient
mxbai-embed-large (1024 dimensions) - high quality
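Note that the documented `EMBEDDING_DIMENSION` default of 1536 differs from every Ollama model listed here, so an explicit override is worth double-checking. The sketch below compares a configured value against the dimensions listed above; `checkDimension` is a hypothetical helper, and the package's own auto-detection may work differently.

```typescript
// Dimensions as listed above.
const OLLAMA_MODEL_DIMENSIONS: Record<string, number> = {
  "bge-base-en-v1.5": 768,
  "bge-large-en-v1.5": 1024,
  "nomic-embed-text": 768,
  "mxbai-embed-large": 1024,
};

// Returns a short diagnostic string; `configured` is the raw
// EMBEDDING_DIMENSION value, if one was set explicitly.
function checkDimension(model: string, configured?: string): string {
  const known: number | undefined = OLLAMA_MODEL_DIMENSIONS[model];
  if (known === undefined) {
    return `unknown model ${model}: cannot check`;
  }
  if (configured === undefined) {
    return `ok: ${model} produces ${known}-dimensional vectors`;
  }
  return Number(configured) === known
    ? `ok: EMBEDDING_DIMENSION matches ${model} (${known})`
    : `mismatch: EMBEDDING_DIMENSION=${configured} but ${model} produces ${known}`;
}

console.log(checkDimension("nomic-embed-text", "1536"));
```

A mismatch usually matters because a vector collection is created with a fixed dimension, so switching models typically requires a new collection or a full re-index.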

LLM Provider Configuration

Variable	Default	Description
`LLM_PROVIDER`	`ollama`	LLM provider for code analysis: `openai`, `ollama`
`LLM_TIMEOUT`	`8000`	Timeout for LLM API requests (milliseconds)
`LLM_MAX_RETRIES`	`2`	Maximum retry attempts for failed LLM requests
`INDEXING_LLM_ENABLED`	`true`	Enable LLM-based metadata generation during indexing

OpenAI LLM Configuration

Variable	Default	Description
`OPENAI_LLM_MODEL`	`gpt-4o-mini`	OpenAI model for code analysis and summaries

Ollama LLM Configuration

Variable	Default	Description
`OLLAMA_MODEL`	`qwen2.5-coder:7b`	Ollama model for code analysis and summaries
`OLLAMA_TIMEOUT`	`30000`	Timeout for Ollama API requests (milliseconds)
`OLLAMA_MAX_RETRIES`	`3`	Maximum retry attempts for failed Ollama requests

Common Ollama LLM models:

qwen2.5-coder:7b - default, excellent for code analysis
mistral - fast and capable, good for quick tasks
llama3 - Meta's Llama 3, general purpose
codellama - Meta's Code Llama, specialized for code generation

Reranker Configuration

Variable	Default	Description
`ENABLE_RERANKING`	`false`	Enable composite reranking for improved search results
`RERANKER_TYPE`	`bm25`	Reranker type: `bm25` (keyword-based) or `qwen3` (semantic)
`RERANK_API_URL`	-	Reranker API endpoint (required if `RERANKER_TYPE=qwen3`)
`RERANK_TIMEOUT_MS`	`5000`	Request timeout in milliseconds

Reranker Types:

bm25 - Keyword-based reranking (fast, no external API needed)
qwen3 - Semantic reranking using Qwen3 model (requires RERANK_API_URL)
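For intuition about what keyword-based reranking does, here is a minimal sketch of the standard Okapi BM25 formula applied to a list of candidate snippets. It is not the package's implementation: the tokenizer, the parameters `k1 = 1.2` and `b = 0.75`, and the function names are all assumptions.

```typescript
// Lowercases and splits on anything that is not a letter, digit or underscore.
// camelCase identifiers stay as single tokens in this sketch.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9_]+/g) ?? [];
}

// Returns one BM25 score per document, in input order.
function bm25Scores(query: string, docs: string[], k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map(tokenize);
  const totalLen = tokenized.reduce((sum, d) => sum + d.length, 0);
  const avgLen = totalLen / Math.max(docs.length, 1);
  const terms = Array.from(new Set(tokenize(query)));

  // Document frequency: how many documents contain each query term.
  const df = new Map<string, number>();
  for (const term of terms) {
    df.set(term, tokenized.filter((d) => d.includes(term)).length);
  }

  return tokenized.map((doc) => {
    let score = 0;
    for (const term of terms) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      const n = df.get(term) ?? 0;
      // Smoothed IDF variant that never goes negative.
      const idf = Math.log(1 + (docs.length - n + 0.5) / (n + 0.5));
      // Length normalization: long documents are penalized.
      const norm = k1 * (1 - b + (b * doc.length) / avgLen);
      score += (idf * tf * (k1 + 1)) / (tf + norm);
    }
    return score;
  });
}

const candidates = [
  "function renderChart(data) { draw(data) }",
  "function verifyJwtToken(token) { return jwt.verify(token, secret) }",
];
const scores = bm25Scores("jwt token verify", candidates);
const reranked = candidates
  .map((text, i) => ({ text, score: scores[i] }))
  .sort((x, y) => y.score - x.score);
console.log(reranked[0].text);
```

Because BM25 only sees literal tokens, it complements vector similarity: it rewards exact identifier matches that an embedding may blur.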

Redis Configuration

Variable	Default	Description
`REDIS_URL`	-	Full Redis connection URL (e.g., `redis://localhost:6379`). Takes precedence over individual settings
`REDIS_HOST`	`localhost`	Redis server host (used if `REDIS_URL` not set)
`REDIS_PORT`	`6379`	Redis server port (used if `REDIS_URL` not set)
`REDIS_PASSWORD`	-	Redis password for authentication (optional)
`REDIS_DB`	`0`	Redis database number (0-15)

Background Processing (Bull Queue)

Variable	Default	Description
`CONSUMER_CONCURRENCY`	`1`	Number of concurrent jobs to process
`PROCESSING_TIMEOUT`	`300000`	Job timeout in milliseconds (default: 5 minutes)
`STARTUP_BATCH_ENABLED`	`true`	Enable batch processing of queued jobs on startup
`STARTUP_BATCH_LIMIT`	`50`	Maximum jobs to process in startup batch

Note: Background processing requires a running Redis server. Use REDIS_URL for simple setups or individual settings (REDIS_HOST, REDIS_PORT, REDIS_PASSWORD, REDIS_DB) for more control.
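The precedence rule can be sketched as follows. This is an illustration only: `resolveRedis` is a hypothetical helper, and reading the database number from the URL path is an assumption based on the common `redis://host:port/db` convention, not something this README states.

```typescript
type RedisEnv = Record<string, string | undefined>;

interface RedisTarget {
  host: string;
  port: number;
  password?: string;
  db: number;
}

// REDIS_URL wins when present; otherwise fall back to the individual
// settings and their documented defaults.
function resolveRedis(env: RedisEnv): RedisTarget {
  if (env.REDIS_URL) {
    const url = new URL(env.REDIS_URL);
    return {
      host: url.hostname,
      port: url.port ? Number(url.port) : 6379,
      password: url.password ? decodeURIComponent(url.password) : undefined,
      // Assumed convention: the path segment selects the database.
      db: url.pathname.length > 1 ? Number(url.pathname.slice(1)) : 0,
    };
  }
  return {
    host: env.REDIS_HOST ?? "localhost",
    port: Number(env.REDIS_PORT ?? "6379"),
    password: env.REDIS_PASSWORD,
    db: Number(env.REDIS_DB ?? "0"),
  };
}

const fromUrl = resolveRedis({
  REDIS_URL: "redis://:secret@cache.internal:6380/2",
  REDIS_HOST: "ignored-because-url-is-set",
});
console.log(`${fromUrl.host}:${fromUrl.port} db=${fromUrl.db}`);
```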

Logging

Variable	Default	Description
`LOG_LEVEL`	`info`	Log level: `debug`, `info`, `warn`, `error`
`LOG_FORMAT`	`json`	Log format: `json` or `text`

Architecture

┌─────────────────────────────────────────────────────────┐
│              Claude Code / MCP Client                    │
└──────────────────────┬──────────────────────────────────┘
                       │ MCP Protocol (JSON-RPC)
                       │
┌──────────────────────▼──────────────────────────────────┐
│          MCP Git Codebase Server                        │
│  ┌─────────────┬──────────────┬────────────────────┐  │
│  │   Tools     │   Indexing   │   Background Jobs  │  │
│  │  (4 tools)  │  Pipeline    │   (Bull + Redis)   │  │
│  └─────────────┴──────────────┴────────────────────┘  │
└──────────────────────┬──────────────────────────────────┘
                       │
        ┌──────────────┼──────────────┐
        │              │              │
        ▼              ▼              ▼
    Git Repo    Vector Database  Embedding Model
    (local)     (Qdrant/etc)      (OpenAI/Ollama)
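On the wire, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a message for `sync_codebase`. Clients such as Claude Code construct these messages for you, after an `initialize` handshake, so this is only meant to show the shape; `toolCall` is a hypothetical helper.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Wraps a tool name and its arguments in a JSON-RPC 2.0 request envelope.
function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const syncRequest = toolCall("sync_codebase", {
  project_path: "/workspace/myapp",
  force: false,
  background: true,
});
console.log(JSON.stringify(syncRequest));
```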

Data Flow

Indexing Pipeline
- Extract code units (functions, classes, etc.) using tree-sitter parsers
- Generate embeddings via selected provider
- Store in vector database with metadata
- Track indexing state and checkpoints
Query Pipeline
- Convert query to embedding
- Perform vector similarity search
- Re-rank results with BM25/custom rerankers
- Return top matches with context
Background Processing
- Bull job queue backed by Redis
- Async job processing with retry logic
- Failed job persistence and recovery
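The query pipeline can be illustrated with a brute-force version of the similarity search step. The sketch assumes cosine similarity and uses toy 3-dimensional vectors; the actual metric and index are up to the configured vector database, and the names `IndexedUnit` and `searchUnits` are hypothetical.

```typescript
interface IndexedUnit {
  filepath: string;
  startLine: number;
  vector: number[];
}

function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Scores every unit, drops those below the threshold, and returns the
// best `limit` hits. Defaults mirror the query_codebase defaults.
function searchUnits(
  queryVector: number[],
  units: IndexedUnit[],
  limit = 5,
  threshold = 0.6,
): { unit: IndexedUnit; score: number }[] {
  return units
    .map((unit) => ({ unit, score: cosine(queryVector, unit.vector) }))
    .filter((hit) => hit.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

const toyIndex: IndexedUnit[] = [
  { filepath: "src/auth/jwt.ts", startLine: 10, vector: [1, 0, 0] },
  { filepath: "src/ui/chart.ts", startLine: 4, vector: [0, 1, 0] },
  { filepath: "src/auth/index.ts", startLine: 22, vector: [1, 1, 0] },
];
const hits = searchUnits([1, 0, 0], toyIndex);
for (const hit of hits) {
  console.log(`${hit.unit.filepath}:${hit.unit.startLine} ${hit.score.toFixed(3)}`);
}
```

A real vector database replaces the linear scan with an approximate nearest-neighbor index, which is what keeps typical queries fast on large repositories.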

Supported Languages

TypeScript / JavaScript
Python
Go
Java
Rust
C/C++ (via tree-sitter)
Ruby
PHP
Kotlin
Scala
Swift
Bash
Robot Framework

Configuration Examples

Local Development with Qdrant and Ollama

# Start Qdrant (requires Docker)
docker run -p 6333:6333 qdrant/qdrant

# Set environment variables
export VECTOR_DB_PROVIDER=qdrant
export QDRANT_URL=http://localhost:6333
export EMBEDDING_PROVIDER=ollama
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_EMBEDDING_MODEL=nomic-embed-text

# Start server
npx @devpuccino/mcp-git-codebase

Production with Pinecone

# Set environment variables
export VECTOR_DB_PROVIDER=pinecone
export PINECONE_API_KEY=your-production-key
export PINECONE_ENVIRONMENT=your-environment
export PINECONE_INDEX=prod-codebase
export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# Start server
npx @devpuccino/mcp-git-codebase

PostgreSQL with pgvector

# Set environment variables
export VECTOR_DB_PROVIDER=postgres
export DATABASE_URL=postgresql://user:password@host:5432/codebase_db
export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# Start server
npx @devpuccino/mcp-git-codebase

Performance Considerations

Embedding Generation: Largest cost factor (~100ms per code unit)
Vector Search: Sub-100ms for typical queries
Code Extraction: ~50-200ms per file depending on size
Indexing Speed: ~1000-2000 code units per minute
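These figures allow a rough planning estimate. The sketch below is plain arithmetic on the numbers quoted above, not a benchmark; the function names are hypothetical. It also shows that purely sequential embedding at about 100 ms per unit would reach only about 600 units per minute, so the quoted indexing speed presumably relies on batching or concurrency.

```typescript
// Range of indexing time implied by "~1000-2000 code units per minute".
function estimateIndexingMinutes(codeUnits: number): { best: number; worst: number } {
  return { best: codeUnits / 2000, worst: codeUnits / 1000 };
}

// Time if every unit were embedded one after another at ~100 ms each.
function sequentialEmbeddingMinutes(codeUnits: number, msPerUnit = 100): number {
  return (codeUnits * msPerUnit) / 60000;
}

const units = 50000;
const range = estimateIndexingMinutes(units);
console.log(`indexing: ${range.best}-${range.worst} minutes`);
console.log(`sequential embedding alone: ${sequentialEmbeddingMinutes(units).toFixed(1)} minutes`);
```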

Optimization Tips:

Pass background: true to sync_codebase or update_codebase for large codebases
Set CONSUMER_CONCURRENCY according to the CPU and memory available
Re-index incrementally via update_codebase instead of running a full sync_codebase
Filter by file_extensions to reduce scope
Raise similarity_threshold if queries return too many results

Troubleshooting

Connection Issues

# Verify vector database is running
curl http://localhost:6333/healthz  # Qdrant
curl http://localhost:8000/api/v1/heartbeat  # Chroma (newer releases expose /api/v2/heartbeat instead)

# Check logs
export LOG_LEVEL=debug
npx @devpuccino/mcp-git-codebase

Embedding Model Issues

# Verify Ollama is running
curl http://localhost:11434/api/tags

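# Or verify the OpenAI API key is set (without printing the secret)
[ -n "$OPENAI_API_KEY" ] && echo "OPENAI_API_KEY is set" || echo "OPENAI_API_KEY is missing"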

Out of Memory

Reduce CONSUMER_CONCURRENCY
Process smaller repositories first
Enable background=true for large syncs

Development

# Clone repository
git clone https://github.com/devpuccino/mcp-git-codebase.git
cd mcp-git-codebase

# Install dependencies
npm install

# Build
npm run build

# Run tests
npm test

# Start in development mode
npm run dev

License

MIT

Support

For issues, questions, or feature requests, please visit the GitHub repository: https://github.com/devpuccino/mcp-git-codebase

Package last updated on 24 Mar 2026
