@devpuccino/mcp-git-codebase

An MCP (Model Context Protocol) server that provides semantic code search and intelligent indexing for git repositories. It uses vector embeddings so that AI clients can find relevant code snippets by intent, not just keywords.

Features

Semantic Search - Find code by meaning, not just keywords
🔍 Multi-Language Support - TypeScript, JavaScript, Python, Go, Java, Rust, and more
📊 Multiple Vector Databases - Qdrant, Pinecone, Chroma, Milvus, PostgreSQL with pgvector
🚀 Scalable Indexing - Handle repositories with 1M+ files and 100GB+ of code
⚙️ Background Processing - Queue indexing jobs via Redis/Bull
🌿 Branch-Aware - Search across specific branches or track changes over time
🎯 Precise Code Retrieval - Get exact code snippets with line-level precision

Installation

Prerequisites

  • Node.js ≥ 18.0.0
  • Git (for repository operations)
  • One of the supported vector databases (Qdrant, Pinecone, Chroma, Milvus, or PostgreSQL)

Install Package

npm install @devpuccino/mcp-git-codebase

Quick Start

1. Configure Vector Database

Set your preferred vector database and its connection details:

# Qdrant (recommended for local development)
export VECTOR_DB_PROVIDER=qdrant
export QDRANT_URL=http://localhost:6333

# Or Pinecone
export VECTOR_DB_PROVIDER=pinecone
export PINECONE_API_KEY=your-api-key
export PINECONE_ENVIRONMENT=your-environment
export PINECONE_INDEX=your-index

# Or PostgreSQL with pgvector
export VECTOR_DB_PROVIDER=postgres
export DATABASE_URL=postgresql://user:password@localhost:5432/codebase

# Or Chroma
export VECTOR_DB_PROVIDER=chroma
export CHROMA_URL=http://localhost
export CHROMA_PORT=8000

# Or Milvus
export VECTOR_DB_PROVIDER=milvus
export MILVUS_HOST=localhost
export MILVUS_PORT=19530

2. Configure Embedding Model

# Ollama (default, local)
export EMBEDDING_PROVIDER=ollama
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_EMBEDDING_MODEL=bge-base-en-v1.5

# Or OpenAI (cloud)
export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export OPENAI_EMBEDDING_MODEL=text-embedding-3-small
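
Before starting the server it can help to check that the chosen providers have everything they need. The following is a minimal sketch, not part of the package: `missingVariables` is a hypothetical helper, and the required variables are taken from the entries marked "required" in the Environment Variables tables of this README. The environment is passed in as a plain object so the function stays easy to test.

```typescript
// Sketch only: `missingVariables` is a hypothetical helper, not a package API.
type Env = Record<string, string | undefined>;

// Variables this README marks as "required" for each vector database.
const REQUIRED_BY_VECTOR_DB: Record<string, string[]> = {
  qdrant: [],
  pinecone: ["PINECONE_API_KEY", "PINECONE_ENVIRONMENT"],
  chroma: [],
  milvus: [],
  postgres: ["DATABASE_URL"],
};

// Variables this README marks as "required" for each embedding provider.
const REQUIRED_BY_EMBEDDING: Record<string, string[]> = {
  ollama: [],
  openai: ["OPENAI_API_KEY"],
};

function missingVariables(env: Env): string[] {
  // Documented defaults: qdrant for the vector database, ollama for embeddings.
  const vectorDb = env["VECTOR_DB_PROVIDER"] ?? "qdrant";
  const embedding = env["EMBEDDING_PROVIDER"] ?? "ollama";

  const problems: string[] = [];
  const vectorVars = REQUIRED_BY_VECTOR_DB[vectorDb];
  const embeddingVars = REQUIRED_BY_EMBEDDING[embedding];
  if (vectorVars === undefined) {
    problems.push(`VECTOR_DB_PROVIDER (unknown value "${vectorDb}")`);
  }
  if (embeddingVars === undefined) {
    problems.push(`EMBEDDING_PROVIDER (unknown value "${embedding}")`);
  }

  // Report every required variable that is unset or empty.
  for (const variable of [...(vectorVars ?? []), ...(embeddingVars ?? [])]) {
    if (!env[variable]) problems.push(variable);
  }
  return problems;
}
```

In a Node.js script this could be called as `missingVariables(process.env)`; an empty array means nothing required is missing.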

3. Use with Claude Code

Add to your Claude Code configuration (settings.json or settings.local.json):

Minimal Configuration (Qdrant + Ollama):

{
  "mcpServers": {
    "mcp-git-codebase": {
      "command": "npx",
      "args": ["@devpuccino/mcp-git-codebase"],
      "env": {
        "VECTOR_DB_PROVIDER": "qdrant",
        "QDRANT_URL": "http://localhost:6333",
        "EMBEDDING_PROVIDER": "ollama",
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_EMBEDDING_MODEL": "nomic-embed-text"
      }
    }
  }
}

Full Configuration Example:

{
  "mcpServers": {
    "git-codebase": {
      "command": "npx",
      "args": ["--legacy-peer-deps", "@devpuccino/mcp-git-codebase"],
      "env": {
        "VECTOR_DB_PROVIDER": "qdrant",
        "QDRANT_URL": "http://your-qdrant-host:6333",
        "QDRANT_API_KEY": "your-api-key-if-needed",
        "VECTOR_DB_COLLECTION_PREFIX": "codebase_",
        
        "EMBEDDING_PROVIDER": "ollama",
        "OLLAMA_BASE_URL": "http://your-ollama-host:11434",
        "OLLAMA_EMBEDDING_MODEL": "bge-base-en-v1.5",
        "EMBEDDING_TIMEOUT": "30000",
        
        "LLM_PROVIDER": "ollama",
        "OLLAMA_MODEL": "qwen2.5-coder:7b",
        "OLLAMA_TIMEOUT": "30000",
        "OLLAMA_MAX_RETRIES": "3",
        "INDEXING_LLM_ENABLED": "true",
        
        "REDIS_HOST": "your-redis-host",
        "REDIS_PORT": "6379",
        "REDIS_PASSWORD": "your-redis-password",
        "REDIS_DB": "0",
        
        "ENABLE_RERANKING": "true",
        "RERANKER_TYPE": "bm25",
        
        "CONSUMER_CONCURRENCY": "2",
        "STARTUP_BATCH_ENABLED": "true",
        "STARTUP_BATCH_LIMIT": "50",
        
        "LOG_LEVEL": "info"
      }
    }
  }
}

Production Configuration (Pinecone + OpenAI):

{
  "mcpServers": {
    "mcp-git-codebase": {
      "command": "npx",
      "args": ["@devpuccino/mcp-git-codebase"],
      "env": {
        "VECTOR_DB_PROVIDER": "pinecone",
        "PINECONE_API_KEY": "your-pinecone-api-key",
        "PINECONE_ENVIRONMENT": "us-east-1",
        "PINECONE_INDEX": "your-index-name",
        
        "EMBEDDING_PROVIDER": "openai",
        "OPENAI_API_KEY": "your-openai-api-key",
        "OPENAI_EMBEDDING_MODEL": "text-embedding-3-small",
        
        "LLM_PROVIDER": "openai",
        "OPENAI_LLM_MODEL": "gpt-4o-mini",
        
        "LOG_LEVEL": "warn"
      }
    }
  }
}

Tools

query_codebase

Perform semantic search across a git repository to find relevant code snippets by meaning.

Parameters:

  • query_sentence (required): Natural language search query or code snippet
  • project_path (required): Root directory of the git repository
  • branch (optional): Specific branch to search (default: current branch)
  • limit (optional): Max results to return, 1-20 (default: 5)
  • similarity_threshold (optional): Minimum similarity score, 0-1 (default: 0.6)
  • file_extensions (optional): Filter by file extensions (e.g., [".ts", ".tsx"])

Example:

{
  "query_sentence": "function to authenticate users with JWT tokens",
  "project_path": "/workspace/myapp",
  "limit": 5,
  "file_extensions": [".ts", ".tsx"]
}
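
The parameter rules above (defaults of 5 and 0.6, a limit of 1-20, a threshold of 0-1) can be sketched as a small client-side check. This is an illustration under assumptions: `normalizeQueryArgs` is a hypothetical helper, and rejecting out-of-range values (rather than clamping them) and requiring an integer limit are choices made for this sketch, not documented server behavior.

```typescript
// Sketch only: argument shape follows the parameter list above.
interface QueryCodebaseArgs {
  query_sentence: string;
  project_path: string;
  branch?: string;
  limit?: number;
  similarity_threshold?: number;
  file_extensions?: string[];
}

// Hypothetical helper: fills in documented defaults and checks documented ranges.
function normalizeQueryArgs(args: QueryCodebaseArgs): QueryCodebaseArgs {
  if (args.query_sentence.trim() === "") throw new Error("query_sentence is required");
  if (args.project_path.trim() === "") throw new Error("project_path is required");

  const limit = args.limit ?? 5; // documented default
  if (!Number.isInteger(limit) || limit < 1 || limit > 20) {
    throw new RangeError("limit must be an integer from 1 to 20");
  }

  const threshold = args.similarity_threshold ?? 0.6; // documented default
  if (!(threshold >= 0 && threshold <= 1)) {
    throw new RangeError("similarity_threshold must be between 0 and 1");
  }

  return { ...args, limit, similarity_threshold: threshold };
}

const normalizedQuery = normalizeQueryArgs({
  query_sentence: "function to authenticate users with JWT tokens",
  project_path: "/workspace/myapp",
  file_extensions: [".ts", ".tsx"],
});
```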

get_code_snippet

Retrieve a specific code snippet from a file with line-level precision.

Parameters:

  • project_path (required): Root directory of the git repository
  • filepath (required): Relative path to the file
  • start_line (optional): Starting line number (1-indexed)
  • end_line (optional): Ending line number
  • include_line_numbers (optional): Show line numbers (default: true)

Example:

{
  "project_path": "/workspace/myapp",
  "filepath": "src/auth/index.ts",
  "start_line": 10,
  "end_line": 45,
  "include_line_numbers": true
}
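
To make the line semantics concrete, here is a minimal sketch of line-range extraction. It is not the package's implementation: `sliceLines` is a hypothetical function, and it assumes that `end_line` is inclusive and that numbered output looks like `N: text`, neither of which is stated in this README.

```typescript
// Sketch only: returns lines startLine..endLine (1-indexed, end inclusive by assumption).
function sliceLines(
  source: string,
  startLine?: number,
  endLine?: number,
  includeLineNumbers: boolean = true
): string {
  const lines = source.split("\n");
  // Clamp the requested range to the lines that actually exist.
  const start = Math.max(1, startLine ?? 1);
  const end = Math.min(lines.length, endLine ?? lines.length);
  if (start > end) return "";

  // Right-align line numbers to the width of the largest one.
  const width = String(end).length;
  const out: string[] = [];
  for (let n = start; n <= end; n++) {
    const text = lines[n - 1];
    if (includeLineNumbers) {
      let label = String(n);
      while (label.length < width) label = " " + label;
      out.push(label + ": " + text);
    } else {
      out.push(text);
    }
  }
  return out.join("\n");
}
```

Omitting both `startLine` and `endLine` returns the whole file, which matches both parameters being optional.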

sync_codebase

Index or re-index a git repository into the vector database.

Parameters:

  • project_path (required): Root directory of the git repository
  • branch (optional): Branch to sync (default: current branch)
  • file_extensions (optional): Only sync specific file types
  • background (optional): Queue as background job (default: false)
  • force (optional): Force full re-index from scratch (default: false)

Example:

{
  "project_path": "/workspace/myapp",
  "force": false,
  "background": true
}

update_codebase

Trigger re-indexing after code changes. The changes are also committed to git unless skip_git_commit is set to true.

Parameters:

  • project_path (required): Root directory of the git repository
  • commit_message (required): Message summarizing changes
  • changed_files (required): Array of changed files with change type
  • trigger_type (required): One of manual, post_generation, post_merge
  • skip_git_commit (optional): Skip git commit (default: false)
  • background (optional): Queue as background job (default: false)

Example:

{
  "project_path": "/workspace/myapp",
  "commit_message": "Update authentication module",
  "changed_files": [
    { "path": "src/auth/index.ts", "change_type": "modified" },
    { "path": "src/auth/jwt.ts", "change_type": "added" }
  ],
  "trigger_type": "manual",
  "background": false
}
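
One way to produce `changed_files` is to convert the output of `git diff --name-status`, which prints one tab-separated `STATUS<TAB>path` line per file. The sketch below is an assumption-laden illustration: `parseNameStatus` is a hypothetical helper, and the `"deleted"` change type is a guess, since this README only shows `"modified"` and `"added"`.

```typescript
// Sketch only: "deleted" is an assumed value not shown in the example above.
type ChangeType = "added" | "modified" | "deleted";

interface ChangedFile {
  path: string;
  change_type: ChangeType;
}

// Hypothetical helper: converts `git diff --name-status` output to changed_files.
function parseNameStatus(output: string): ChangedFile[] {
  const files: ChangedFile[] = [];
  for (const line of output.split("\n")) {
    if (line.trim() === "") continue;
    const parts = line.split("\t");
    const statusCode = parts[0];
    const path = parts[parts.length - 1];
    if (statusCode === "A") files.push({ path, change_type: "added" });
    else if (statusCode === "M") files.push({ path, change_type: "modified" });
    else if (statusCode === "D") files.push({ path, change_type: "deleted" });
    // Renames, copies, and other statuses are skipped in this sketch.
  }
  return files;
}

const changedFiles = parseNameStatus("M\tsrc/auth/index.ts\nA\tsrc/auth/jwt.ts\n");
const updateArgs = {
  project_path: "/workspace/myapp",
  commit_message: "Update authentication module",
  changed_files: changedFiles,
  trigger_type: "manual",
  background: false,
};
```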

Environment Variables

General Vector Database Configuration

| Variable | Default | Description |
| --- | --- | --- |
| VECTOR_DB_PROVIDER | qdrant | Vector database type: qdrant, pinecone, chroma, milvus, postgres |
| EMBEDDING_DIMENSION | 1536 | Dimension of embedding vectors (auto-detected from model, rarely needed) |
| VECTOR_DB_COLLECTION_PREFIX | - | Optional prefix for collection names (useful for multi-tenant setups) |

Qdrant Configuration

| Variable | Default | Description |
| --- | --- | --- |
| QDRANT_URL | http://localhost:6333 | Qdrant server URL |
| QDRANT_API_KEY | - | Qdrant API key (for cloud/managed instances) |
| QDRANT_COLLECTION | code_snippets | Collection name for storing embeddings |

Pinecone Configuration

| Variable | Default | Description |
| --- | --- | --- |
| PINECONE_API_KEY | - | Pinecone API key (required) |
| PINECONE_ENVIRONMENT | - | Pinecone environment/region (required) |
| PINECONE_INDEX | code-snippets | Pinecone index name |

Chroma Configuration

| Variable | Default | Description |
| --- | --- | --- |
| CHROMA_URL | http://localhost | Chroma server URL |
| CHROMA_PORT | 8000 | Chroma server port |
| CHROMA_COLLECTION | code_snippets | Collection name for storing embeddings |

Milvus Configuration

| Variable | Default | Description |
| --- | --- | --- |
| MILVUS_HOST | localhost | Milvus server host |
| MILVUS_PORT | 19530 | Milvus server port |
| MILVUS_COLLECTION | code_snippets | Collection name for storing embeddings |

PostgreSQL (pgvector) Configuration

| Variable | Default | Description |
| --- | --- | --- |
| DATABASE_URL | - | PostgreSQL connection string (required) |
| POSTGRES_VECTOR_TABLE | code_snippets_vectors | Table name for storing vectors |
| POSTGRES_EMBEDDING_COLUMN | embedding | Column name for embedding vectors |

Embedding Model Configuration

| Variable | Default | Description |
| --- | --- | --- |
| EMBEDDING_PROVIDER | ollama | Embedding provider: openai, ollama |
| EMBEDDING_DIMENSION | 1536 | Dimension of embedding vectors (auto-detected from model if not set) |
| EMBEDDING_TIMEOUT | 30000 | Timeout for embedding API requests (milliseconds) |
| EMBEDDING_BATCH_SIZE | 10 | Number of items to embed per batch |
| EMBEDDING_MAX_RETRIES | 3 | Maximum retry attempts for failed embedding requests |

OpenAI Embedding Configuration

| Variable | Default | Description |
| --- | --- | --- |
| OPENAI_API_KEY | - | OpenAI API key (required for OpenAI provider) |
| OPENAI_EMBEDDING_MODEL | text-embedding-3-small | OpenAI embedding model to use |
| OPENAI_BASE_URL | https://api.openai.com | OpenAI API base URL (for custom endpoints) |

Ollama Embedding Configuration

| Variable | Default | Description |
| --- | --- | --- |
| OLLAMA_BASE_URL | http://localhost:11434 | Ollama server URL |
| OLLAMA_EMBEDDING_MODEL | bge-base-en-v1.5 | Ollama embedding model to use |

Common Ollama embedding models:

  • bge-base-en-v1.5 (768 dimensions) - default, good balance
  • bge-large-en-v1.5 (1024 dimensions) - higher quality
  • nomic-embed-text (768 dimensions) - fast and efficient
  • mxbai-embed-large (1024 dimensions) - high quality
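
If EMBEDDING_DIMENSION is set by hand, it has to agree with the model in use; note that the documented default of 1536 does not match any of the Ollama models listed above. The following sketch shows one way to catch a mismatch early. `checkDimension` is a hypothetical helper, and the lookup table simply restates the dimensions from the list above.

```typescript
// Dimensions as listed in this README for common Ollama embedding models.
const OLLAMA_MODEL_DIMENSIONS: Record<string, number> = {
  "bge-base-en-v1.5": 768,
  "bge-large-en-v1.5": 1024,
  "nomic-embed-text": 768,
  "mxbai-embed-large": 1024,
};

// Hypothetical helper: returns a warning message, or null when nothing looks wrong.
function checkDimension(model: string, configured?: number): string | null {
  const expected = OLLAMA_MODEL_DIMENSIONS[model];
  if (expected === undefined) return null; // unknown model: nothing to compare
  if (configured === undefined || configured === expected) return null;
  return `EMBEDDING_DIMENSION=${configured} but ${model} produces ${expected}-dimensional vectors`;
}
```

Leaving EMBEDDING_DIMENSION unset avoids the problem entirely, since the README states the dimension is auto-detected from the model.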

LLM Provider Configuration

| Variable | Default | Description |
| --- | --- | --- |
| LLM_PROVIDER | ollama | LLM provider for code analysis: openai, ollama |
| LLM_TIMEOUT | 8000 | Timeout for LLM API requests (milliseconds) |
| LLM_MAX_RETRIES | 2 | Maximum retry attempts for failed LLM requests |
| INDEXING_LLM_ENABLED | true | Enable LLM-based metadata generation during indexing |

OpenAI LLM Configuration

| Variable | Default | Description |
| --- | --- | --- |
| OPENAI_LLM_MODEL | gpt-4o-mini | OpenAI model for code analysis and summaries |

Ollama LLM Configuration

| Variable | Default | Description |
| --- | --- | --- |
| OLLAMA_MODEL | qwen2.5-coder:7b | Ollama model for code analysis and summaries |
| OLLAMA_TIMEOUT | 30000 | Timeout for Ollama API requests (milliseconds) |
| OLLAMA_MAX_RETRIES | 3 | Maximum retry attempts for failed Ollama requests |

Common Ollama LLM models:

  • qwen2.5-coder:7b - default, excellent for code analysis
  • mistral - fast and capable, good for quick tasks
  • llama3 - Meta's Llama 3, general purpose
  • codellama - Meta's Code Llama, specialized for code generation

Reranker Configuration

| Variable | Default | Description |
| --- | --- | --- |
| ENABLE_RERANKING | false | Enable composite reranking for improved search results |
| RERANKER_TYPE | bm25 | Reranker type: bm25 (keyword-based) or qwen3 (semantic) |
| RERANK_API_URL | - | Reranker API endpoint (required if RERANKER_TYPE=qwen3) |
| RERANK_TIMEOUT_MS | 5000 | Request timeout in milliseconds |

Reranker Types:

  • bm25 - Keyword-based reranking (fast, no external API needed)
  • qwen3 - Semantic reranking using Qwen3 model (requires RERANK_API_URL)
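
For readers unfamiliar with BM25, the sketch below scores candidate snippets against a query using the standard Okapi BM25 formula. It illustrates the general technique only: the tokenizer, the `k1` and `b` values, and the function names are assumptions, and this README does not say how the package combines BM25 with vector similarity scores.

```typescript
// Sketch only: a naive tokenizer that lowercases and splits on non-identifier characters.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9_]+/).filter((t) => t.length > 0);
}

// Okapi BM25 score of each document for the query (higher means more relevant).
function bm25Scores(query: string, documents: string[], k1: number = 1.2, b: number = 0.75): number[] {
  const docs = documents.map(tokenize);
  const n = docs.length;
  if (n === 0) return [];

  let totalLength = 0;
  for (const doc of docs) totalLength += doc.length;
  const avgLength = totalLength / n;

  // Score each distinct query term once.
  const terms = tokenize(query).filter((t, i, all) => all.indexOf(t) === i);
  const scores: number[] = docs.map(() => 0);

  for (const term of terms) {
    // Document frequency: number of documents containing the term.
    let df = 0;
    for (const doc of docs) if (doc.indexOf(term) !== -1) df++;
    if (df === 0) continue;
    const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));

    for (let i = 0; i < n; i++) {
      let tf = 0;
      for (const token of docs[i]) if (token === term) tf++;
      if (tf === 0) continue;
      // Length normalization: long documents are penalized relative to the average.
      const norm = 1 - b + b * (docs[i].length / avgLength);
      scores[i] += idf * (tf * (k1 + 1)) / (tf + k1 * norm);
    }
  }
  return scores;
}

const bm25Example = bm25Scores("jwt token", [
  "verify jwt token signature",
  "render the settings page",
  "parse jwt header",
]);
```

In this example the first snippet matches both query terms and scores highest, the third matches one term, and the second scores zero.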

Redis Configuration

| Variable | Default | Description |
| --- | --- | --- |
| REDIS_URL | - | Full Redis connection URL (e.g., redis://localhost:6379). Takes precedence over individual settings |
| REDIS_HOST | localhost | Redis server host (used if REDIS_URL not set) |
| REDIS_PORT | 6379 | Redis server port (used if REDIS_URL not set) |
| REDIS_PASSWORD | - | Redis password for authentication (optional) |
| REDIS_DB | 0 | Redis database number (0-15) |

Background Processing (Bull Queue)

| Variable | Default | Description |
| --- | --- | --- |
| CONSUMER_CONCURRENCY | 1 | Number of concurrent jobs to process |
| PROCESSING_TIMEOUT | 300000 | Job timeout in milliseconds (default: 5 minutes) |
| STARTUP_BATCH_ENABLED | true | Enable batch processing of queued jobs on startup |
| STARTUP_BATCH_LIMIT | 50 | Maximum jobs to process in startup batch |

Note: Background processing requires a running Redis server. Use REDIS_URL for simple setups or individual settings (REDIS_HOST, REDIS_PORT, REDIS_PASSWORD, REDIS_DB) for more control.
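
The precedence rule (REDIS_URL first, then the individual settings with their defaults) can be sketched as follows. `resolveRedis` and `RedisOptions` are hypothetical names used for illustration; the package's actual resolution code is not shown in this README.

```typescript
// Sketch only: hypothetical types and helper illustrating the documented precedence.
type RedisEnv = Record<string, string | undefined>;

interface RedisOptions {
  host: string;
  port: number;
  password: string | undefined;
  db: number;
}

function resolveRedis(env: RedisEnv): string | RedisOptions {
  // REDIS_URL takes precedence over the individual settings.
  if (env["REDIS_URL"]) return env["REDIS_URL"];
  return {
    host: env["REDIS_HOST"] ?? "localhost", // documented default
    port: Number(env["REDIS_PORT"] ?? "6379"), // documented default
    password: env["REDIS_PASSWORD"], // optional
    db: Number(env["REDIS_DB"] ?? "0"), // documented default
  };
}
```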

Logging

| Variable | Default | Description |
| --- | --- | --- |
| LOG_LEVEL | info | Log level: debug, info, warn, error |
| LOG_FORMAT | json | Log format: json or text |

Architecture

┌─────────────────────────────────────────────────────────┐
│              Claude Code / MCP Client                    │
└──────────────────────┬──────────────────────────────────┘
                       │ MCP Protocol (JSON-RPC)
                       │
┌──────────────────────▼──────────────────────────────────┐
│          MCP Git Codebase Server                        │
│  ┌─────────────┬──────────────┬────────────────────┐  │
│  │   Tools     │   Indexing   │   Background Jobs  │  │
│  │  (4 tools)  │  Pipeline    │   (Bull + Redis)   │  │
│  └─────────────┴──────────────┴────────────────────┘  │
└──────────────────────┬──────────────────────────────────┘
                       │
        ┌──────────────┼──────────────┐
        │              │              │
        ▼              ▼              ▼
    Git Repo    Vector Database  Embedding Model
    (local)     (Qdrant/etc)      (OpenAI/Ollama)
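
On the wire, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request for `query_codebase`; `buildToolCall` is a hypothetical helper, and in practice an MCP client such as Claude Code constructs and sends these messages for you.

```typescript
// Shape of an MCP tool invocation (JSON-RPC 2.0, method "tools/call").
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in the request envelope.
function buildToolCall(id: number, tool: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: tool, arguments: args } };
}

const toolCall = buildToolCall(1, "query_codebase", {
  query_sentence: "function to authenticate users with JWT tokens",
  project_path: "/workspace/myapp",
  limit: 5,
});

// Serialized form that would travel over the MCP transport.
const toolCallJson = JSON.stringify(toolCall);
```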

Data Flow

  • Indexing Pipeline

    • Extract code units (functions, classes, etc.) using tree-sitter parsers
    • Generate embeddings via selected provider
    • Store in vector database with metadata
    • Track indexing state and checkpoints
  • Query Pipeline

    • Convert query to embedding
    • Perform vector similarity search
    • Re-rank results with BM25/custom rerankers
    • Return top matches with context
  • Background Processing

    • Bull job queue backed by Redis
    • Async job processing with retry logic
    • Failed job persistence and recovery
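
The core of the query pipeline (similarity search, threshold, top matches) can be sketched with a brute-force search over in-memory vectors. This is an illustration under assumptions: cosine similarity is assumed as the metric, the type and function names are hypothetical, and a real vector database would use an index rather than scanning every entry.

```typescript
// Sketch only: hypothetical shapes for an indexed code unit and a search result.
interface IndexedUnit {
  filepath: string;
  start_line: number;
  embedding: number[];
}

interface Match {
  filepath: string;
  start_line: number;
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors match nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep units at or above the threshold, best first, at most `limit` of them.
function searchUnits(query: number[], units: IndexedUnit[], limit: number = 5, threshold: number = 0.6): Match[] {
  const matches: Match[] = [];
  for (const unit of units) {
    const score = cosineSimilarity(query, unit.embedding);
    if (score >= threshold) {
      matches.push({ filepath: unit.filepath, start_line: unit.start_line, score });
    }
  }
  matches.sort((x, y) => y.score - x.score);
  return matches.slice(0, limit);
}

const searchExample = searchUnits([1, 0], [
  { filepath: "a.ts", start_line: 1, embedding: [1, 0] },
  { filepath: "b.ts", start_line: 10, embedding: [0, 1] },
  { filepath: "c.ts", start_line: 20, embedding: [1, 1] },
]);
```

With the default threshold of 0.6, `b.ts` (orthogonal to the query) is dropped, and `a.ts` ranks ahead of `c.ts`.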

Supported Languages

  • TypeScript / JavaScript
  • Python
  • Go
  • Java
  • Rust
  • C/C++ (via tree-sitter)
  • Ruby
  • PHP
  • Kotlin
  • Scala
  • Swift
  • Bash
  • Robot Framework

Configuration Examples

Local Development with Qdrant and Ollama

# Start Qdrant (requires Docker)
docker run -p 6333:6333 qdrant/qdrant

# Set environment variables
export VECTOR_DB_PROVIDER=qdrant
export QDRANT_URL=http://localhost:6333
export EMBEDDING_PROVIDER=ollama
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_EMBEDDING_MODEL=nomic-embed-text

# Start server
npx @devpuccino/mcp-git-codebase

Production with Pinecone

# Set environment variables
export VECTOR_DB_PROVIDER=pinecone
export PINECONE_API_KEY=your-production-key
export PINECONE_ENVIRONMENT=your-environment
export PINECONE_INDEX=prod-codebase
export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# Start server
npx @devpuccino/mcp-git-codebase

PostgreSQL with pgvector

# Set environment variables
export VECTOR_DB_PROVIDER=postgres
export DATABASE_URL=postgresql://user:password@host:5432/codebase_db
export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# Start server
npx @devpuccino/mcp-git-codebase

Performance Considerations

  • Embedding Generation: Largest cost factor (~100ms per code unit)
  • Vector Search: Sub-100ms for typical queries
  • Code Extraction: ~50-200ms per file depending on size
  • Indexing Speed: ~1000-2000 code units per minute
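
As a worked example of the indexing figure above, the sketch below turns a code-unit count into a rough time range. `estimateIndexingMinutes` is a hypothetical helper, and the estimate is only as good as the quoted throughput, which will vary with hardware and provider.

```typescript
// Throughput range quoted above, in code units per minute.
const UNITS_PER_MINUTE_LOW = 1000;
const UNITS_PER_MINUTE_HIGH = 2000;

// Hypothetical helper: rough best-case and worst-case indexing time in minutes.
function estimateIndexingMinutes(codeUnits: number): { fastest: number; slowest: number } {
  return {
    fastest: codeUnits / UNITS_PER_MINUTE_HIGH,
    slowest: codeUnits / UNITS_PER_MINUTE_LOW,
  };
}

// A repository with 50,000 code units: roughly 25 to 50 minutes.
const indexingEstimate = estimateIndexingMinutes(50000);
```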

Optimization Tips:

  • Use background=true for large codebases
  • Set appropriate CONSUMER_CONCURRENCY based on resources
  • Implement incremental indexing via update_codebase
  • Filter by file_extensions to reduce scope
  • Raise similarity_threshold if a query returns too many results

Troubleshooting

Connection Issues

# Verify vector database is running
curl http://localhost:6333/healthz  # Qdrant
curl http://localhost:8000/api/v1/heartbeat  # Chroma

# Check logs
export LOG_LEVEL=debug
npx @devpuccino/mcp-git-codebase

Embedding Model Issues

# Verify Ollama is running
curl http://localhost:11434/api/tags

# Or confirm the OpenAI API key is set (without printing it)
[ -n "$OPENAI_API_KEY" ] && echo "OPENAI_API_KEY is set"

Out of Memory

  • Reduce CONSUMER_CONCURRENCY
  • Process smaller repositories first
  • Enable background=true for large syncs

Development

# Clone repository
git clone https://github.com/devpuccino/mcp-git-codebase.git
cd mcp-git-codebase

# Install dependencies
npm install

# Build
npm run build

# Run tests
npm test

# Start in development mode
npm run dev

License

MIT

Support

For issues, questions, or feature requests, please visit the GitHub repository: https://github.com/devpuccino/mcp-git-codebase

Keywords

mcp

Package last updated on 24 Mar 2026
