Socket
Book a DemoInstallSign in
Socket

mongodocs-mcp

Package Overview
Dependencies
Maintainers
1
Versions
54
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

mongodocs-mcp

Transform any GitHub repository into searchable vector embeddings. MCP server with smart indexing, voyage-context-3 embeddings, and semantic search for Claude/Cursor IDEs.

latest
Source
npmnpm
Version
11.0.0
Version published
Weekly downloads
223
227.94%
Maintainers
1
Weekly downloads
 
Created
Source

mongodocs-mcp

A Model Context Protocol (MCP) server that transforms any GitHub repository into searchable vector embeddings, enabling semantic search across codebases and documentation through IDE integration.

Architecture

The system implements a three-phase indexing pipeline with smart change detection:

Repository → Git Clone → Smart Chunking → Vector Embeddings → MongoDB Atlas
                ↓              ↓                ↓                    ↓
           Hash Tracking   Semantic Split   voyage-context-3    Vector Search

Core Components

  • Indexer (src/core/indexer.ts): Git-based change detection using commit hashes
  • Semantic Chunker (src/core/semantic-chunker.ts): Multi-strategy content splitting
  • Embedding Service (src/core/embeddings.ts): Voyage AI integration with batching
  • Storage Service (src/core/storage.ts): MongoDB Atlas vector operations
  • Search Service (src/core/search.ts): Vector, hybrid RRF, and MMR algorithms
  • MCP Server (src/index.ts): Protocol implementation for IDE integration

Installation

Global Package

npm install -g mongodocs-mcp

From Source

git clone https://github.com/yourusername/mongodocs-mcp.git
cd mongodocs-mcp
npm install
npm run build
npm link

Setup

1. MongoDB Atlas

Create free M0 cluster at cloud.mongodb.com:

# Database structure
Database: mongodb_semantic_docs
Collection: documents

# Connection string format
mongodb+srv://username:password@cluster.mongodb.net/?retryWrites=true&w=majority

Network Access Configuration:

  • Navigate to Network Access → Add IP Address
  • Add 0.0.0.0/0 for development (restrict in production)

Vector Search Index Creation:

  • Go to Atlas Search → Create Index
  • Select "JSON Editor"
  • Paste configuration:
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "embedding": {
        "type": "knnVector",
        "dimensions": 1024,
        "similarity": "cosine"
      }
    }
  }
}

Name: vector_index

2. Voyage AI

Get API key from voyageai.com:

  • Model: voyage-context-3
  • Dimensions: 1024
  • Context window: 32,000 tokens
  • Rate limit: 2000 RPM

3. Environment Configuration

Create .env file:

# Required
MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/?retryWrites=true&w=majority
VOYAGE_API_KEY=pa-your-api-key

# Optional
GITHUB_TOKEN=ghp_your_token  # For private repos

Usage

Web Interface

# Start web UI
npm run web

# Opens http://localhost:3000
# 4-step wizard:
# 1. Configure APIs
# 2. Select repositories
# 3. Review MCP setup
# 4. Start processing

Command Line

# Index repositories (smart mode - only changed files)
npm run index

# Force complete rebuild
npm run rebuild

# Monitor indexing progress
npm run progress

# Database statistics
npm run stats

# Clean database
npm run clean

Programmatic API

import { Indexer } from 'mongodocs-mcp';

const config = {
  repositories: [{
    name: 'My Documentation',
    repo: 'owner/repository',
    branch: 'main',
    product: 'custom-my-docs'
  }],
  embedding: {
    model: 'voyage-context-3',
    dimensions: 1024,
    chunkSize: 1000,
    chunkOverlap: 200
  }
};

const indexer = new Indexer(config);
indexer.onProgress((progress) => {
  console.log(`${progress.phase}: ${progress.current}/${progress.total}`);
});
await indexer.index();

MCP Integration

Claude Desktop

File: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "mongodocs": {
      "command": "npx",
      "args": ["mongodocs-mcp"],
      "env": {
        "MONGODB_URI": "your-connection-string",
        "VOYAGE_API_KEY": "your-api-key"
      }
    }
  }
}

Cursor IDE

File: .cursor/mcp_settings.json

{
  "mcpServers": {
    "mongodocs": {
      "command": "npx",
      "args": ["mongodocs-mcp"],
      "env": {
        "MONGODB_URI": "your-connection-string",
        "VOYAGE_API_KEY": "your-api-key"
      }
    }
  }
}

Restart IDE after configuration.

Search Methods

1. Hybrid RRF Search (Primary)

Reciprocal Rank Fusion combining vector and keyword search:

// Weight configuration
vectorWeight: 0.7
keywordWeight: 0.3

// Ranking formula
score = 1 / (k + rank) where k = 60

2. MMR Search (Diversity)

Maximum Marginal Relevance for result diversity:

// Parameters
fetchK: 20        // Initial candidates
lambdaMult: 0.7   // Relevance vs diversity
limit: 5          // Final results

// Algorithm
MMR = λ * Sim(Di, Q) - (1-λ) * max Sim(Di, Dj)

Cosine similarity search:

// Configuration
numCandidates: 40  // 7.5x faster than default 300
limit: 10

Technical Implementation

Semantic Chunking

Three-strategy approach with statistical analysis:

1. Interquartile Method

// Calculate sentence distances
distances = sentences.map(embed).map(cosineDistance)
// Find breakpoints at quartile boundaries
Q1, Q3 = quartiles(distances)
threshold = Q3 + 1.5 * (Q3 - Q1)

2. Gradient Method

// Identify semantic transitions
gradients = distances.map(derivative)
breakpoints = gradients.filter(g => g > threshold)

3. Hybrid Scoring

score = 0.6 * interquartile + 0.4 * gradient
// Adaptive to content type

Chunk Optimization

const CHUNK_CONFIG = {
  base: 1000,      // Target size
  min: 100,        // Prevent empty
  max: 2500,       // Respect limits
  overlap: 200,    // Context preservation
  
  // Token validation
  maxTokens: 6000,  // voyage-context-3 safety
  tokenizer: 'cl100k_base'
};

Smart Indexing

Repository state tracking:

// Check existing hash
const existingHash = await storage.getRepositoryHash(repo.name);
const currentHash = await git.getLatestCommit();

if (existingHash === currentHash) {
  console.log('✅ Repository up to date, skipping...');
  return;
}

// Process only changed files
const changedFiles = await git.diff(existingHash, currentHash);
await processFiles(changedFiles);
await storage.updateRepositoryHash(repo.name, currentHash);

Error Handling

// Exponential backoff with jitter
const delay = Math.min(1000 * Math.pow(2, attempt), 30000);
const jitter = Math.random() * 1000;
await sleep(delay + jitter);

// Token limit handling
if (error.message.includes('32000 tokens')) {
  // Split chunk and retry
  const subChunks = emergencySplit(chunk);
  return processSubChunks(subChunks);
}

Performance Characteristics

Indexing Metrics

  • Processing rate: 100-150 docs/hour (Voyage API limited)
  • Batch size: 32 documents optimal
  • Memory usage: <500MB peak
  • Network bandwidth: ~10MB/hour

Search Performance

  • Latency: <100ms p99
  • Throughput: 1000+ QPS
  • Index size: ~1.5KB per chunk
  • Cache TTL: 5 minutes

Storage Efficiency

// Document structure (avg 1.5KB)
{
  _id: ObjectId,
  title: string,           // 50 bytes
  content: string,         // 1000 bytes
  embedding: float[1024],  // 4KB compressed
  metadata: {              // 200 bytes
    file: string,
    repo: string,
    product: string,
    indexedAt: Date
  }
}

Repository Configuration

Default Repositories

const repositories = [
  {
    name: 'MongoDB Documentation',
    repo: 'mongodb/docs',
    branch: 'master',
    product: 'mongodb-docs',
    priority: 10
  },
  // Add custom repositories...
];

Custom Repository

{
  name: 'Your Documentation',
  repo: 'owner/repository',
  branch: 'main',
  product: 'custom-your-docs',
  
  // Optional filters
  include: ['docs/**/*.md'],
  exclude: ['**/node_modules/**'],
  
  // Processing options
  chunkSize: 1500,
  chunkOverlap: 300
}

Development

Build Pipeline

# Development with watch
npm run dev

# Production build
npm run build

# Type checking
npm run typecheck

# Linting
npm run lint

# Testing
npm test

Project Structure

src/
├── core/
│   ├── indexer.ts           # Orchestration
│   ├── semantic-chunker.ts  # Content splitting
│   ├── embeddings.ts        # Vector generation
│   ├── storage.ts           # Database operations
│   └── search.ts            # Query algorithms
├── config/
│   └── index.ts             # Repository definitions
├── web/
│   ├── server.ts            # Express server
│   ├── coordinator.ts       # Web orchestration
│   └── templates/           # HTML interfaces
└── index.ts                 # MCP server

dist/                        # Compiled output
.repos/                      # Cloned repositories

Key Dependencies

{
  "mongodb": "^6.10.0",           // Native driver
  "voyageai": "^0.0.1-5",         // Embeddings
  "@modelcontextprotocol/sdk": "^1.0.0",  // MCP
  "js-tiktoken": "^1.0.15",       // Tokenization
  "simple-git": "^3.27.0"         // Repository ops
}

Troubleshooting

Connection Issues

# Test MongoDB connection
node -e "
  const { MongoClient } = require('mongodb');
  MongoClient.connect(process.env.MONGODB_URI)
    .then(() => console.log('✅ Connected'))
    .catch(err => console.error('❌', err.message));
"

# Test Voyage AI
curl -X POST https://api.voyageai.com/v1/embeddings \
  -H "Authorization: Bearer $VOYAGE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": ["test"], "model": "voyage-context-3"}'

Index Issues

# Verify vector index
mongosh $MONGODB_URI --eval "
  db.documents.getSearchIndexes()
"

# Check document structure
mongosh $MONGODB_URI --eval "
  db.documents.findOne()
"

Performance Tuning

// Adjust for your use case
const tuning = {
  // Smaller batches for memory constraints
  batchSize: 16,
  
  // More candidates for precision
  numCandidates: 100,
  
  // Larger chunks for context
  chunkSize: 2000,
  
  // Disable for speed
  smartIndexing: false
};

Best Practices

Security

  • Store credentials in environment variables
  • Use least-privilege MongoDB user
  • Rotate API keys regularly
  • Enable MongoDB audit logging

Optimization

  • Index during off-peak hours
  • Use incremental updates
  • Monitor token usage
  • Cache frequent queries

Scaling

  • Horizontal sharding for large corpuses
  • Read replicas for search traffic
  • CDN for static assets
  • Queue system for processing

Contributing

Pull requests welcome. Please ensure:

  • TypeScript strict mode compliance
  • Test coverage >80%
  • Conventional commits
  • Documentation updates

License

MIT

Support

Built with MongoDB Atlas vector search and Voyage AI embeddings.

Keywords

mcp

FAQs

Package last updated on 19 Aug 2025

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

About

Packages

Stay in touch

Get open source security insights delivered straight into your inbox.

  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc

U.S. Patent No. 12,346,443 & 12,314,394. Other pending.