
mongodocs-mcp
Transform any GitHub repository into searchable vector embeddings. MCP server with smart indexing, voyage-context-3 embeddings, and semantic search for Claude/Cursor IDEs.
A Model Context Protocol (MCP) server that transforms any GitHub repository into searchable vector embeddings, enabling semantic search across codebases and documentation through IDE integration.
The system implements a three-phase indexing pipeline with smart change detection:
Repository → Git Clone → Smart Chunking → Vector Embeddings → MongoDB Atlas
                 ↓             ↓                  ↓                 ↓
          Hash Tracking  Semantic Split   voyage-context-3    Vector Search
src/core/indexer.ts: Git-based change detection using commit hashes
src/core/semantic-chunker.ts: Multi-strategy content splitting
src/core/embeddings.ts: Voyage AI integration with batching
src/core/storage.ts: MongoDB Atlas vector operations
src/core/search.ts: Vector, hybrid RRF, and MMR algorithms
src/index.ts: Protocol implementation for IDE integration
npm install -g mongodocs-mcp
git clone https://github.com/yourusername/mongodocs-mcp.git
cd mongodocs-mcp
npm install
npm run build
npm link
Create free M0 cluster at cloud.mongodb.com:
# Database structure
Database: mongodb_semantic_docs
Collection: documents
# Connection string format
mongodb+srv://username:password@cluster.mongodb.net/?retryWrites=true&w=majority
Network Access Configuration:
0.0.0.0/0 for development (restrict in production)
Vector Search Index Creation:
{
"mappings": {
"dynamic": true,
"fields": {
"embedding": {
"type": "knnVector",
"dimensions": 1024,
"similarity": "cosine"
}
}
}
}
Name: vector_index
Get API key from voyageai.com:
Model: voyage-context-3
Create .env file:
# Required
MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/?retryWrites=true&w=majority
VOYAGE_API_KEY=pa-your-api-key
# Optional
GITHUB_TOKEN=ghp_your_token # For private repos
# Start web UI
npm run web
# Opens http://localhost:3000
# 4-step wizard:
# 1. Configure APIs
# 2. Select repositories
# 3. Review MCP setup
# 4. Start processing
# Index repositories (smart mode - only changed files)
npm run index
# Force complete rebuild
npm run rebuild
# Monitor indexing progress
npm run progress
# Database statistics
npm run stats
# Clean database
npm run clean
import { Indexer } from 'mongodocs-mcp';
const config = {
repositories: [{
name: 'My Documentation',
repo: 'owner/repository',
branch: 'main',
product: 'custom-my-docs'
}],
embedding: {
model: 'voyage-context-3',
dimensions: 1024,
chunkSize: 1000,
chunkOverlap: 200
}
};
const indexer = new Indexer(config);
indexer.onProgress((progress) => {
console.log(`${progress.phase}: ${progress.current}/${progress.total}`);
});
await indexer.index();
File: ~/Library/Application Support/Claude/claude_desktop_config.json
{
"mcpServers": {
"mongodocs": {
"command": "npx",
"args": ["mongodocs-mcp"],
"env": {
"MONGODB_URI": "your-connection-string",
"VOYAGE_API_KEY": "your-api-key"
}
}
}
}
File: .cursor/mcp.json
{
"mcpServers": {
"mongodocs": {
"command": "npx",
"args": ["mongodocs-mcp"],
"env": {
"MONGODB_URI": "your-connection-string",
"VOYAGE_API_KEY": "your-api-key"
}
}
}
}
Restart IDE after configuration.
Reciprocal Rank Fusion combining vector and keyword search:
// Weight configuration
vectorWeight: 0.7
keywordWeight: 0.3
// Ranking formula
score = 1 / (k + rank) where k = 60
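To make the fusion step concrete, here is a minimal sketch of weighted Reciprocal Rank Fusion. It assumes each result list contributes weight / (k + rank) per document, with rank starting at 1, and that contributions are summed; the README does not spell out how the weights enter the formula, so treat that combination as an assumption. The `Ranked` type and `fuseRRF` name are hypothetical and not part of the package API.

```typescript
// Hypothetical result shape; the package's real types may differ.
interface Ranked {
  id: string;
}

// Weighted RRF: every list adds weight / (k + rank) to a document's score,
// where rank is the 1-based position of the document in that list.
function fuseRRF(
  vectorResults: Ranked[],
  keywordResults: Ranked[],
  vectorWeight = 0.7,
  keywordWeight = 0.3,
  k = 60,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();

  const add = (results: Ranked[], weight: number): void => {
    results.forEach((r, i) => {
      scores.set(r.id, (scores.get(r.id) ?? 0) + weight / (k + i + 1));
    });
  };

  add(vectorResults, vectorWeight);
  add(keywordResults, keywordWeight);

  // Highest fused score first.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

A document that appears in both lists accumulates two contributions, which is why a result ranked second by vector search and first by keyword search can outrank the top vector-only hit.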
Maximum Marginal Relevance for result diversity:
// Parameters
fetchK: 20 // Initial candidates
lambdaMult: 0.7 // Relevance vs diversity
limit: 5 // Final results
// Algorithm
MMR = λ * Sim(Di, Q) - (1-λ) * max Sim(Di, Dj)
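The MMR formula above can be sketched as a greedy selection loop. This is an illustration under assumptions, not the package's implementation: `Candidate`, `cosine`, and `mmrSelect` are hypothetical names, and similarity is taken to be cosine similarity over in-memory embeddings (the `fetchK` candidates would be passed in as `candidates`).

```typescript
// Hypothetical candidate shape for illustration.
interface Candidate {
  id: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Greedy MMR: repeatedly pick the candidate maximizing
// lambda * Sim(Di, Q) - (1 - lambda) * max over selected Dj of Sim(Di, Dj).
function mmrSelect(
  query: number[],
  candidates: Candidate[],
  lambdaMult = 0.7,
  limit = 5,
): Candidate[] {
  const selected: Candidate[] = [];
  const remaining = [...candidates];

  while (selected.length < limit && remaining.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;

    remaining.forEach((cand, i) => {
      const relevance = cosine(cand.embedding, query);
      const redundancy =
        selected.length === 0
          ? 0
          : Math.max(...selected.map((s) => cosine(cand.embedding, s.embedding)));
      const score = lambdaMult * relevance - (1 - lambdaMult) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = i;
      }
    });

    selected.push(remaining.splice(bestIndex, 1)[0]);
  }
  return selected;
}
```

With lambdaMult near 1 the loop behaves like plain relevance ranking; lower values penalize candidates that closely resemble results already chosen.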
Cosine similarity search:
// Configuration
numCandidates: 40 // 7.5x faster than default 300
limit: 10
Three-strategy approach with statistical analysis:
1. Interquartile Method
// Calculate sentence distances
distances = sentences.map(embed).map(cosineDistance)
// Find breakpoints at quartile boundaries
Q1, Q3 = quartiles(distances)
threshold = Q3 + 1.5 * (Q3 - Q1)
2. Gradient Method
// Identify semantic transitions
gradients = distances.map(derivative)
breakpoints = gradients.filter(g => g > threshold)
3. Hybrid Scoring
score = 0.6 * interquartile + 0.4 * gradient
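The three strategies above can be sketched on a plain array of sentence-to-sentence distances. This is a minimal illustration under assumptions: the function names are hypothetical, quartiles use linear interpolation, the gradient is a first difference, and the hybrid score treats each method as a 0/1 vote per boundary. The README does not say how the package computes any of these, so the actual chunker may differ.

```typescript
// Linear-interpolated quantile of an ascending-sorted array.
function quantile(sorted: number[], q: number): number {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// 1. Interquartile method: a boundary is any distance above Q3 + 1.5 * IQR.
function iqrBreakpoints(distances: number[]): number[] {
  const sorted = [...distances].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const threshold = q3 + 1.5 * (q3 - q1);
  return distances.flatMap((d, i) => (d > threshold ? [i] : []));
}

// 2. Gradient method: a boundary is any jump between consecutive distances
// that exceeds the given threshold.
function gradientBreakpoints(distances: number[], threshold: number): number[] {
  const out: number[] = [];
  for (let i = 1; i < distances.length; i++) {
    if (distances[i] - distances[i - 1] > threshold) out.push(i);
  }
  return out;
}

// 3. Hybrid scoring (assumed form): 0.6 for an IQR hit plus 0.4 for a gradient hit.
function hybridScores(distances: number[], gradientThreshold: number): number[] {
  const iqr = new Set(iqrBreakpoints(distances));
  const grad = new Set(gradientBreakpoints(distances, gradientThreshold));
  return distances.map(
    (_, i) => 0.6 * (iqr.has(i) ? 1 : 0) + 0.4 * (grad.has(i) ? 1 : 0),
  );
}
```

For example, in the series 0.1, 0.12, 0.11, 0.9, 0.1, 0.13, 0.12, 0.11 only the fourth distance is a statistical outlier and a sharp jump, so both methods flag the same boundary.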
// Adaptive to content type
const CHUNK_CONFIG = {
base: 1000, // Target size
min: 100, // Prevent empty
max: 2500, // Respect limits
overlap: 200, // Context preservation
// Token validation
maxTokens: 6000, // voyage-context-3 safety
tokenizer: 'cl100k_base'
};
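As a rough illustration of how the size and overlap settings interact, the sketch below splits text with a sliding window. It assumes the sizes are measured in characters, which the README does not state (they could equally be tokens), and `chunkWithOverlap` is a hypothetical helper rather than a function exported by the package.

```typescript
// Sliding-window chunking: each chunk starts (size - overlap) characters
// after the previous one, so consecutive chunks share `overlap` characters.
function chunkWithOverlap(text: string, size = 1000, overlap = 200): string[] {
  if (size <= 0) throw new Error("size must be positive");
  if (overlap < 0 || overlap >= size) {
    throw new Error("overlap must be non-negative and smaller than size");
  }

  const chunks: string[] = [];
  const step = size - overlap;

  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once the current window has reached the end of the text.
    if (start + size >= text.length) break;
  }
  return chunks;
}
```

The overlap means a sentence cut at a chunk boundary still appears intact in at least one neighbor, at the cost of embedding some text twice.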
Repository state tracking:
// Check existing hash
const existingHash = await storage.getRepositoryHash(repo.name);
const currentHash = await git.getLatestCommit();
if (existingHash === currentHash) {
console.log('✅ Repository up to date, skipping...');
return;
}
// Process only changed files
const changedFiles = await git.diff(existingHash, currentHash);
await processFiles(changedFiles);
await storage.updateRepositoryHash(repo.name, currentHash);
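The snippet above leaves `storage` and `git` undefined and does not show what happens on a first run, when no hash has been stored yet. A self-contained sketch of the same flow, with those dependencies passed in as interfaces, might look like this. The method names `getRepositoryHash`, `updateRepositoryHash`, `getLatestCommit`, and `diff` come from the snippet; `listAllFiles`, the interface names, and the return value are assumptions added for illustration.

```typescript
// Assumed dependency shapes; the package's real classes may differ.
interface HashStore {
  getRepositoryHash(name: string): Promise<string | null>;
  updateRepositoryHash(name: string, hash: string): Promise<void>;
}

interface GitClient {
  getLatestCommit(): Promise<string>;
  diff(from: string, to: string): Promise<string[]>;
  listAllFiles(): Promise<string[]>; // hypothetical: used when no hash is stored
}

type SyncMode = "skipped" | "full" | "incremental";

async function syncRepository(
  name: string,
  storage: HashStore,
  git: GitClient,
  processFiles: (files: string[]) => Promise<void>,
): Promise<SyncMode> {
  const existingHash = await storage.getRepositoryHash(name);
  const currentHash = await git.getLatestCommit();

  // Nothing changed since the last indexed commit.
  if (existingHash === currentHash) return "skipped";

  // First run: index everything. Otherwise: only files changed between commits.
  const files =
    existingHash === null
      ? await git.listAllFiles()
      : await git.diff(existingHash, currentHash);

  await processFiles(files);

  // Record the new hash only after processing succeeds, so a failed run is retried.
  await storage.updateRepositoryHash(name, currentHash);
  return existingHash === null ? "full" : "incremental";
}
```

Writing the hash last is a deliberate choice in this sketch: if processing throws, the stored hash still points at the previous commit and the next run picks up the same diff.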
// Exponential backoff with jitter
const delay = Math.min(1000 * Math.pow(2, attempt), 30000);
const jitter = Math.random() * 1000;
await sleep(delay + jitter);
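For context, the delay calculation above would normally sit inside a retry loop. The following is a minimal sketch of such a loop; `backoffDelay` and `withRetry` are hypothetical names, the attempt limit of 5 is an assumption, and the random source and sleep function are injectable only to keep the example easy to exercise.

```typescript
// Delay for a given zero-based attempt: 1s, 2s, 4s, ... capped at 30s,
// plus up to 1s of random jitter to avoid synchronized retries.
function backoffDelay(attempt: number, random: () => number = Math.random): number {
  const delay = Math.min(1000 * Math.pow(2, attempt), 30000);
  const jitter = random() * 1000;
  return delay + jitter;
}

// Retry an async operation, sleeping between failed attempts.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // No point sleeping after the final attempt.
      if (attempt < maxAttempts - 1) await sleep(backoffDelay(attempt));
    }
  }
  throw lastError;
}
```

A call such as `withRetry(() => embedBatch(batch))` would then wrap any rate-limited request, where `embedBatch` stands in for whatever function performs the API call.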
// Token limit handling
if (error.message.includes('32000 tokens')) {
// Split chunk and retry
const subChunks = emergencySplit(chunk);
return processSubChunks(subChunks);
}
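The README does not show what `emergencySplit` does internally. One plausible approach, sketched here purely as an assumption, is to halve an oversized chunk recursively, preferring to cut at whitespace, until every piece fits an estimated token budget. The 4-characters-per-token estimate is a crude stand-in for a real tokenizer.

```typescript
// Rough token estimate: about 4 characters per token. This is an
// approximation for illustration, not a real tokenizer.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Recursively halve a chunk until each piece fits under maxTokens.
// Cuts at the last space at or before the midpoint when one exists.
function emergencySplit(chunk: string, maxTokens = 6000): string[] {
  if (estimateTokens(chunk) <= maxTokens || chunk.length < 2) return [chunk];

  const mid = Math.floor(chunk.length / 2);
  const space = chunk.lastIndexOf(" ", mid);
  const cut = space > 0 ? space : mid;

  return [
    ...emergencySplit(chunk.slice(0, cut), maxTokens),
    ...emergencySplit(chunk.slice(cut), maxTokens),
  ];
}
```

Because the pieces are plain slices of the input, joining them reproduces the original chunk exactly, so no content is lost when falling back to this path.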
// Document structure (avg 1.5KB)
{
_id: ObjectId,
title: string, // 50 bytes
content: string, // 1000 bytes
embedding: float[1024], // 4KB compressed
metadata: { // 200 bytes
file: string,
repo: string,
product: string,
indexedAt: Date
}
}
const repositories = [
{
name: 'MongoDB Documentation',
repo: 'mongodb/docs',
branch: 'master',
product: 'mongodb-docs',
priority: 10
},
// Add custom repositories...
];
{
name: 'Your Documentation',
repo: 'owner/repository',
branch: 'main',
product: 'custom-your-docs',
// Optional filters
include: ['docs/**/*.md'],
exclude: ['**/node_modules/**'],
// Processing options
chunkSize: 1500,
chunkOverlap: 300
}
# Development with watch
npm run dev
# Production build
npm run build
# Type checking
npm run typecheck
# Linting
npm run lint
# Testing
npm test
src/
├── core/
│ ├── indexer.ts # Orchestration
│ ├── semantic-chunker.ts # Content splitting
│ ├── embeddings.ts # Vector generation
│ ├── storage.ts # Database operations
│ └── search.ts # Query algorithms
├── config/
│ └── index.ts # Repository definitions
├── web/
│ ├── server.ts # Express server
│ ├── coordinator.ts # Web orchestration
│ └── templates/ # HTML interfaces
└── index.ts # MCP server
dist/ # Compiled output
.repos/ # Cloned repositories
{
"mongodb": "^6.10.0", // Native driver
"voyageai": "^0.0.1-5", // Embeddings
"@modelcontextprotocol/sdk": "^1.0.0", // MCP
"js-tiktoken": "^1.0.15", // Tokenization
"simple-git": "^3.27.0" // Repository ops
}
# Test MongoDB connection
node -e "
const { MongoClient } = require('mongodb');
MongoClient.connect(process.env.MONGODB_URI)
.then(client => { console.log('✅ Connected'); return client.close(); })
.catch(err => console.error('❌', err.message));
"
# Test Voyage AI
curl -X POST https://api.voyageai.com/v1/embeddings \
-H "Authorization: Bearer $VOYAGE_API_KEY" \
-H "Content-Type: application/json" \
-d '{"input": ["test"], "model": "voyage-context-3"}'
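# Verify vector index (the connection string names no database, so select it explicitly)
mongosh "$MONGODB_URI" --eval "
db.getSiblingDB('mongodb_semantic_docs').documents.getSearchIndexes()
"
# Check document structure
mongosh "$MONGODB_URI" --eval "
db.getSiblingDB('mongodb_semantic_docs').documents.findOne()
"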
// Adjust for your use case
const tuning = {
// Smaller batches for memory constraints
batchSize: 16,
// More candidates for precision
numCandidates: 100,
// Larger chunks for context
chunkSize: 2000,
// Disable for speed
smartIndexing: false
};
Pull requests welcome.
License: MIT
Built with MongoDB Atlas vector search and Voyage AI embeddings.