QMD - Query Markup Documents
An on-device search engine for everything you need to remember. Index your markdown notes, meeting transcripts, documentation, and knowledge bases. Search with keywords or natural language. Ideal for your agentic flows.
QMD combines BM25 full-text search, vector semantic search, and LLM re-ranking—all running locally via node-llama-cpp with GGUF models.

You can read more about QMD's progress in the CHANGELOG.
Quick Start
npm install -g @tobilu/qmd
bun install -g @tobilu/qmd
npx @tobilu/qmd ...
bunx @tobilu/qmd ...
qmd collection add ~/notes --name notes
qmd collection add ~/Documents/meetings --name meetings
qmd collection add ~/work/docs --name docs
qmd context add qmd://notes "Personal notes and ideas"
qmd context add qmd://meetings "Meeting transcripts and notes"
qmd context add qmd://docs "Work documentation"
qmd embed
qmd search "project timeline"
qmd vsearch "how to deploy"
qmd query "quarterly planning process"
qmd get "meetings/2024-01-15.md"
qmd get "#abc123"
qmd multi-get "journals/2025-05*.md"
qmd search "API" -c notes
qmd search "API" --all --files --min-score 0.3
Using with AI Agents
QMD's --json and --files output formats are designed for agentic workflows:
qmd search "authentication" --json -n 10
qmd query "error handling" --all --files --min-score 0.4
qmd get "docs/api-reference.md" --full
MCP Server
The CLI works well when you simply tell your agent to run it on the command line, but QMD also exposes an MCP (Model Context Protocol) server for tighter integration.
Tools exposed:
query — Search with typed sub-queries (lex/vec/hyde), combined via RRF + reranking
get — Retrieve a document by path or docid (with fuzzy matching suggestions)
multi_get — Batch retrieve by glob pattern, comma-separated list, or docids
status — Index health and collection info
Claude Desktop configuration (~/Library/Application Support/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "qmd": {
      "command": "qmd",
      "args": ["mcp"]
    }
  }
}
Claude Code — Install the plugin (recommended):
claude plugin marketplace add tobi/qmd
claude plugin install qmd@qmd
Or configure MCP manually in ~/.claude/settings.json:
{
  "mcpServers": {
    "qmd": {
      "command": "qmd",
      "args": ["mcp"]
    }
  }
}
HTTP Transport
By default, QMD's MCP server uses stdio (launched as a subprocess by each client). For a shared, long-lived server that avoids repeated model loading, use the HTTP transport:
qmd mcp --http
qmd mcp --http --port 8080
qmd mcp --http --daemon
qmd mcp stop
qmd status
The HTTP server exposes two endpoints:
POST /mcp — MCP Streamable HTTP (JSON responses, stateless)
GET /health — liveness check with uptime
Models stay loaded in VRAM across requests. Embedding and reranking contexts are disposed after 5 minutes of idle time and transparently recreated on the next request (roughly a 1 s penalty; the models themselves remain loaded).
Point any MCP client at http://localhost:8181/mcp to connect.
SDK / Library Usage
Use QMD as a library in your own Node.js or Bun applications.
Installation
npm install @tobilu/qmd
Quick Start
import { createStore } from '@tobilu/qmd'

// Open a store with an inline collection config
const store = await createStore({
  dbPath: './my-index.sqlite',
  config: {
    collections: {
      docs: { path: '/path/to/docs', pattern: '**/*.md' },
    },
  },
})

// Run a search and print each title with its score as a percentage
const results = await store.search({ query: "authentication flow" })
console.log(results.map(r => `${r.title} (${Math.round(r.score * 100)}%)`))

// Release the store when done
await store.close()
Store Creation
createStore() accepts three modes:
import { createStore } from '@tobilu/qmd'

// 1. Inline config
const store = await createStore({
  dbPath: './index.sqlite',
  config: {
    collections: {
      docs: { path: '/path/to/docs', pattern: '**/*.md' },
      notes: { path: '/path/to/notes' },
    },
  },
})

// 2. YAML config file
const store2 = await createStore({
  dbPath: './index.sqlite',
  configPath: './qmd.yml',
})

// 3. Database path only, no config passed
const store3 = await createStore({ dbPath: './index.sqlite' })
Search
The unified search() method handles both simple queries and pre-expanded structured queries:
// Simple query
const results = await store.search({ query: "authentication flow" })

// Query with options
const results2 = await store.search({
  query: "rate limiting",
  intent: "API throttling and abuse prevention",
  collection: "docs",
  limit: 5,
  minScore: 0.3,
  explain: true,
})

// Pre-expanded structured queries (lexical + vector) across two collections
const results3 = await store.search({
  queries: [
    { type: 'lex', query: '"connection pool" timeout -redis' },
    { type: 'vec', query: 'why do database connections time out under load' },
  ],
  collections: ["docs", "notes"],
})

// Skip the reranking step
const fast = await store.search({ query: "auth", rerank: false })
For direct backend access:
const lexResults = await store.searchLex("auth middleware", { limit: 10 })
const vecResults = await store.searchVector("how users log in", { limit: 10 })
const expanded = await store.expandQuery("auth flow", { intent: "user login" })
const results4 = await store.search({ queries: expanded })
Retrieval
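// Fetch a document by path or by docid
const doc = await store.get("docs/readme.md")
const byId = await store.get("#abc123")

// A failed lookup carries an "error" field, so narrow before using the document
if (!("error" in doc)) {
  console.log(doc.title, doc.displayPath, doc.context)
}

// Read a line range from a document body
const body = await store.getDocumentBody("docs/readme.md", {
  fromLine: 50,
  maxLines: 100,
})

// Batch retrieval by glob pattern
const { docs, errors } = await store.multiGet("docs/**/*.md", {
  maxBytes: 20480,
})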
Collections
// Add a collection
await store.addCollection("myapp", {
  path: "/src/myapp",
  pattern: "**/*.ts",
  ignore: ["node_modules/**", "*.test.ts"],
})

// List collections and the default collection names
const collections = await store.listCollections()
const defaults = await store.getDefaultCollectionNames()

// Remove or rename a collection
await store.removeCollection("myapp")
await store.renameCollection("old-name", "new-name")
Context
Context adds descriptive metadata that improves search relevance and is returned alongside results:
await store.addContext("docs", "/api", "REST API reference documentation")
await store.setGlobalContext("Internal engineering documentation")
const contexts = await store.listContexts()
await store.removeContext("docs", "/api")
await store.setGlobalContext(undefined)
Indexing
// Update the index for selected collections, reporting progress per file
const result = await store.update({
  collections: ["docs"],
  onProgress: ({ collection, file, current, total }) => {
    console.log(`[${collection}] ${current}/${total} ${file}`)
  },
})

// Generate vector embeddings, reporting progress
const embedResult = await store.embed({
  force: false,
  chunkStrategy: "auto",
  onProgress: ({ current, total, collection }) => {
    console.log(`Embedding ${current}/${total}`)
  },
})
Types
Key types exported for SDK consumers:
import type {
  QMDStore,
  SearchOptions,
  LexSearchOptions,
  VectorSearchOptions,
  HybridQueryResult,
  SearchResult,
  ExpandedQuery,
  DocumentResult,
  DocumentNotFound,
  MultiGetResult,
  UpdateProgress,
  UpdateResult,
  EmbedProgress,
  EmbedResult,
  StoreOptions,
  CollectionConfig,
  IndexStatus,
  IndexHealthInfo,
} from '@tobilu/qmd'
Utility exports:
import {
  extractSnippet,
  addLineNumbers,
  DEFAULT_MULTI_GET_MAX_BYTES,
  Maintenance,
} from '@tobilu/qmd'
Lifecycle
await store.close()
The SDK requires an explicit dbPath; no default location is assumed. This makes it safe to embed in any application without side effects.
Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│                         QMD Hybrid Search Pipeline                          │
└─────────────────────────────────────────────────────────────────────────────┘

                              ┌─────────────────┐
                              │   User Query    │
                              └────────┬────────┘
                                       │
                        ┌──────────────┴──────────────┐
                        ▼                             ▼
                ┌────────────────┐            ┌────────────────┐
                │ Query Expansion│            │ Original Query │
                │  (fine-tuned)  │            │  (×2 weight)   │
                └───────┬────────┘            └───────┬────────┘
                        │                             │
                        │ 2 alternative queries       │
                        └──────────────┬──────────────┘
                                       │
             ┌─────────────────────────┼─────────────────────────┐
             ▼                         ▼                         ▼
    ┌─────────────────┐       ┌─────────────────┐       ┌─────────────────┐
    │ Original Query  │       │ Expanded Query 1│       │ Expanded Query 2│
    └────────┬────────┘       └────────┬────────┘       └────────┬────────┘
             │                         │                         │
     ┌───────┴───────┐         ┌───────┴───────┐         ┌───────┴───────┐
     ▼               ▼         ▼               ▼         ▼               ▼
 ┌───────┐       ┌───────┐ ┌───────┐       ┌───────┐ ┌───────┐       ┌───────┐
 │ BM25  │       │Vector │ │ BM25  │       │Vector │ │ BM25  │       │Vector │
 │(FTS5) │       │Search │ │(FTS5) │       │Search │ │(FTS5) │       │Search │
 └───┬───┘       └───┬───┘ └───┬───┘       └───┬───┘ └───┬───┘       └───┬───┘
     │               │         │               │         │               │
     └───────┬───────┘         └───────┬───────┘         └───────┬───────┘
             │                         │                         │
             └─────────────────────────┼─────────────────────────┘
                                       │
                                       ▼
                           ┌───────────────────────┐
                           │  RRF Fusion + Bonus   │
                           │  Original query: ×2   │
                           │ Top-rank bonus: +0.05 │
                           │      Top 30 Kept      │
                           └───────────┬───────────┘
                                       │
                                       ▼
                           ┌───────────────────────┐
                           │    LLM Re-ranking     │
                           │   (qwen3-reranker)    │
                           │   Yes/No + logprobs   │
                           └───────────┬───────────┘
                                       │
                                       ▼
                           ┌───────────────────────┐
                           │ Position-Aware Blend  │
                           │   Top 1-3:  75% RRF   │
                           │   Top 4-10: 60% RRF   │
                           │   Top 11+:  40% RRF   │
                           └───────────────────────┘
Score Normalization & Fusion
Search Backends
| Backend | Raw Score | Conversion | Range |
|---------|-----------|------------|-------|
| FTS (BM25) | SQLite FTS5 BM25 | Math.abs(score) | 0 to ~25+ |
| Vector | Cosine distance | 1 / (1 + distance) | 0.0 to 1.0 |
| Reranker | LLM 0-10 rating | score / 10 | 0.0 to 1.0 |
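The conversions in the table above can be written as three small functions. This is only an illustrative sketch: the function names are hypothetical and are not QMD exports.

```typescript
// Sketch of the per-backend score conversions listed in the table.
// All function names here are hypothetical, chosen for illustration.

/** SQLite FTS5 reports BM25 as a negative number (lower is better), so take the magnitude. */
function normalizeBm25(rawScore: number): number {
  return Math.abs(rawScore);
}

/** Map a cosine distance (0 = identical) to a similarity in (0, 1]. */
function normalizeVector(distance: number): number {
  return 1 / (1 + distance);
}

/** Map a 0-10 reranker rating to the 0.0-1.0 range. */
function normalizeReranker(rating: number): number {
  return rating / 10;
}
```

Note that only the vector and reranker conversions land in the 0.0 to 1.0 range; the BM25 magnitude is unbounded, as the Range column indicates.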
Fusion Strategy
The query command uses Reciprocal Rank Fusion (RRF) with position-aware blending:
- Query Expansion: Original query (×2 for weighting) + 2 LLM variations
- Parallel Retrieval: Each query searches both FTS and vector indexes
- RRF Fusion: Combine all result lists using
score = Σ(1/(k+rank+1)) where k=60
- Top-Rank Bonus: Documents ranking #1 in any list get +0.05, #2-3 get +0.02
- Top-K Selection: Take top 30 candidates for reranking
- Re-ranking: LLM scores each document (yes/no with logprobs confidence)
- Position-Aware Blending:
- RRF rank 1-3: 75% retrieval, 25% reranker (preserves exact matches)
- RRF rank 4-10: 60% retrieval, 40% reranker
- RRF rank 11+: 40% retrieval, 60% reranker (trust reranker more)
Why this approach: Pure RRF can dilute exact matches when expanded queries don't match. The top-rank bonus preserves documents that score #1 for the original query. Position-aware blending prevents the reranker from destroying high-confidence retrieval results.
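The fusion and top-rank bonus steps can be sketched as follows. This is not QMD's implementation: the `RankedList` type and `rrfFuse` function are hypothetical, `rank` is assumed to be 0-based (which is what the `+1` in the formula suggests), the ×2 weight is assumed to multiply each list's contribution, and the bonus is assumed to be applied once per document based on its best rank in any list.

```typescript
// One ranked result list (document ids, best first) and the weight of its query.
// Hypothetical shape for illustration only.
type RankedList = { ids: string[]; weight: number };

const RRF_K = 60;

function rrfFuse(lists: RankedList[], topK = 30): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  const bestRank = new Map<string, number>();

  for (const { ids, weight } of lists) {
    ids.forEach((id, rank) => {
      // score = sum over lists of weight / (k + rank + 1), rank is 0-based
      scores.set(id, (scores.get(id) ?? 0) + weight / (RRF_K + rank + 1));
      bestRank.set(id, Math.min(bestRank.get(id) ?? Infinity, rank));
    });
  }

  const fused = [...scores].map(([id, score]) => {
    const best = bestRank.get(id) ?? Infinity;
    // +0.05 for a #1 placement in any list, +0.02 for #2 or #3
    const bonus = best === 0 ? 0.05 : best <= 2 ? 0.02 : 0;
    return { id, score: score + bonus };
  });

  // Keep the top candidates for reranking
  return fused.sort((a, b) => b.score - a.score).slice(0, topK);
}

// Usage: the original query's list carries weight 2, an expanded query's list weight 1.
const fused = rrfFuse([
  { ids: ["a", "b", "c"], weight: 2 },
  { ids: ["b", "d"], weight: 1 },
]);
console.log(fused.map((r) => r.id));
```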
Score Interpretation
| Score | Meaning |
|-------|---------|
| 0.8 - 1.0 | Highly relevant |
| 0.5 - 0.8 | Moderately relevant |
| 0.2 - 0.5 | Somewhat relevant |
| 0.0 - 0.2 | Low relevance |
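For agents that need to act on these bands, a small helper along these lines may be handy. The function name is hypothetical, and treating each lower bound as inclusive is an assumption, since the table does not say which band a boundary value belongs to.

```typescript
// Map a 0..1 score to the relevance bands in the table.
// Assumption: lower bounds are inclusive (0.8 counts as "Highly relevant").
function relevanceLabel(score: number): string {
  if (score >= 0.8) return "Highly relevant";
  if (score >= 0.5) return "Moderately relevant";
  if (score >= 0.2) return "Somewhat relevant";
  return "Low relevance";
}
```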
Requirements
System Requirements
- Node.js >= 22
- Bun >= 1.0.0
- macOS: Homebrew SQLite (for extension support)
brew install sqlite
GGUF Models (via node-llama-cpp)
QMD uses three local GGUF models (auto-downloaded on first use):
| Model | Purpose | Size |
|-------|---------|------|
| embeddinggemma-300M-Q8_0 | Vector embeddings (default) | ~300MB |
| qwen3-reranker-0.6b-q8_0 | Re-ranking | ~640MB |
| qmd-query-expansion-1.7B-q4_k_m | Query expansion (fine-tuned) | ~1.1GB |
Models are downloaded from HuggingFace and cached in ~/.cache/qmd/models/.
Custom Embedding Model
Override the default embedding model via the QMD_EMBED_MODEL environment variable.
This is useful for multilingual corpora (e.g. Chinese, Japanese, Korean) where
embeddinggemma-300M has limited coverage.
export QMD_EMBED_MODEL="hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf"
qmd embed -f
Supported model families:
- embeddinggemma (default) — English-optimized, small footprint
- Qwen3-Embedding — Multilingual (119 languages including CJK), MTEB top-ranked
Note: When switching embedding models, you must re-index with qmd embed -f
since vectors are not cross-compatible between models. The prompt format is
automatically adjusted for each model family.
Installation
npm install -g @tobilu/qmd
bun install -g @tobilu/qmd
Development
git clone https://github.com/tobi/qmd
cd qmd
npm install
npm link
Usage
Collection Management
qmd collection add . --name myproject
qmd collection add ~/Documents/notes --name notes --mask "**/*.md"
qmd collection list
qmd collection remove myproject
qmd collection rename myproject my-project
qmd ls notes
qmd ls notes/subfolder
Generate Vector Embeddings
qmd embed
qmd embed -f
qmd embed --chunk-strategy auto
qmd query "auth flow" --chunk-strategy auto
AST-aware chunking (--chunk-strategy auto) uses tree-sitter to chunk code
files at function, class, and import boundaries instead of arbitrary text
positions. This produces higher-quality chunks and better search results for
codebases. Markdown and other file types always use regex-based chunking
regardless of strategy.
The default is regex (existing behavior). Use --chunk-strategy auto to
opt in. Run qmd status to verify which grammars are available.
Note: Tree-sitter grammars are optional dependencies. If they are not
installed, --chunk-strategy auto falls back to regex-only chunking
automatically. Tested on both Node.js and Bun.
Context Management
Context adds descriptive metadata to collections and paths, helping search understand your content.
qmd context add qmd://notes "Personal notes and ideas"
qmd context add qmd://docs/api "API documentation"
cd ~/notes && qmd context add "Personal notes and ideas"
cd ~/notes/work && qmd context add "Work-related notes"
qmd context add / "Knowledge base for my projects"
qmd context list
qmd context rm qmd://notes/old
Search Commands
┌──────────────────────────────────────────────────────────────────┐
│                           Search Modes                           │
├──────────┬───────────────────────────────────────────────────────┤
│ search   │ BM25 full-text search only                            │
│ vsearch  │ Vector semantic search only                           │
│ query    │ Hybrid: FTS + Vector + Query Expansion + Re-ranking   │
└──────────┴───────────────────────────────────────────────────────┘
qmd search "authentication flow"
qmd vsearch "how to login"
qmd query "user authentication"
Options
-n <num>
-c, --collection
--all
--min-score <num>
--full
--line-numbers
--explain
--index <name>
--files
--json
--csv
--md
--xml
qmd get <file>[:line]
-l <num>
--from <num>
-l <num>
--max-bytes <num>
Output Format
Default output is colorized CLI format (respects NO_COLOR env).
When stdout is a TTY, result paths are emitted as clickable terminal hyperlinks (OSC 8). Clicking a path opens the file in your editor using an editor URI template.
When stdout is not a TTY (for example piped to another command or redirected to a file), QMD emits plain text paths with no escape sequences.
TTY example:
docs/guide.md:42 #a1b2c3
Title: Software Craftsmanship
Context: Work documentation
Score: 93%
This section covers the **craftsmanship** of building
quality software with attention to detail.
See also: engineering principles
notes/meeting.md:15 #d4e5f6
Title: Q4 Planning
Context: Personal notes and ideas
Score: 67%
Discussion about code quality and craftsmanship
in the development process.
Configure the editor link target with QMD_EDITOR_URI (or editor_uri in config):
export QMD_EDITOR_URI="vscode://file/{path}:{line}:{col}"
export QMD_EDITOR_URI="cursor://file/{path}:{line}:{col}"
export QMD_EDITOR_URI="zed://file/{path}:{line}:{col}"
export QMD_EDITOR_URI="subl://open?url=file://{path}&line={line}"
Template placeholders:
- {path}: absolute filesystem path (URI-encoded)
- {line}: 1-based line number
- {col} or {column}: 1-based column number
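To make the placeholder substitution concrete, here is a minimal sketch of how such a template could be expanded. It is not QMD's code: `expandEditorUri` is a hypothetical name, the path is assumed to be encoded one segment at a time so that slashes survive, and a missing column is assumed to default to 1.

```typescript
// Expand {path}, {line}, {col} and {column} in an editor URI template.
// Hypothetical helper; the per-segment encoding and the default column are assumptions.
function expandEditorUri(
  template: string,
  target: { path: string; line: number; col?: number },
): string {
  // Encode each path segment but keep the "/" separators intact
  const encodedPath = target.path.split("/").map(encodeURIComponent).join("/");
  const col = String(target.col ?? 1);
  const values: Record<string, string> = {
    path: encodedPath,
    line: String(target.line),
    col,
    column: col,
  };
  return template.replace(
    /\{(path|line|col|column)\}/g,
    (match: string, name: string) => values[name] ?? match,
  );
}

// Usage with the Sublime Text template shown above
console.log(
  expandEditorUri("subl://open?url=file://{path}&line={line}", {
    path: "/home/me/my notes/a.md",
    line: 42,
  }),
);
// prints "subl://open?url=file:///home/me/my%20notes/a.md&line=42"
```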
Fields shown for each result:
- Path: Collection-relative path (e.g., docs/guide.md)
- Docid: Short hash identifier (e.g., #a1b2c3), usable with qmd get #a1b2c3
- Title: Extracted from the document (first heading or filename)
- Context: Path context, if configured via qmd context add
- Score: Color-coded (green >70%, yellow >40%, dim otherwise)
- Snippet: Context around the match with query terms highlighted
Examples
qmd query -n 10 --min-score 0.3 "API design patterns"
qmd search --md --full "error handling"
qmd query --json "quarterly reports"
qmd query --json --explain "quarterly reports"
qmd --index work search "quarterly reports"
Index Maintenance
qmd status
qmd update
qmd update --pull
qmd get notes/meeting.md
qmd get "#abc123"
qmd get notes/meeting.md:50 -l 100
qmd multi-get "journals/2025-05*.md"
qmd multi-get "doc1.md, doc2.md, #abc123"
qmd multi-get "docs/*.md" --max-bytes 20480
qmd multi-get "docs/*.md" --json
qmd cleanup
Data Storage
Index stored in: ~/.cache/qmd/index.sqlite
Schema
collections
path_contexts
documents
documents_fts
content_vectors
vectors_vec
llm_cache
Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| XDG_CACHE_HOME | ~/.cache | Cache directory location |
| QMD_LLAMA_GPU | auto | Force llama.cpp GPU backend (metal, vulkan, cuda) or disable GPU with false |
| QMD_FORCE_CPU | unset | Set to 1/true to force CPU mode before any CUDA/Vulkan/Metal probing. Equivalent CLI flag: --no-gpu. |
| QMD_EMBED_PARALLELISM | automatic | Override embedding/reranking context parallelism (1-8). Windows CUDA defaults to 1 because parallel CUDA contexts can crash with ggml-cuda.cu:98; use Vulkan or raise this only if your driver is stable. |
How It Works
Indexing Flow
Collection ──► Glob Pattern ──► Markdown Files ──► Parse Title ──► Hash Content
    │                                                   │               │
    │                                                   │               ▼
    │                                                   │        Generate docid
    │                                                   │         (6-char hash)
    │                                                   │               │
    └──────────────────────────────────────────────────►└──► Store in SQLite
                                                                    │
                                                                    ▼
                                                               FTS5 Index
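The "Hash Content" and "Generate docid" steps could look roughly like the sketch below. The hash algorithm is not stated in this document, so SHA-256 is an assumption, as is taking the first six hex characters of the content hash; `docidFromContent` is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

// Derive a short docid from document content.
// Assumptions: SHA-256 as the content hash, docid = first 6 hex characters.
function docidFromContent(content: string): string {
  const hash = createHash("sha256").update(content, "utf8").digest("hex");
  return `#${hash.slice(0, 6)}`;
}

console.log(docidFromContent("abc")); // prints "#ba7816"
```

A six-character prefix keeps ids easy to type in commands such as qmd get, at the cost of a small collision risk in very large indexes.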
Embedding Flow
Documents are chunked into ~900-token pieces with 15% overlap using smart boundary detection:
Document ──► Smart Chunk (~900 tokens) ──► Format each chunk ──► node-llama-cpp ──► Store Vectors
                │                          "title | text"        embedBatch()
                │
                └─► Chunks stored with:
                    - hash: document hash
                    - seq: chunk sequence (0, 1, 2...)
                    - pos: character position in original
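To see what ~900 tokens with 15% overlap means in numbers, the sketch below lays out fixed-size windows over a token count. It deliberately ignores smart boundary detection, so the real cut points will differ; `chunkWindows` and the constants are hypothetical names.

```typescript
const CHUNK_TOKENS = 900;
const OVERLAP_RATIO = 0.15;

// Fixed-size windows with overlap; a simplification that ignores boundary scoring.
function chunkWindows(totalTokens: number): { start: number; end: number }[] {
  const overlap = Math.round(CHUNK_TOKENS * OVERLAP_RATIO); // 135 tokens shared with the previous chunk
  const stride = CHUNK_TOKENS - overlap; // each chunk starts 765 tokens after the previous one
  const windows: { start: number; end: number }[] = [];
  for (let start = 0; start < totalTokens; start += stride) {
    const end = Math.min(start + CHUNK_TOKENS, totalTokens);
    windows.push({ start, end });
    if (end === totalTokens) break; // last window reached the end of the document
  }
  return windows;
}

console.log(chunkWindows(2000));
// three windows: 0-900, 765-1665, 1530-2000
```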
Smart Chunking
Instead of cutting at hard token boundaries, QMD uses a scoring algorithm to find natural markdown break points. This keeps semantic units (sections, paragraphs, code blocks) together.
Break Point Scores:
| Pattern | Score | Description |
|---------|-------|-------------|
| # Heading | 100 | H1 - major section |
| ## Heading | 90 | H2 - subsection |
| ### Heading | 80 | H3 |
| #### Heading | 70 | H4 |
| ##### Heading | 60 | H5 |
| ###### Heading | 50 | H6 |
| ``` | 80 | Code block boundary |
| --- / *** | 60 | Horizontal rule |
| Blank line | 20 | Paragraph boundary |
| - item / 1. item | 5 | List item |
| Line break | 1 | Minimal break |
Algorithm:
- Scan document for all break points with scores
- When approaching the 900-token target, search a 200-token window before the cutoff
- Score each break point:
finalScore = baseScore × (1 - (distance/window)² × 0.7)
- Cut at the highest-scoring break point
The squared distance decay means an H1 heading 200 tokens back (100 × 0.3 = 30) still beats a simple line break at the target (score 1), but a closer heading wins over a more distant one of the same level.
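The scoring rule can be checked with a few lines of code. This is a sketch under stated assumptions rather than QMD's implementation: `BreakPoint`, `finalScore`, and `pickCut` are hypothetical names, positions are measured in tokens, and falling back to a hard cut at the target when no break point lies in the window is an assumption.

```typescript
// A candidate break point: its token position and base score from the table above.
interface BreakPoint {
  pos: number;
  baseScore: number;
}

const WINDOW = 200; // tokens searched before the cutoff
const DECAY = 0.7;

// finalScore = baseScore × (1 - (distance/window)² × 0.7)
function finalScore(baseScore: number, distance: number): number {
  const normalized = distance / WINDOW;
  return baseScore * (1 - normalized * normalized * DECAY);
}

// Pick the highest-scoring break point within the window before the target.
function pickCut(breakPoints: BreakPoint[], target: number): number {
  let best: BreakPoint | undefined;
  let bestScore = -Infinity;
  for (const bp of breakPoints) {
    const distance = target - bp.pos;
    if (distance < 0 || distance > WINDOW) continue; // outside the search window
    const score = finalScore(bp.baseScore, distance);
    if (score > bestScore) {
      best = bp;
      bestScore = score;
    }
  }
  return best ? best.pos : target; // assumption: hard cut when nothing qualifies
}

// An H1 200 tokens back beats a line break exactly at the 900-token target
console.log(pickCut([{ pos: 700, baseScore: 100 }, { pos: 900, baseScore: 1 }], 900)); // prints 700
```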
Code Fence Protection: Break points inside code blocks are ignored—code stays together. If a code block exceeds the chunk size, it's kept whole when possible.
AST-Aware Chunking (Code Files):
For supported code files, QMD also parses the source with tree-sitter and adds AST-derived break points that are merged with the regex scores above:
| AST Node | Score | Languages |
|----------|-------|-----------|
| Class / interface / struct / impl / trait | 100 | All |
| Function / method | 90 | All |
| Type alias / enum | 80 | All |
| Import / use declaration | 60 | All |
Supported for .ts, .tsx, .js, .jsx, .py, .go, and .rs files. Enable with --chunk-strategy auto. Markdown and other file types always use regex chunking.
Query Flow (Hybrid)
Query ──► LLM Expansion ──► [Original, Variant 1, Variant 2]
                │
      ┌─────────┴─────────┐
      ▼                   ▼
For each query:      FTS (BM25)
      │                   │
      ▼                   ▼
Vector Search        Ranked List
      │                   │
      ▼                   │
 Ranked List              │
      │                   │
      └─────────┬─────────┘
                ▼
        RRF Fusion (k=60)
    Original query ×2 weight
Top-rank bonus: +0.05/#1, +0.02/#2-3
                │
                ▼
        Top 30 candidates
                │
                ▼
         LLM Re-ranking
  (yes/no + logprob confidence)
                │
                ▼
      Position-Aware Blend
Rank 1-3:  75% RRF / 25% reranker
Rank 4-10: 60% RRF / 40% reranker
Rank 11+:  40% RRF / 60% reranker
                │
                ▼
          Final Results
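The final blending step in the flow above can be sketched as a weighted average whose weights depend on the RRF rank. The function names are hypothetical, the rank is taken to be 1-based as in the diagram, and both input scores are assumed to be already normalized to the 0..1 range; how QMD derives the retrieval-side score is not specified here.

```typescript
// Share of the final score that comes from retrieval (RRF), by 1-based RRF rank.
function retrievalWeight(rrfRank: number): number {
  if (rrfRank <= 3) return 0.75; // top 1-3: protect strong retrieval hits
  if (rrfRank <= 10) return 0.6; // ranks 4-10
  return 0.4; // rank 11+: trust the reranker more
}

// Blend a retrieval score and a reranker score (both assumed to be in 0..1).
function blendScore(rrfRank: number, retrievalScore: number, rerankScore: number): number {
  const w = retrievalWeight(rrfRank);
  return w * retrievalScore + (1 - w) * rerankScore;
}

// A rank-2 document the reranker dislikes keeps most of its retrieval score
console.log(blendScore(2, 0.9, 0.2));
// A rank-15 document the reranker likes is pulled up
console.log(blendScore(15, 0.3, 0.9));
```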
Model Configuration
Models are configured in src/llm.ts as HuggingFace URIs:
const DEFAULT_EMBED_MODEL = "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf";
const DEFAULT_RERANK_MODEL = "hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf";
const DEFAULT_GENERATE_MODEL = "hf:tobil/qmd-query-expansion-1.7B-gguf/qmd-query-expansion-1.7B-q4_k_m.gguf";
EmbeddingGemma Prompt Format
// For queries
"task: search result | query: {query}"
// For documents
"title: {title} | text: {content}"
Qwen3-Reranker
Uses node-llama-cpp's createRankingContext() and rankAndSort() API for cross-encoder reranking. Returns documents sorted by relevance score (0.0 - 1.0).
Qwen3 (Query Expansion)
Used for generating query variations via LlamaChatSession.
License
MIT