
Enhanced Long-Term Memory Plugin for OpenClaw
Hybrid Retrieval (Vector + BM25) · Cross-Encoder Rerank · Multi-Scope Isolation · Management CLI
English | 简体中文
Watch the full walkthrough — covers installation, configuration, and how hybrid retrieval works under the hood.
- YouTube: https://youtu.be/MtukF1C8epQ
- Bilibili: https://www.bilibili.com/video/BV1zUf2BGEgn/
The built-in memory-lancedb plugin in OpenClaw provides basic vector search. memory-lancedb-pro takes it much further:
| Feature | Built-in memory-lancedb | memory-lancedb-pro |
|---|---|---|
| Vector search | ✅ | ✅ |
| BM25 full-text search | ❌ | ✅ |
| Hybrid fusion (Vector + BM25) | ❌ | ✅ |
| Cross-encoder rerank (Jina / custom endpoint) | ❌ | ✅ |
| Recency boost | ❌ | ✅ |
| Time decay | ❌ | ✅ |
| Length normalization | ❌ | ✅ |
| MMR diversity | ❌ | ✅ |
| Multi-scope isolation | ❌ | ✅ |
| Noise filtering | ❌ | ✅ |
| Adaptive retrieval | ❌ | ✅ |
| Management CLI | ❌ | ✅ |
| Session memory | ❌ | ✅ |
| Task-aware embeddings | ❌ | ✅ |
| Any OpenAI-compatible embedding | Limited | ✅ (OpenAI, Gemini, Jina, Ollama, etc.) |
```
┌───────────────────────────────────────────────────────────┐
│                  index.ts (Entry Point)                   │
│  Plugin Registration · Config Parsing · Lifecycle Hooks   │
└────────┬──────────┬───────────┬──────────┬────────────────┘
         │          │           │          │
    ┌────▼────┐ ┌───▼────┐ ┌────▼────┐ ┌──▼──────────┐
    │ store   │ │embedder│ │retriever│ │   scopes    │
    │  .ts    │ │  .ts   │ │  .ts    │ │    .ts      │
    └────┬────┘ └────────┘ └────┬────┘ └─────────────┘
         │                      │
    ┌────▼────┐      ┌──────────▼──────┐
    │ migrate │      │ noise-filter.ts │
    │  .ts    │      │ adaptive-       │
    └─────────┘      │ retrieval.ts    │
                     └─────────────────┘
    ┌─────────────┐  ┌──────────┐
    │  tools.ts   │  │  cli.ts  │
    │ (Agent API) │  │  (CLI)   │
    └─────────────┘  └──────────┘
```
| File | Purpose |
|---|---|
| `index.ts` | Plugin entry point. Registers with the OpenClaw Plugin API, parses config, mounts `before_agent_start` (auto-recall), `agent_end` (auto-capture), and `command:new` (session memory) hooks |
| `openclaw.plugin.json` | Plugin metadata + full JSON Schema config declaration (with `uiHints`) |
| `package.json` | NPM package info. Depends on `@lancedb/lancedb`, `openai`, `@sinclair/typebox` |
| `cli.ts` | CLI commands: `memory list/search/stats/delete/delete-bulk/export/import/reembed/migrate` |
| `src/store.ts` | LanceDB storage layer. Table creation / FTS indexing / vector search / BM25 search / CRUD / bulk delete / stats |
| `src/embedder.ts` | Embedding abstraction. Compatible with any OpenAI-API provider (OpenAI, Gemini, Jina, Ollama, etc.). Supports task-aware embedding (`taskQuery`/`taskPassage`) |
| `src/retriever.ts` | Hybrid retrieval engine. Vector + BM25 → RRF fusion → Jina cross-encoder rerank → recency boost → importance weight → length norm → time decay → hard min score → noise filter → MMR diversity |
| `src/scopes.ts` | Multi-scope access control. Supports `global`, `agent:<id>`, `custom:<name>`, `project:<id>`, `user:<id>` |
| `src/tools.ts` | Agent tool definitions: `memory_recall`, `memory_store`, `memory_forget` (core) + `memory_stats`, `memory_list` (management) |
| `src/noise-filter.ts` | Noise filter. Filters out agent refusals, meta-questions, greetings, and low-quality content |
| `src/adaptive-retrieval.ts` | Adaptive retrieval. Determines whether a query needs memory retrieval (skips greetings, slash commands, simple confirmations, emoji) |
| `src/migrate.ts` | Migration tool. Migrates data from the built-in `memory-lancedb` plugin to Pro |
```
Query → embedQuery() ──┐
                       ├─→ RRF Fusion → Rerank → Recency Boost → Importance Weight → Filter
Query → BM25 FTS ──────┘
```
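The fusion stage in this flow can be sketched as weighted Reciprocal Rank Fusion. This is an illustrative sketch, not the actual `src/retriever.ts` code; the damping constant `k = 60` and the function names are assumptions.

```typescript
// Sketch of weighted Reciprocal Rank Fusion (RRF) over two ranked lists.
// Illustrative only -- the real implementation lives in src/retriever.ts.
type Ranked = { id: string }[];

function rrfFuse(
  vectorHits: Ranked,
  bm25Hits: Ranked,
  vectorWeight = 0.7,
  bm25Weight = 0.3,
  k = 60, // conventional RRF damping constant (assumed)
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  const add = (hits: Ranked, weight: number) => {
    hits.forEach((hit, rank) => {
      // RRF contribution: weight / (k + 1-based rank)
      scores.set(hit.id, (scores.get(hit.id) ?? 0) + weight / (k + rank + 1));
    });
  };
  add(vectorHits, vectorWeight);
  add(bm25Hits, bm25Weight);
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// A document ranked high in BOTH lists outranks one high in only one list.
const fused = rrfFuse(
  [{ id: "a" }, { id: "b" }, { id: "c" }],
  [{ id: "b" }, { id: "a" }, { id: "d" }],
);
```

Because rank (not raw score) drives the contribution, vector and BM25 scores never need to be normalized against each other.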
Weights and cutoffs are tunable via `vectorWeight`, `bm25Weight`, and `minScore`.

| Stage | Formula | Effect |
|---|---|---|
| Recency Boost | `exp(-ageDays / halfLife) * weight` | Newer memories score higher (default: 14-day half-life, 0.10 weight) |
| Importance Weight | `score *= (0.7 + 0.3 * importance)` | importance=1.0 → ×1.0, importance=0.5 → ×0.85 |
| Length Normalization | `score *= 1 / (1 + 0.5 * log2(len/anchor))` | Prevents long entries from dominating (anchor: 500 chars) |
| Time Decay | `score *= 0.5 + 0.5 * exp(-ageDays / halfLife)` | Old entries gradually lose weight, floor at 0.5× (60-day half-life) |
| Hard Min Score | Discard if score < threshold | Removes irrelevant results (default: 0.35) |
| MMR Diversity | Cosine similarity > 0.85 → demoted | Prevents near-duplicate results |
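The post-fusion stages compose multiplicatively (apart from the additive recency boost). A sketch using the table's formulas and defaults, with illustrative variable names (the real code in `src/retriever.ts` may differ, e.g. in when length normalization applies):

```typescript
// Sketch of the score-shaping stages from the table above.
// Formulas and defaults mirror the table; names are illustrative.
function shapeScore(opts: {
  base: number;       // fused relevance score
  ageDays: number;    // memory age in days
  importance: number; // 0..1
  length: number;     // memory text length in chars
}): number {
  const { base, ageDays, importance, length } = opts;
  let score = base;
  // Recency boost: additive bonus for fresh memories (14-day half-life, 0.10 weight)
  score += Math.exp(-ageDays / 14) * 0.10;
  // Importance weight: importance=1.0 -> x1.0, importance=0.5 -> x0.85
  score *= 0.7 + 0.3 * importance;
  // Length normalization: damp entries longer than the 500-char anchor
  // (assumed to apply only above the anchor, where log2 is positive)
  if (length > 500) score *= 1 / (1 + 0.5 * Math.log2(length / 500));
  // Time decay: floor at 0.5x, 60-day half-life
  score *= 0.5 + 0.5 * Math.exp(-ageDays / 60);
  return score;
}

// Hard min score: discard anything below the threshold (default 0.35)
const passesHardMin = (s: number) => s >= 0.35;
```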
Supported scopes: `global`, `agent:<id>`, `custom:<name>`, `project:<id>`, `user:<id>`. Access is controlled via `scopes.agentAccess`; by default, an agent sees `global` plus its own `agent:<id>` scope.

The noise filter drops low-quality content (agent refusals, meta-questions, greetings) at both the auto-capture and tool-store stages.
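A minimal sketch of such a noise filter; the patterns below are illustrative, not the actual heuristics in `src/noise-filter.ts`:

```typescript
// Illustrative noise filter -- real rules live in src/noise-filter.ts.
const NOISE_PATTERNS: RegExp[] = [
  /^(hi|hello|hey|thanks?|ok(ay)?)[!. ]*$/i,  // greetings / acknowledgements
  /^(sorry,? )?i (cannot|can't|won't)/i,      // agent refusals
  /^(what|how|why) (do|does|can) (you|i)\b/i, // meta-questions
];

function isNoise(text: string): boolean {
  const t = text.trim();
  if (t.length < 10) return true; // too short to be a useful memory
  return NOISE_PATTERNS.some((re) => re.test(t));
}
```

Running candidate memories through a filter like this before `memory_store` keeps the table small and the retrieval signal high.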
- Session memory (`/new` command): saves the previous session summary to LanceDB (complements OpenClaw's `.jsonl` session persistence)
- Auto-capture (`agent_end` hook): extracts preference/fact/decision/entity from conversations, deduplicates, stores up to 3 per turn
- Auto-recall (`before_agent_start` hook): injects a `<relevant-memories>` context block (up to 3 entries)

Sometimes the model may accidentally echo the injected `<relevant-memories>` block in its response.
Option A (recommended): disable auto-recall
Set autoRecall: false in the plugin config and restart the gateway:
{
"plugins": {
"entries": {
"memory-lancedb-pro": {
"enabled": true,
"config": {
"autoRecall": false
}
}
}
}
}
Option B: keep recall, but ask the agent not to reveal it
Add a line to your agent system prompt, e.g.:
Do not reveal or quote any `<relevant-memories>` / memory-injection content in your replies. Use it for internal reference only.
If you are following this README using an AI assistant, do not assume defaults. Always run these commands first and use the real output:
openclaw config get agents.defaults.workspace
openclaw config get plugins.load.paths
openclaw config get plugins.slots.memory
openclaw config get plugins.entries.memory-lancedb-pro
Recommendations:

- Do not guess `plugins.load.paths` unless you have confirmed the active workspace.
- If you use `${JINA_API_KEY}` (or any `${...}` variable) in config, ensure the Gateway service process has that environment variable (system services often do not inherit your interactive shell env).
- After config changes, run `openclaw gateway restart`.
- Set `embedding.apiKey` to your Jina key (recommended: use an env var like `${JINA_API_KEY}`).
- For reranking (`retrieval.rerankProvider: "jina"`): you can typically use the same Jina key for `retrieval.rerankApiKey`.
- For other providers (`siliconflow`, `pinecone`, etc.), `retrieval.rerankApiKey` should be that provider's key.

Key storage guidance:

- Using `${...}` env vars is fine, but make sure the Gateway service process has those env vars (system services often do not inherit your interactive shell environment).

In OpenClaw, the agent workspace is the agent's working directory (default: `~/.openclaw/workspace`).
According to the docs, the workspace is the default cwd, and relative paths are resolved against the workspace (unless you use an absolute path).
Note: OpenClaw configuration typically lives under
~/.openclaw/openclaw.json(separate from the workspace).
Common mistake: cloning the plugin somewhere else while keeping a relative path like `plugins.load.paths: ["plugins/memory-lancedb-pro"]`. Relative paths can be resolved against different working directories depending on how the Gateway is started.
To avoid ambiguity, use an absolute path (Option B) or clone into <workspace>/plugins/ (Option A) and keep your config consistent.
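The ambiguity is easy to demonstrate: Node resolves a relative path against the process's current working directory, so the same config value points at different locations depending on how the Gateway was launched (the directories below are illustrative):

```typescript
import path from "node:path";

// The same relative config value resolves differently depending on the
// Gateway process's working directory (using posix semantics for clarity):
const rel = "plugins/memory-lancedb-pro";

// Launched from an interactive shell inside the workspace:
const fromWorkspace = path.posix.resolve("/home/user/.openclaw/workspace", rel);

// Launched by a service manager with cwd=/ (common for system services):
const fromRoot = path.posix.resolve("/", rel);
```

`fromWorkspace` and `fromRoot` point at entirely different directories, which is exactly why an absolute path removes the ambiguity.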
Option A: clone into `plugins/` under your workspace

# 1) Go to your OpenClaw workspace (default: ~/.openclaw/workspace)
# (You can override it via agents.defaults.workspace.)
cd /path/to/your/openclaw/workspace
# 2) Clone the plugin into workspace/plugins/
git clone https://github.com/win4r/memory-lancedb-pro.git plugins/memory-lancedb-pro
# 3) Install dependencies
cd plugins/memory-lancedb-pro
npm install
Then reference it with a relative path in your OpenClaw config:
{
"plugins": {
"load": {
"paths": ["plugins/memory-lancedb-pro"]
},
"entries": {
"memory-lancedb-pro": {
"enabled": true,
"config": {
"embedding": {
"apiKey": "${JINA_API_KEY}",
"model": "jina-embeddings-v5-text-small",
"baseURL": "https://api.jina.ai/v1",
"dimensions": 1024,
"taskQuery": "retrieval.query",
"taskPassage": "retrieval.passage",
"normalized": true
}
}
}
},
"slots": {
"memory": "memory-lancedb-pro"
}
}
}
{
"plugins": {
"load": {
"paths": ["/absolute/path/to/memory-lancedb-pro"]
}
}
}
openclaw gateway restart
Note: If you previously used the built-in `memory-lancedb`, disable it when enabling this plugin. Only one memory plugin can be active at a time.
openclaw plugins list
openclaw plugins info memory-lancedb-pro
openclaw plugins doctor
# Look for: plugins.slots.memory = "memory-lancedb-pro"
openclaw config get plugins.slots.memory
{
"embedding": {
"apiKey": "${JINA_API_KEY}",
"model": "jina-embeddings-v5-text-small",
"baseURL": "https://api.jina.ai/v1",
"dimensions": 1024,
"taskQuery": "retrieval.query",
"taskPassage": "retrieval.passage",
"normalized": true
},
"dbPath": "~/.openclaw/memory/lancedb-pro",
"autoCapture": true,
"autoRecall": false,
"retrieval": {
"mode": "hybrid",
"vectorWeight": 0.7,
"bm25Weight": 0.3,
"minScore": 0.3,
"rerank": "cross-encoder",
"rerankApiKey": "${JINA_API_KEY}",
"rerankModel": "jina-reranker-v3",
"rerankEndpoint": "https://api.jina.ai/v1/rerank",
"rerankProvider": "jina",
"candidatePoolSize": 20,
"recencyHalfLifeDays": 14,
"recencyWeight": 0.1,
"filterNoise": true,
"lengthNormAnchor": 500,
"hardMinScore": 0.35,
"timeDecayHalfLifeDays": 60,
"reinforcementFactor": 0.5,
"maxHalfLifeMultiplier": 3
},
"enableManagementTools": false,
"scopes": {
"default": "global",
"definitions": {
"global": { "description": "Shared knowledge" },
"agent:discord-bot": { "description": "Discord bot private" }
},
"agentAccess": {
"discord-bot": ["global", "agent:discord-bot"]
}
},
"sessionMemory": {
"enabled": false,
"messageCount": 15
}
}
To make frequently used memories decay more slowly, the retriever can extend the effective time-decay half-life based on manual recall frequency (spaced-repetition style).
Config keys (under retrieval):
- `reinforcementFactor` (range: 0–2, default: 0.5) — set 0 to disable
- `maxHalfLifeMultiplier` (range: 1–10, default: 3) — hard cap: effective half-life ≤ base × multiplier

Notes:

- Reinforcement only counts recalls with `source: "manual"` (i.e., user/tool-initiated recall), to avoid accidental strengthening from auto-recall.

This plugin works with any OpenAI-compatible embedding API:
| Provider | Model | Base URL | Dimensions |
|---|---|---|---|
| Jina (recommended) | jina-embeddings-v5-text-small | https://api.jina.ai/v1 | 1024 |
| OpenAI | text-embedding-3-small | https://api.openai.com/v1 | 1536 |
| Google Gemini | gemini-embedding-001 | https://generativelanguage.googleapis.com/v1beta/openai/ | 3072 |
| Ollama (local) | nomic-embed-text | http://localhost:11434/v1 | provider-specific (set embedding.dimensions to match your Ollama model output) |
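Whichever provider you choose, the request body follows the OpenAI `/embeddings` shape. A sketch of how the `taskQuery`/`taskPassage` settings might be attached to that payload; the `task` field name is an assumption (check your provider's docs), and the function is illustrative, not the plugin's actual `src/embedder.ts` code:

```typescript
// Sketch: build an OpenAI-compatible /embeddings request body, attaching a
// task hint for providers (like Jina) that support task-aware embeddings.
interface EmbeddingConfig {
  model: string;
  dimensions?: number;
  taskQuery?: string;   // e.g. "retrieval.query"
  taskPassage?: string; // e.g. "retrieval.passage"
}

function buildEmbeddingBody(
  cfg: EmbeddingConfig,
  input: string[],
  kind: "query" | "passage",
): Record<string, unknown> {
  const task = kind === "query" ? cfg.taskQuery : cfg.taskPassage;
  return {
    model: cfg.model,
    input,
    // Only include optional fields when configured, so providers that do
    // not understand them never see them:
    ...(cfg.dimensions ? { dimensions: cfg.dimensions } : {}),
    ...(task ? { task } : {}),
  };
}
```

Queries and stored passages get different task hints, which is what lets asymmetric-retrieval models place them in a shared embedding space.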
Cross-encoder reranking supports multiple providers via rerankProvider:
| Provider | rerankProvider | Endpoint | Example Model |
|---|---|---|---|
| Jina (default) | jina | https://api.jina.ai/v1/rerank | jina-reranker-v3 |
| SiliconFlow (free tier available) | siliconflow | https://api.siliconflow.com/v1/rerank | BAAI/bge-reranker-v2-m3, Qwen/Qwen3-Reranker-8B |
| Voyage AI | voyage | https://api.voyageai.com/v1/rerank | rerank-2.5 |
| Pinecone | pinecone | https://api.pinecone.io/rerank | bge-reranker-v2-m3 |
Notes:

- `voyage` sends `{ model, query, documents }` without `top_n`.
- Relevance scores are read from `data[].relevance_score` in the response.

{
"retrieval": {
"rerank": "cross-encoder",
"rerankProvider": "siliconflow",
"rerankEndpoint": "https://api.siliconflow.com/v1/rerank",
"rerankApiKey": "sk-xxx",
"rerankModel": "BAAI/bge-reranker-v2-m3"
}
}
{
"retrieval": {
"rerank": "cross-encoder",
"rerankProvider": "voyage",
"rerankEndpoint": "https://api.voyageai.com/v1/rerank",
"rerankApiKey": "${VOYAGE_API_KEY}",
"rerankModel": "rerank-2.5"
}
}
{
"retrieval": {
"rerank": "cross-encoder",
"rerankProvider": "pinecone",
"rerankEndpoint": "https://api.pinecone.io/rerank",
"rerankApiKey": "pcsk_xxx",
"rerankModel": "bge-reranker-v2-m3"
}
}
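Per the notes above, the provider differences mostly come down to request-body shape, while responses are read uniformly. An illustrative sketch of that split (not the plugin's actual code; real provider APIs may differ in further details):

```typescript
// Sketch: per-provider rerank request bodies and a shared response reader.
type RerankProvider = "jina" | "siliconflow" | "voyage" | "pinecone";

function buildRerankBody(
  provider: RerankProvider,
  model: string,
  query: string,
  documents: string[],
  topN: number,
): Record<string, unknown> {
  const body: Record<string, unknown> = { model, query, documents };
  // voyage sends { model, query, documents } without top_n
  if (provider !== "voyage") body.top_n = topN;
  return body;
}

// Providers are expected to return scores at data[].relevance_score
function readScores(resp: { data: { relevance_score: number }[] }): number[] {
  return resp.data.map((d) => d.relevance_score);
}
```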
OpenClaw already persists full session transcripts as JSONL files:
`~/.openclaw/agents/<agentId>/sessions/*.jsonl`

This plugin focuses on high-quality long-term memory. If you dump raw transcripts into LanceDB, retrieval quality quickly degrades.

Instead, the recommended approach (2026-02+) is a non-blocking `/new` pipeline:

- Triggered by `command:new` (i.e., when you type `/new`)
- Distilled memories are imported via `openclaw memory-pro import`
- Keywords (zh) follow a simple taxonomy (Entity + Action + Symptom); entity keywords must be copied verbatim from the transcript (no hallucinated project names)

See the self-contained example files in:
examples/new-session-distill/Legacy option: an hourly distiller cron that:
- Strips injected `<relevant-memories>` blocks, logs, and boilerplate
- Calls `memory_store` into the right scope (`global` or `agent:<agentId>`)

This repo includes the extractor script:
`scripts/jsonl_distill.py`

It produces a small batch JSON file under:

`~/.openclaw/state/jsonl-distill/batches/`

and keeps a cursor here:

`~/.openclaw/state/jsonl-distill/cursor.json`

The script is safe: it never modifies session logs.
By default it skips historical reset snapshots (*.reset.*) and excludes the distiller agent itself (memory-distiller) to prevent self-ingestion loops.
By default, the extractor scans all agents (except memory-distiller).
If you want higher signal (e.g., only distill from your main assistant + coding bot), set:
export OPENCLAW_JSONL_DISTILL_ALLOWED_AGENT_IDS="main,code-agent"
- `*` / `all` → allow all agents (default)

Create the dedicated distiller agent:

openclaw agents add memory-distiller \
--non-interactive \
--workspace ~/.openclaw/workspace-memory-distiller \
--model openai-codex/gpt-5.2
This marks all existing JSONL files as "already read" by setting offsets to EOF.
# Set PLUGIN_DIR to where this plugin is installed.
# - If you cloned into your OpenClaw workspace (recommended):
# PLUGIN_DIR="$HOME/.openclaw/workspace/plugins/memory-lancedb-pro"
# - Otherwise, check: `openclaw plugins info memory-lancedb-pro` and locate the directory.
PLUGIN_DIR="/path/to/memory-lancedb-pro"
python3 "$PLUGIN_DIR/scripts/jsonl_distill.py" init
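The cursor mechanics can be sketched as per-file byte offsets: `init` records each file's current size (so existing history is skipped), and each run reads only bytes appended since. A simplified TypeScript sketch of the idea; the actual `jsonl_distill.py` is Python and differs in detail:

```typescript
import { statSync, openSync, readSync, closeSync } from "node:fs";

type Cursor = Record<string, number>; // file path -> byte offset already consumed

// init: mark every existing file as fully read (offset = current EOF)
function initCursor(files: string[]): Cursor {
  const cursor: Cursor = {};
  for (const f of files) cursor[f] = statSync(f).size;
  return cursor;
}

// run: read only bytes appended since the stored offset, then advance it
function readNewLines(file: string, cursor: Cursor): string[] {
  const size = statSync(file).size;
  const offset = cursor[file] ?? 0;
  if (size <= offset) return []; // nothing new since last run
  const buf = Buffer.alloc(size - offset);
  const fd = openSync(file, "r");
  readSync(fd, buf, 0, buf.length, offset);
  closeSync(fd);
  cursor[file] = size; // commit: advance the cursor to the new EOF
  return buf.toString("utf8").split("\n").filter(Boolean);
}
```

Because the cursor only ever moves forward and the files are only ever read, repeated runs are incremental and the session logs stay untouched.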
Tip: start the message with run ... so memory-lancedb-pro's adaptive retrieval will skip auto-recall injection (saves tokens).
# IMPORTANT: replace <PLUGIN_DIR> in the template below with your actual plugin path.
MSG=$(cat <<'EOF'
run jsonl memory distill
Goal: distill NEW chat content from OpenClaw session JSONL files into high-quality LanceDB memories using memory_store.
Hard rules:
- Incremental only: call the extractor script; do NOT scan full history.
- Store only reusable memories; skip routine chatter.
- English memory text + final line: Keywords (zh): ...
- < 500 chars, atomic.
- <= 3 memories per agent per run; <= 3 global per run.
- Scope: global for broadly reusable; otherwise agent:<agentId>.
Workflow:
1) exec: python3 <PLUGIN_DIR>/scripts/jsonl_distill.py run
2) If noop: stop.
3) Read batchFile (created/pending)
4) memory_store(...) for selected memories
5) exec: python3 <PLUGIN_DIR>/scripts/jsonl_distill.py commit --batch-file <batchFile>
EOF
)
openclaw cron add \
--agent memory-distiller \
--name "jsonl-memory-distill (hourly)" \
--cron "0 * * * *" \
--tz "Asia/Shanghai" \
--session isolated \
--wake now \
--timeout-seconds 420 \
--stagger 5m \
--no-deliver \
--message "$MSG"
openclaw cron run <jobId> --expect-final --timeout 180000
openclaw cron runs --id <jobId> --limit 5
When distilling all agents, always set scope explicitly when calling memory_store:
- Broadly reusable knowledge → `scope=global`
- Agent-specific knowledge → `scope=agent:<agentId>`

This prevents cross-bot memory pollution.
Cleanup / rollback:

- Disable or remove the cron job: `openclaw cron disable <jobId>` / `openclaw cron rm <jobId>`
- Delete the distiller agent: `openclaw agents delete memory-distiller`
- Remove the distill state: `rm -rf ~/.openclaw/state/jsonl-distill/`

# List memories (output includes the memory id)
openclaw memory-pro list [--scope global] [--category fact] [--limit 20] [--json]
# Search memories
openclaw memory-pro search "query" [--scope global] [--limit 10] [--json]
# View statistics
openclaw memory-pro stats [--scope global] [--json]
# Delete a memory by ID (supports 8+ char prefix)
# Tip: copy the id shown by `memory-pro list` / `memory-pro search` (or use --json for full output)
openclaw memory-pro delete <id>
# Bulk delete with filters
openclaw memory-pro delete-bulk --scope global [--before 2025-01-01] [--dry-run]
# Export / Import
openclaw memory-pro export [--scope global] [--output memories.json]
openclaw memory-pro import memories.json [--scope global] [--dry-run]
# Re-embed all entries with a new model
openclaw memory-pro reembed --source-db /path/to/old-db [--batch-size 32] [--skip-existing]
# Migrate from built-in memory-lancedb
openclaw memory-pro migrate check [--source /path]
openclaw memory-pro migrate run [--source /path] [--dry-run] [--skip-existing]
openclaw memory-pro migrate verify [--source /path]
This plugin provides the core memory tools (`memory_store`, `memory_recall`, `memory_forget`, `memory_update`). You can define custom slash commands in your Agent's system prompt to create convenient shortcuts.

Example: a `/lesson` command. Add this to your `CLAUDE.md`, `AGENTS.md`, or system prompt:
## /lesson command
When the user sends `/lesson <content>`:
1. Use memory_store to save as category=fact (the raw knowledge)
2. Use memory_store to save as category=decision (actionable takeaway)
3. Confirm what was saved
## /remember command
When the user sends `/remember <content>`:
1. Use memory_store to save with appropriate category and importance
2. Confirm with the stored memory ID
| Tool | Description |
|---|---|
| `memory_store` | Store a memory (supports category, importance, scope) |
| `memory_recall` | Search memories (hybrid vector + BM25 retrieval) |
| `memory_forget` | Delete a memory by ID or search query |
| `memory_update` | Update an existing memory in-place |
Note: These tools are registered automatically when the plugin loads. Custom commands like `/lesson` are not built into the plugin — they are defined at the Agent/system-prompt level and simply call these tools.
LanceDB table `memories`:
| Field | Type | Description |
|---|---|---|
| `id` | string (UUID) | Primary key |
| `text` | string | Memory text (FTS indexed) |
| `vector` | float[] | Embedding vector |
| `category` | string | preference / fact / decision / entity / other |
| `scope` | string | Scope identifier (e.g., `global`, `agent:main`) |
| `importance` | float | Importance score 0–1 |
| `timestamp` | int64 | Creation timestamp (ms) |
| `metadata` | string (JSON) | Extended metadata |
On LanceDB 0.26+ (via Apache Arrow), some numeric columns may be returned as BigInt at runtime (commonly: timestamp, importance, _distance, _score). If you see errors like:
`TypeError: Cannot mix BigInt and other types, use explicit conversions`

upgrade to `memory-lancedb-pro` >= 1.0.14. The plugin now coerces these values with `Number(...)` before doing arithmetic (for example, when computing scores or sorting by timestamp).
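The failure mode and the fix are easy to reproduce: mixing a BigInt column value with a plain number throws, while coercing with `Number(...)` first works. A self-contained demonstration with a simulated row (illustrative only):

```typescript
// Arrow-backed rows may surface int64 columns as BigInt at runtime.
const row: { timestamp: bigint; _distance: number } = {
  timestamp: 1730000000000n, // simulated LanceDB row value
  _distance: 0.42,
};

// Mixing BigInt and number throws at runtime:
let mixErrorName = "";
try {
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  void ((row.timestamp as any) + 1000); // TypeError: Cannot mix BigInt and other types
} catch (e) {
  mixErrorName = (e as Error).name;
}

// The fix (what memory-lancedb-pro >= 1.0.14 does): coerce before arithmetic.
const ts = Number(row.timestamp);
const ageMs = Date.now() - ts; // safe number-only arithmetic
```

Millisecond timestamps fit comfortably within `Number.MAX_SAFE_INTEGER`, so the coercion is lossless here.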
For OpenClaw users: copy the code block below into your `AGENTS.md` so your agent enforces these rules automatically.
## Rule 1 — Two-layer memory storage

Every pitfall/lesson learned → IMMEDIATELY store TWO memories to LanceDB before moving on:
- **Technical layer**: Pitfall: [symptom]. Cause: [root cause]. Fix: [solution]. Prevention: [how to avoid]
  (category: fact, importance ≥ 0.8)
- **Principle layer**: Decision principle ([tag]): [behavioral rule]. Trigger: [when it applies]. Action: [what to do]
  (category: decision, importance ≥ 0.85)
- After each store, immediately `memory_recall` with anchor keywords to verify retrieval.
If not found, rewrite and re-store.
- Missing either layer = incomplete.
Do NOT proceed to next topic until both are stored and verified.
- Also update relevant SKILL.md files to prevent recurrence.
## Rule 2 — LanceDB hygiene
Entries must be short and atomic (< 500 chars). Never store raw conversation summaries, large blobs, or duplicates.
Prefer structured format with keywords for retrieval.
## Rule 3 — Recall before retry
On ANY tool failure, repeated error, or unexpected behavior, ALWAYS `memory_recall` with relevant keywords
(error message, tool name, symptom) BEFORE retrying. LanceDB likely already has the fix.
Blind retries waste time and repeat known mistakes.
## Rule 4 — Confirm the target codebase before editing
When working on memory plugins, confirm you are editing the intended package
(e.g., `memory-lancedb-pro` vs built-in `memory-lancedb`) before making changes;
use `memory_recall` + filesystem search to avoid patching the wrong repo.
## Rule 5 — Clear the jiti cache after plugin code changes (MANDATORY)
After modifying ANY `.ts` file under `plugins/`, MUST run `rm -rf /tmp/jiti/` BEFORE `openclaw gateway restart`.
jiti caches compiled TS; restart alone loads STALE code. This has caused silent bugs multiple times.
Config-only changes do NOT need cache clearing.
| Package | Purpose |
|---|---|
| `@lancedb/lancedb` ≥0.26.2 | Vector database (ANN + FTS) |
| `openai` ≥6.21.0 | OpenAI-compatible Embedding API client |
| `@sinclair/typebox` 0.34.48 | JSON Schema type definitions (tool parameters) |
Top contributors (from GitHub's contributors list, sorted by commit contributions; bots excluded):
Full list: https://github.com/win4r/memory-lancedb-pro/graphs/contributors
MIT