
Product
Introducing Repository Access Permissions and Custom Roles
Socket now supports Custom Roles and Repository Access Permissions so organizations can control who can access specific repositories and actions.
@mastra/longmemeval
This package implements the LongMemEval benchmark for testing Mastra's long-term memory capabilities.
LongMemEval is a comprehensive benchmark designed by researchers to evaluate the long-term memory capabilities of chat assistants. It was introduced in the paper:
"LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory"
Di Wu, Hongwei Wang, Wenhao Yu, Yuwei Zhang, Kai-Wei Chang, Dong Yu (ICLR 2025)
Paper | Website | Dataset
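The benchmark evaluates five core long-term memory abilities through 500 meticulously curated questions: information extraction, multi-session reasoning, temporal reasoning, knowledge updates, and abstention.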
Current LLMs show a 30-60% performance drop when tested on LongMemEval, revealing significant challenges in maintaining coherent long-term memory. The benchmark makes these limitations measurable so they can be identified and addressed.
# From packages/longmemeval directory
# 1. Set your API keys
export OPENAI_API_KEY=your_openai_key_here
export HF_TOKEN=your_huggingface_token_here # For automatic dataset download
# 2. Run a benchmark (downloads datasets automatically if needed)
pnpm bench:s # Run small dataset (10 parallel requests)
pnpm bench:m # Run medium dataset (10 parallel requests)
pnpm bench:oracle # Run oracle dataset (10 parallel requests)
# Or run quick 10-question tests
pnpm bench:s:quick # Test with 10 questions from small dataset
pnpm bench:m:quick # Test with 10 questions from medium dataset
pnpm bench:oracle:quick # Test with 10 questions from oracle dataset
Note: The benchmark automatically downloads the datasets on the first run. Get your Hugging Face token from https://huggingface.co/settings/tokens.
# From the monorepo root
pnpm install
pnpm build
# Set your HuggingFace token
export HF_TOKEN=your_token_here
# Download datasets (no Python or Git LFS required)
pnpm download
If automatic download fails, see DOWNLOAD_GUIDE.md for manual download instructions.
# From packages/longmemeval directory
# Quick commands for each dataset (10 parallel requests)
pnpm bench:s # Small dataset (full run)
pnpm bench:m # Medium dataset (full run)
pnpm bench:oracle # Oracle dataset (full run)
# Quick test runs (10 questions only, 5 parallel)
pnpm bench:s:quick # Small dataset (quick test)
pnpm bench:m:quick # Medium dataset (quick test)
pnpm bench:oracle:quick # Oracle dataset (quick test)
# Advanced: use the full CLI with custom options
pnpm cli run --dataset longmemeval_s --model gpt-4o
# Adjust parallelization (default: 5)
pnpm cli run --dataset longmemeval_s --model gpt-4o --concurrency 20
# Graceful shutdown: Press Ctrl+C to stop and save progress
# Run with specific memory configuration
pnpm cli run --dataset longmemeval_s --memory-config last-k --model gpt-4o
pnpm cli run --dataset longmemeval_s --memory-config semantic-recall --model gpt-4o
pnpm cli run --dataset longmemeval_s --memory-config working-memory --model gpt-4o
# Custom subset size
pnpm cli run --dataset longmemeval_oracle --model gpt-4o --subset 25
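# Show dataset statistics
pnpm cli stats --dataset longmemeval_s
# Evaluate an existing results file
pnpm cli evaluate --results ./results/run_12345/results.jsonl --dataset longmemeval_s
# Generate a report from the runs in ./results/
pnpm cli report --results ./results/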
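The --concurrency flag and the Ctrl+C behavior described above amount to a bounded worker pool that stops claiming new questions when a shutdown is requested. The sketch below illustrates that pattern under stated assumptions. runWithConcurrency, RunOutcome, and the shouldStop callback are hypothetical names, not this package's API.

```typescript
// Hypothetical sketch, not the package's runner: process `items` with at most
// `concurrency` workers in flight and stop claiming new items once
// `shouldStop()` returns true, so already-finished results can still be saved.

export interface RunOutcome<R> {
  completed: R[]; // results in completion order, not input order
  stoppedEarly: boolean; // true if shouldStop() cut the run short
}

export async function runWithConcurrency<T, R>(
  items: readonly T[],
  concurrency: number,
  worker: (item: T) => Promise<R>,
  shouldStop: () => boolean = () => false,
): Promise<RunOutcome<R>> {
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new RangeError("concurrency must be a positive integer");
  }

  const completed: R[] = [];
  let next = 0; // index of the next unclaimed item, shared by all lanes
  let stoppedEarly = false;

  // One "lane" is one worker slot. Claiming `next++` is safe because each
  // lane's synchronous code runs to completion without interleaving.
  async function lane(): Promise<void> {
    while (next < items.length) {
      if (shouldStop()) {
        stoppedEarly = true;
        return; // work already in flight in other lanes still finishes
      }
      const index = next++;
      completed.push(await worker(items[index]));
    }
  }

  const laneCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return { completed, stoppedEarly };
}
```

In a Node.js script, you could set a flag from a process.once("SIGINT", ...) handler, pass () => stopRequested as shouldStop, and then write completed to disk. Results arrive in completion order, so keep the question ID inside each result if order matters.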
Results are saved in the results/ directory with:
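- results.jsonl: Individual question results
- hypotheses.json: Model responses
- questions.json: Questions for reference
- metrics.json: Aggregated metrics and configuration

LongMemEval provides three dataset variants:
- longmemeval_s: small variant (pnpm bench:s)
- longmemeval_m: medium variant (pnpm bench:m)
- longmemeval_oracle: oracle variant containing only the evidence sessions needed to answer each question (pnpm bench:oracle)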
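A small aggregation is enough to inspect a results.jsonl file outside the CLI. This is a minimal sketch that assumes each line is a JSON object with hypothetical question_id, question_type, and is_correct fields. The real record shape may differ, and metrics.json already contains the package's own aggregated metrics.

```typescript
// Assumed record shape for one line of results.jsonl (hypothetical field names).
interface ResultRecord {
  question_id: string;
  question_type: string;
  is_correct: boolean;
}

interface Bucket {
  total: number;
  correct: number;
  accuracy: number;
}

interface AccuracySummary extends Bucket {
  byType: Record<string, Bucket>;
}

// Parse JSON Lines: one JSON object per non-empty line.
export function parseJsonl(text: string): ResultRecord[] {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as ResultRecord);
}

// Compute overall accuracy plus accuracy per question type.
export function summarize(records: readonly ResultRecord[]): AccuracySummary {
  const byType: Record<string, Bucket> = {};
  let correct = 0;
  for (const r of records) {
    const bucket = (byType[r.question_type] ??= { total: 0, correct: 0, accuracy: 0 });
    bucket.total += 1;
    if (r.is_correct) {
      bucket.correct += 1;
      correct += 1;
    }
  }
  // Each bucket has total >= 1 here, so the division is safe.
  for (const bucket of Object.values(byType)) {
    bucket.accuracy = bucket.correct / bucket.total;
  }
  return {
    total: records.length,
    correct,
    accuracy: records.length === 0 ? 0 : correct / records.length,
    byType,
  };
}
```

In Node.js, read the file with fs.readFileSync(path, "utf8"), pass the text to parseJsonl, and then pass the records to summarize.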
If you use this benchmark in your research, please cite the original paper:
@article{wu2024longmemeval,
title={LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory},
author={Wu, Di and Wang, Hongwei and Yu, Wenhao and Zhang, Yuwei and Chang, Kai-Wei and Yu, Dong},
journal={arXiv preprint arXiv:2410.10813},
year={2024}
}
To add custom memory configurations:
1. Edit src/benchmark/runner.ts and add your configuration to getMemoryConfig().
2. Add the new configuration name to MemoryConfigType in src/data/types.ts.
3. Update src/memory-adapters/mastra-adapter.ts as needed.
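The extension points listed above pair a type-level list of configuration names with a runtime mapping from name to options. This hypothetical sketch shows one way to keep the two in sync. The three existing names come from the --memory-config examples. Everything else is an assumption: the "combined" name, the MemoryOptions shape (loosely modeled on Mastra Memory's lastMessages, semanticRecall, and workingMemory options), and all numeric values. The real getMemoryConfig() and MemoryConfigType may look different.

```typescript
// Hypothetical mirror of MemoryConfigType. "combined" is a made-up example
// of a newly added configuration.
const MEMORY_CONFIG_TYPES = ["last-k", "semantic-recall", "working-memory", "combined"] as const;
export type MemoryConfigType = (typeof MEMORY_CONFIG_TYPES)[number];

// Assumed options shape, not the package's actual type.
export interface MemoryOptions {
  lastMessages: number | false;
  semanticRecall: false | { topK: number; messageRange: number };
  workingMemory: { enabled: boolean };
}

// Validate a raw CLI string before treating it as a MemoryConfigType.
export function isMemoryConfigType(value: string): value is MemoryConfigType {
  return (MEMORY_CONFIG_TYPES as readonly string[]).includes(value);
}

// Map each configuration name to options. All numbers are placeholders.
export function getMemoryConfig(type: MemoryConfigType): MemoryOptions {
  switch (type) {
    case "last-k":
      return { lastMessages: 20, semanticRecall: false, workingMemory: { enabled: false } };
    case "semantic-recall":
      return { lastMessages: 10, semanticRecall: { topK: 5, messageRange: 2 }, workingMemory: { enabled: false } };
    case "working-memory":
      return { lastMessages: 10, semanticRecall: false, workingMemory: { enabled: true } };
    case "combined":
      return { lastMessages: 10, semanticRecall: { topK: 5, messageRange: 2 }, workingMemory: { enabled: true } };
    default: {
      // Exhaustiveness check: a union member without a case fails to compile here.
      const unreachable: never = type;
      throw new Error(`Unknown memory config: ${String(unreachable)}`);
    }
  }
}
```

The never-typed default branch means the compiler rejects a new name added to the union without a matching getMemoryConfig() case. A missed update is therefore caught at build time rather than during a benchmark run.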
LongMemEval benchmark implementation for Mastra Memory
The npm package @mastra/longmemeval receives a total of 3,660 weekly downloads. As such, its popularity is classified as popular.
We found that @mastra/longmemeval demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 6 open source maintainers collaborating on the project.