listsift

A self-hostable AI dedupe/triage relay for high-volume security & bug mailing lists. Deterministic near-duplicate clustering + validity-ranking + a triaged digest for human triage. Never auto-discards a report. Optional Anthropic LLM layer sharpens validity ranking.

Version: 0.1.0

Version published: 2 weeks ago

Maintainers: 1

Created: 2 weeks ago

Groups the flood of near-identical bug reports on a busy security mailing list into one tidy digest a human can actually get through.

A few very high-traffic security/bug mailing lists are buried under waves of the same AI-written report sent over and over. listsift reads the mailbox, bundles the near-duplicate copies of each report together, sorts the bundles so the ones most likely to be real float to the top, and hands the maintainer one digest to read. It only ever groups, ranks, and labels — it never throws anything away. A human still decides.

Nothing is ever auto-deleted. Every message stays in the digest, just grouped and ordered. This is enforced in code and tested — see test/no-discard.test.js.

Read this first — honest scope

This is a narrow tool for a small audience. It exists because a few very high-traffic security/bug mailing lists are, in their maintainers' own public words, becoming "almost entirely unmanageable" under a flood of mass-duplicated, often-made-up AI bug reports (HN discussion). AI is good at spotting and grouping duplicates; mailing-list software has no built-in duplicate detection. listsift is a small, honest tool you run yourself that fills exactly that gap.

Being blunt about the size of the problem this addresses:

The realistic universe of "lists big enough to need this" is a handful of projects (kernel-scale security/bug lists). This is a viable open-source / reputation tool. It is weak as a standalone business, and there is intentionally no payment, account, or hosted-billing code in this repository.
A thin optional hosted tier may be wired up separately by the operator later (via a Merchant-of-Record). It is not part of this codebase and is not implied by anything here. Treat listsift as: a useful OSS tool a list maintainer can git clone and run. Nothing more is promised.

If you maintain a list that isn't drowning in duplicate AI reports, you almost certainly don't need this. That's fine — it's built for the ones that are.

What it does on its own (no API key, no network)

listsift is fully usable with no API key and no internet. The core runs entirely on your machine, gives the same result every time, and has no outside dependencies. The steps:

parse (mbox/maildir/message)
  -> normalize (strip quoting/signatures/Re:/[list] tags, mask volatile tokens)
  -> cluster (threading + normalized-subject + body-shingle Jaccard, union-find)
  -> validity heuristics (advisory; transparent signals)
  -> ranked digest (every input message represented)
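The normalize step can be pictured with a small sketch. `normalizeForComparison` is a hypothetical helper, not the code in src/normalize.js, and the exact rules (which prefixes are stripped, which tokens count as volatile) are assumptions. It only builds the internal comparison copy; the original message is left alone.

```javascript
// Hypothetical helper (not listsift's src/normalize.js): builds the internal
// comparison copy of a message. The original subject/body are never modified.
function normalizeForComparison(subject, body) {
  // Repeatedly strip leading "Re:"/"Fw:"/"Fwd:" prefixes and "[list]" tags.
  let s = subject;
  let prev;
  do {
    prev = s;
    s = s.replace(/^\s*(re|fwd?)\s*:\s*/i, "").replace(/^\s*\[[^\]]*\]\s*/, "");
  } while (s !== prev);

  // Drop quoted lines and everything after the conventional "-- " signature delimiter.
  const kept = [];
  for (const line of body.split(/\r?\n/)) {
    if (line === "-- ") break;
    if (/^\s*>/.test(line)) continue;
    kept.push(line);
  }

  // Lowercase, mask volatile tokens, collapse whitespace.
  const mask = (t) =>
    t
      .toLowerCase()
      .replace(/\b0x[0-9a-f]+\b/g, "<hex>") // addresses, offsets
      .replace(/\b\d+(?:\.\d+)*\b/g, "<num>") // counts, versions, line numbers
      .replace(/\s+/g, " ")
      .trim();

  return { subject: mask(s), body: mask(kept.join("\n")) };
}
```

Masking numbers means two copies that differ only in a pasted address or line number compare as equal, which is the point of the comparison copy.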

Without a key it produces a complete, useful, byte-stable digest:

Threading: messages linked by References / In-Reply-To are one issue.
Near-duplicate clustering: mass-submitted copies of the same report (trivially reworded, different subjects/Re: prefixes) collapse into one cluster, reported as "N duplicates collapsed". All original Message-IDs, senders, and dates are preserved in the digest.
Advisory validity ranking: each cluster gets a transparent score from visible heuristics (concrete-evidence markers like stack traces / CVE refs / repro steps / version pins push up; grandiose-but-detail-free, AI-tell, and pure-boilerplate markers push down). Every contributing signal is printed, so a human can see at a glance why a cluster ranked where it did and overrule it.
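A minimal sketch of the clustering idea: thread links and body-shingle Jaccard similarity both feed one union-find structure, so any chain of links ends up in a single cluster. The function names, the 3-word shingle size, and the rule that an identical normalized subject halves the threshold are illustrative assumptions, not listsift's actual cluster.js. Bodies and subjects are assumed to be already normalized.

```javascript
// Hypothetical sketch (not listsift's src/cluster.js).
function shingles(text, k = 3) {
  const words = text.split(/\s+/).filter(Boolean);
  const out = new Set();
  if (words.length > 0 && words.length < k) out.add(words.join(" "));
  for (let i = 0; i + k <= words.length; i++) out.add(words.slice(i, i + k).join(" "));
  return out;
}

function jaccard(a, b) {
  let inter = 0;
  for (const x of a) if (b.has(x)) inter++;
  const union = a.size + b.size - inter;
  return union === 0 ? 0 : inter / union;
}

class UnionFind {
  constructor(n) {
    this.parent = Array.from({ length: n }, (_, i) => i);
  }
  find(i) {
    while (this.parent[i] !== i) {
      this.parent[i] = this.parent[this.parent[i]]; // path halving
      i = this.parent[i];
    }
    return i;
  }
  union(a, b) {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra !== rb) this.parent[Math.max(ra, rb)] = Math.min(ra, rb); // smaller index becomes root
  }
}

// msgs: [{ id, inReplyTo?, references?, subject, body }] with subject/body already normalized.
// Returns clusters as arrays of Message-IDs; every input ID lands in exactly one cluster.
function cluster(msgs, similarity = 0.5) {
  const uf = new UnionFind(msgs.length);
  const indexById = new Map(msgs.map((m, i) => [m.id, i]));

  // 1. Threading: References / In-Reply-To links join messages into one issue.
  msgs.forEach((m, i) => {
    for (const ref of [m.inReplyTo, ...(m.references ?? [])]) {
      if (indexById.has(ref)) uf.union(i, indexById.get(ref));
    }
  });

  // 2. Near-duplicates: body-shingle Jaccard at or above the threshold.
  //    Assumption: an identical normalized subject halves the required similarity.
  const sh = msgs.map((m) => shingles(m.body));
  for (let i = 0; i < msgs.length; i++) {
    for (let j = i + 1; j < msgs.length; j++) {
      const sameSubject = msgs[i].subject !== "" && msgs[i].subject === msgs[j].subject;
      const need = sameSubject ? similarity / 2 : similarity;
      if (jaccard(sh[i], sh[j]) >= need) uf.union(i, j);
    }
  }

  // 3. Collect the groups, ordered by each group's first message.
  const groups = new Map();
  msgs.forEach((m, i) => {
    const root = uf.find(i);
    if (!groups.has(root)) groups.set(root, []);
    groups.get(root).push(m.id);
  });
  return [...groups.values()];
}
```

The pairwise loop is O(n²) in the number of messages, which is fine for a sketch; on a very large mailbox a cheaper candidate filter before the Jaccard comparison would likely be worth it.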

Quick start

npm ci            # only dev dependency is the test runner (vitest)

# Triage an mbox -> human-readable digest (no key, no network):
node bin/listsift.js digest /path/to/list.mbox

# Or a maildir, or a directory of one-message files, or stdin:
node bin/listsift.js digest /path/to/Maildir
cat list.mbox | node bin/listsift.js digest -

# Machine-readable:
node bin/listsift.js digest list.mbox --json

# Tune the near-duplicate threshold (default 0.5; higher = stricter):
node bin/listsift.js digest list.mbox --similarity 0.6

Example (the test fixture): 9 messages → 4 issues, 5 duplicates collapsed; the real use-after-free report ranks first, a 5-way mass-duplicated report collapses into one cluster, and a grandiose-but-empty "SEVERE VULNERABILITY" message is ranked last as low-signal — but still present in the digest.
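The advisory score can be sketched as a neutral starting value nudged by visible signals, with every signal that fires recorded as a reason string a human can read. The patterns, weights, and the 0.5 starting point below are hypothetical; listsift's validity.js keeps its own list. The grandiose-claim penalty applies only when no evidence signal fired, following the "grandiose-but-detail-free" idea.

```javascript
// Hypothetical signals and weights (not listsift's src/validity.js).
const EVIDENCE = [
  { name: "stack trace", weight: 0.25, re: /call trace:|\+0x[0-9a-f]+\/0x[0-9a-f]+/i },
  { name: "CVE reference", weight: 0.15, re: /\bCVE-\d{4}-\d{4,}\b/i },
  { name: "repro steps", weight: 0.2, re: /\b(steps to reproduce|reproducer)\b/i },
  { name: "version pin", weight: 0.1, re: /\bv?\d+\.\d+(\.\d+)?(-rc\d+)?\b/ },
];
const NOISE = [
  { name: "AI tell", weight: -0.15, re: /i hope this (message|email) finds you well|as an ai\b/i },
  { name: "boilerplate", weight: -0.1, re: /\bdear (sir|madam|team|maintainers?)\b/i },
];
const GRANDIOSE = /\b(critical|severe|catastrophic|devastating)\b/i;

function scoreValidity(text) {
  let score = 0.5; // neutral start
  const reasons = [];
  const apply = (name, weight) => {
    score += weight;
    reasons.push(`${weight > 0 ? "+" : ""}${weight} ${name}`);
  };
  let evidence = 0;
  for (const s of EVIDENCE) {
    if (s.re.test(text)) {
      evidence++;
      apply(s.name, s.weight);
    }
  }
  for (const s of NOISE) if (s.re.test(text)) apply(s.name, s.weight);
  // Loud claims only count against a report when nothing concrete backs them up.
  if (evidence === 0 && GRANDIOSE.test(text)) apply("grandiose claim, no evidence", -0.25);
  score = Math.min(1, Math.max(0, score));
  return { score: Number(score.toFixed(2)), reasons }; // advisory: orders and labels only
}
```

The score only decides order and label; a report scoring 0 here would still be printed, at the bottom, with its reasons beside it.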

The optional AI layer (off by default)

There's an optional second pass that uses an LLM (a large language model — the kind of AI behind chat assistants). It only sharpens the "is this real" score on the hard, borderline cases (confident-sounding text with no actual evidence is exactly where the plain rules are weakest). It is off unless you turn it on with your own key.

export ANTHROPIC_API_KEY=sk-ant-...     # YOUR key. Never bundled. Never shared.
node bin/listsift.js digest list.mbox --llm

Anthropic Claude only. There is no OpenAI/Gemini/other path, by design. The key is read from the environment by the CLI and passed in explicitly — it is never hardcoded. (Statically asserted in test/no-network.test.js.)
It re-ranks; it never discards. The LLM score is blended with the deterministic heuristic via a weighted mean (combineScores in src/scorer.js), so even a model that hallucinates "this is invalid → 0" cannot single-handedly bury a real report at the default weight. Unparseable model output falls back to a neutral score, never a silent zero. A scorer error falls back to the deterministic score and is surfaced in the digest rationale — never swallowed. The no-discard invariant (every message present, exactly once) always holds regardless of LLM weight; the weighted-mean design keeps a real report out-ranking noise at the default setting.
Not exercised in CI. Testing the real adapter would need a live key and network. The scoring contract is covered by a deterministic FakeScorer; the real AnthropicScorer is a thin, documented HTTP call to the Anthropic Messages API.
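The blending and fallback rules above fit in a few lines. This is a hypothetical sketch, not the actual `combineScores` in src/scorer.js: the 0.45 default weight comes from this README, while the 0.5 neutral value, the [0, 1] range check, and the `scorer.score()` interface are assumptions.

```javascript
// Hypothetical sketch of blending (not the actual combineScores in src/scorer.js).
const DEFAULT_LLM_WEIGHT = 0.45; // default weight quoted in this README
const NEUTRAL = 0.5; // assumed "neutral" value for unparseable output

// Unparseable or out-of-range model output becomes neutral, never a silent zero.
function parseModelScore(raw) {
  const n = Number.parseFloat(String(raw).trim());
  return Number.isFinite(n) && n >= 0 && n <= 1 ? n : NEUTRAL;
}

// Weighted mean: the heuristic always keeps (1 - weight) of the say.
function combineScores(heuristic, llm, weight = DEFAULT_LLM_WEIGHT) {
  return (1 - weight) * heuristic + weight * llm;
}

// A throwing scorer falls back to the deterministic score and leaves a visible note.
async function blendedScore(cluster, heuristic, scorer, weight = DEFAULT_LLM_WEIGHT) {
  try {
    const llm = parseModelScore(await scorer.score(cluster));
    return { score: combineScores(heuristic, llm, weight), note: null };
  } catch (err) {
    return { score: heuristic, note: `LLM scorer failed; deterministic score used (${err.message})` };
  }
}
```

With these numbers, a heuristic score of 0.8 and a model score of 0 blend to 0.55 × 0.8 = 0.44: lower, but not zero.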

Privacy & safety posture (this processes sensitive mail)

Security-list mail is sensitive (sometimes embargoed). listsift's posture:

Local-first. No telemetry. No phone-home. Ever. Run with no flags and listsift makes zero network connections. This is asserted by the test suite, which traps fetch and runs a full digest expecting it to never be called.
The only possible network destination is the Anthropic endpoint, and only when you pass --llm with your own key. In that mode, only the representative (richest) message of a cluster is sent, truncated — never the full list, never your whole mailbox. If you never pass --llm, no message content leaves your machine.
Your data is never silently altered. Normalization (quote/signature stripping, volatile-token masking) is applied only to the internal comparison copy used for similarity — your original messages are preserved verbatim in the digest output (Message-IDs, senders, dates, subjects).
The feedback file is local-only (FEEDBACK.local.ndjson, gitignored) and is never transmitted anywhere.
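The "traps fetch" check can be sketched as a wrapper that swaps `globalThis.fetch` for a counting stub while some work runs. `withFetchTrapped` is a hypothetical name; the real test lives in test/no-network.test.js and may be built differently.

```javascript
// Hypothetical harness (not the code in test/no-network.test.js): replaces
// globalThis.fetch with a counting stub for the duration of fn, then restores it.
async function withFetchTrapped(fn) {
  const original = globalThis.fetch;
  let calls = 0;
  globalThis.fetch = async (input) => {
    calls++;
    throw new Error(`unexpected network call to ${String(input)}`);
  };
  try {
    await fn();
  } finally {
    globalThis.fetch = original; // always restore, even if fn throws
  }
  return calls;
}
```

A stub like this only catches `fetch`; code that opened connections through `node:http`, `node:https`, or `node:net` would slip past it.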

The non-negotiable: nothing is ever auto-discarded

A hallucinated low validity score silently dropping a real vulnerability report would be catastrophic for a security list. So:

Every input message is represented in the digest, exactly once. Low-ranked clusters are sorted to the bottom and labeled, never removed. sum(cluster sizes) === input message count is verified at runtime; the CLI refuses to emit a digest that lost a message (exit code 3) rather than silently drop one.
"Discard" is not a behavior this code can perform. There is no code path that deletes, rejects, or hides a message based on a score. The validity score controls order and a human-readable label only.
The blend ensures the heuristic signal always contributes: at the default LLM weight (0.45) a model returning 0 can push the blend down but cannot reach 0 on its own — the worst a report can look is "very low signal — a human should still glance at it". The no-discard invariant is enforced structurally (every message appears in the digest) and is not dependent on any score floor.

This is tested directly, including an adversarial scorer that returns 0 for everything and one that throws — in both cases every message still appears.
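The runtime count check can be sketched like this. `checkNoDiscard` and `emitOrRefuse` are hypothetical names and the cluster shape (`{ messageIds: [...] }`) is an assumption; the idea is that the sum of cluster sizes must equal the input count, and every input ID must appear exactly once, before anything is rendered.

```javascript
// Hypothetical guard (names and cluster shape are assumptions): verifies that
// every input Message-ID appears in exactly one cluster before rendering.
function checkNoDiscard(inputIds, clusters) {
  const seen = new Map();
  for (const c of clusters) {
    for (const id of c.messageIds) seen.set(id, (seen.get(id) ?? 0) + 1);
  }
  const total = clusters.reduce((n, c) => n + c.messageIds.length, 0);
  const missing = inputIds.filter((id) => !seen.has(id));
  const repeated = [...seen].filter(([, count]) => count > 1).map(([id]) => id);
  const ok = total === inputIds.length && missing.length === 0 && repeated.length === 0;
  return { ok, total, expected: inputIds.length, missing, repeated };
}

function emitOrRefuse(inputIds, clusters, render) {
  const check = checkNoDiscard(inputIds, clusters);
  if (!check.ok) {
    process.stderr.write(`refusing to emit a digest that lost a message: ${JSON.stringify(check)}\n`);
    process.exitCode = 3; // the exit code this README documents for this case
    return null;
  }
  return render(clusters); // ordering and labels are up to the renderer; nothing is dropped
}
```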

Limitations (read this — no numbers are invented)

Heuristic dedupe has false positives and false negatives. Clustering is threading + normalized-subject + word-shingle Jaccard. It will sometimes merge two unrelated reports that share a generic subject and boilerplate, or fail to merge two true duplicates that are heavily reworded. There is no published accuracy number and none is claimed — signals are validated on a hand-authored fixture, not a large labeled corpus. Tune --similarity for your list and report misfires (see FEEDBACK.md).
The validity score is advisory, not authoritative. It is a transparent heuristic, not a classifier and not a trained model. It ranks and labels; it never decides. A human maintainer triages.
The LLM layer is an optional enhancement, not a source of truth. It needs your own Anthropic key; without it listsift runs deterministic-only (fully functional, blunter on deliberately-subtle prose). A model can be wrong; the blend + floor + neutral-fallback exist precisely so it cannot do damage.
The mail parser is pragmatic, not a full RFC 5322 / MIME implementation. It handles the plain-text mbox / maildir / single-message traffic typical of these lists. It does not decode MIME multipart, base64/quoted-printable bodies, or attachments; for multipart messages it dedupes on the raw, undecoded text. mbox splitting uses the standard "From " line heuristic.
Heuristics are English-leaning. Quote/signature/boilerplate detection is tuned for English-language plain-text mail today.
Tiny audience. Restating the honest scope: this helps the small set of lists that are actually overwhelmed. It is not a general-purpose mail tool and is not a business in this form.
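The "From " line heuristic mentioned above can be sketched as follows. `splitMbox` is a hypothetical helper, not listsift's parser, and it also assumes the mboxrd convention of un-escaping `>From ` lines, which not every mbox writer follows.

```javascript
// Hypothetical splitter using the conventional mbox rule: a line beginning with
// "From " (From + space) starts a new message. Also assumes mboxrd escaping,
// where body lines like ">From " were written with one extra ">".
function splitMbox(text) {
  const messages = [];
  let current = null; // null until the first "From " separator is seen
  for (const line of text.split(/\r?\n/)) {
    if (line.startsWith("From ")) {
      if (current) messages.push(current.join("\n"));
      current = []; // the envelope "From " line is not part of the message itself
      continue;
    }
    if (current) current.push(line.replace(/^>(>*From )/, "$1")); // un-escape one level
  }
  if (current) messages.push(current.join("\n"));
  return messages;
}
```

Its known weakness is shared by any From-line splitter: an unescaped body line that starts with "From " is taken as a message boundary.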

Development

npm ci
npm test        # vitest — no network, no API key required or used

The core (src/parse.js, normalize.js, cluster.js, validity.js, digest.js, feedback.js, scorer.js) is pure and dependency-free. Filesystem I/O is isolated in src/io.js; the only network I/O is the optional AnthropicScorer in src/scorer.js.
The suite (87 tests) asserts its own guarantees: no ANTHROPIC_API_KEY present, zero network in the deterministic path, Anthropic-only (no competing provider endpoint/SDK in src/), no hardcoded key, the nothing-auto-discarded invariant (incl. adversarial/throwing scorers), and byte-stable deterministic output.
CI (.github/workflows/ci.yml) runs npm ci && npm test only — no secrets, no network.

Feedback

Misfires (false merges, missed duplicates, misleading ranks) are the single most useful signal — see FEEDBACK.md. Reports are stored and read verbatim (no summarization). Zero-friction path: add the listsift-feedback label to a related issue, or append a line to your local FEEDBACK.local.ndjson via listsift feedback <kind> "<text>".
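For the local feedback file, here is a sketch of appending one JSON object per line (NDJSON) with Node's built-in fs module. The record fields (`kind`, `text`, `at`) and the helper names are assumptions, not the format written by `listsift feedback`; the text is kept verbatim, in line with the no-summarization rule above.

```javascript
const fs = require("node:fs");
const path = require("node:path");
const os = require("node:os");

// Hypothetical record shape; listsift's src/feedback.js defines the real one.
// JSON.stringify escapes embedded newlines, so each record stays on one line.
function appendFeedback(file, kind, text) {
  const record = { kind, text, at: new Date().toISOString() }; // text stored verbatim
  fs.appendFileSync(file, JSON.stringify(record) + "\n", "utf8");
  return record;
}

function readFeedback(file) {
  if (!fs.existsSync(file)) return [];
  return fs
    .readFileSync(file, "utf8")
    .split("\n")
    .filter(Boolean)
    .map((line) => JSON.parse(line));
}
```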

License

MIT — see LICENSE.

Package last updated on 19 May 2026
