mailtrainer
mailtrainer is a tool for automatically training spam filters by scanning
a Maildir directory tree. Every message in a spam folder is learned
as spam, every message in an inbox is learned as ham, and every message
in an outbox has its recipients whitelisted. mailtrainer does not
do any learning/whitelisting itself, but rather executes user-defined
commands such as spamprobe or sa-learn, making it agnostic to the spam
filter you use. mailtrainer is designed to run periodically from
cron, so you can easily retrain your spam filter by moving messages
between mailboxes using your MUA.
Notable Features
- mailtrainer remembers how messages have already been learned and
  avoids learning a message the same way more than once.
- You can specify a minimum age before a message will be learned,
  so you have time to catch classification mistakes before they affect
  other mail.
- Each mailbox can be configured as an inbox, an outbox, or a spam folder
  (or be ignored), so it's no problem if you use multiple mailboxes.
Motivation
Bayesian filters need to be regularly trained to be effective.
SpamAssassin and SpamProbe both support auto-learning, which trains
the filter based on the classification of incoming messages: if an
incoming message is classified as ham, it's auto-learned as ham, and
likewise for spam. The problem with this approach is that if a message
is incorrectly classified, it gets learned incorrectly, decreasing the
effectiveness of future classification. SpamAssassin and SpamProbe
address this problem by allowing messages to be manually relearned.
Unfortunately, if you neglect to manually relearn a message (e.g. you
delete a spam message from your INBOX instead of relearning it), then
your Bayesian database is permanently polluted and made less effective.
mailtrainer lets you specify a minimum age that messages must reach
before they are learned, giving you time to correct misclassifications
before your Bayesian database is affected.
Recommended Usage
- Set the training.ham and training.spam options in your mailtrainer
  config file to the appropriate commands for your spam filter. For
  SpamAssassin, they are sa-learn --ham and sa-learn --spam. For
  SpamProbe, they are spamprobe good and spamprobe spam.
- Disable auto-learning in your spam filter, since mailtrainer will do
  it for you. For SpamAssassin, this means setting bayes_auto_learn 0,
  and for SpamProbe, this means using spamprobe score to classify
  your mail instead of spamprobe receive or spamprobe train.
- Designate one of your mailboxes as the Spam mailbox by setting its
  purpose to "spam" in your mailtrainer config file. Divert messages
  classified as spam to this mailbox using e.g. procmail.
- Designate your mailboxes that receive incoming mail as inboxes by
  setting their purpose to "inbox" in your mailtrainer config file.
  Set a min-age depending on how frequently you check for new mail
  in this mailbox. The min-age should be at least as long as the
  interval at which you check for new mail, since you want to have
  an opportunity to move spam to the spam folder before mailtrainer
  learns it as ham. If you rarely examine new messages in a mailbox,
  set a very high min-age or don't designate the mailbox as an inbox.
  If you go on vacation and won't be checking mail often, increase the
  min-age before you leave.
- Set up a cron job to run mailtrainer once an hour.
Example Config File
[paths]
maildir = /home/andrew/Maildir
[training]
ham = spamprobe good
spam = spamprobe spam
[mailbox "Sent"]
purpose = outbox
[mailbox "Spam"]
purpose = spam
[mailbox "INBOX"]
purpose = inbox
min-age = 72h
[mailbox "Lists"]
purpose = inbox
min-age = 168h
Installing
mailtrainer requires Go 1.20 or higher.
Install by running:
go install src.agwa.name/mailtrainer@latest