[Israel.pm] Pesach cleanup

Yuval Kogman lists at woobling.org
Mon Apr 12 16:32:43 PDT 2004

On Mon, Apr 12, 2004 at 22:48:53 -0200, Gabor Szabo wrote:
> After the first day it had filtered out about 30% of my messages
> into my SPAM folder. Nice, but far from my expectations.

There are also some effective rulesets - backhairs and friends.

> It still leaves me with 20-40 spams a day but that's really manageable
> already.

Too much... I've been using it for nearly a year now, since I set up my
mail server, and I've had 0 false positives, and 2 false negatives
(both were quite cunning, and got through when I hadn't updated from an
old version). I get about 30 spams a day (new domain), and around 200
legitimate messages (mailing lists and such, mostly skimmed on
weekends), for my address. Some of my friends have accounts on my box,
but I don't think they're having any trouble either (maybe I should
start reading their mails ;-).

> Of course what I was afraid of, that the filter will mark good
> message as spams (false positives) also happened. In the
> first few days - after going through about 5000 messages marked as
> SPAM I found about 10 which were real messages.

My dad's workplace sends lots of HTML mail, sometimes with GIFs instead
of text (you'd expect more from a university, no?). Find the rules
which hit your ham, and tone their scores down.

Use razor, pyzor, and DCC, and raise your threshold a little.

Increase the weight of the DNSBLs, and use them liberally. Just make
sure that no message is rejected solely because one list flags its
sender as an open relay.

If your Bayesian classification is good, you can make it a bit more
significant by raising the scores of its rules.
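
To make the point about thresholds and weights concrete, here is a toy
sketch in Python of additive scoring. It is not SpamAssassin's code, and
the rule names and numbers are invented for illustration. It only shows
the idea that no single rule (a DNSBL hit, for example) should be able to
cross the threshold by itself.

```python
# Toy model of additive spam scoring. The rule names and scores below are
# hypothetical; they are not SpamAssassin's real rules or defaults.
THRESHOLD = 5.0

SCORES = {
    "DNSBL_OPEN_RELAY": 3.0,   # hypothetical DNS blacklist rule, weighted up
    "BAYES_VERY_SPAMMY": 3.5,  # hypothetical Bayesian rule, weighted up
    "HTML_IMAGE_ONLY": 1.0,    # hypothetical rule, toned down to spare HTML ham
}


def total_score(hits, scores=SCORES):
    """Sum the scores of every rule that matched the message."""
    return sum(scores[name] for name in hits)


def is_spam(hits, threshold=THRESHOLD, scores=SCORES):
    """A message is tagged as spam once its total reaches the threshold."""
    return total_score(hits, scores) >= threshold


def rules_that_reject_alone(scores=SCORES, threshold=THRESHOLD):
    """List rules whose score alone would tag a message as spam."""
    return sorted(name for name, s in scores.items() if s >= threshold)
```

If rules_that_reject_alone() returns anything, either that rule's score
is too high or the threshold is too low for your mail.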

Lastly, and this has been quite effective for me, add a spamtrap address
in the places where you know your real address appears. I phrase mine
carefully, with the future in mind: no fnords ("bait", "trap",
"(no)?spam") and no saying that you should not mail the address. I
simply state that the address might like advertisements about genital
enhancements or mortgages, and I hide it with CSS if I can. Then pipe
all of the trap's mail to sa-learn --spam. I also run wget on all the
URLs listed in each message, to trigger the web bug images, and I used
to autosubmit to SpamCop (too much effort, now that the trap gets more
mail than I do). Once you unsubscribe its address from the spams you
get, it will get spam similar to the kind you're getting, perhaps even
the same messages (mine is named aarnoch to exploit this case: by the
time I get the messages, it has already learnt them).
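
The trap's delivery step could be sketched roughly like this in Python.
This is an assumption about how one might wire it up, not the script I
actually run: the function names are made up, and it assumes sa-learn and
wget are on the PATH. Only "sa-learn --spam" reading the message on stdin
comes from the description above.

```python
import re
import subprocess
from email import message_from_bytes, policy

# Deliberately simple URL pattern; real spam will contain URLs it misses.
URL_RE = re.compile(r"https?://[^\s\"'<>)]+")


def extract_urls(raw):
    """Return the unique http(s) URLs found in the text parts of a message."""
    msg = message_from_bytes(raw, policy=policy.default)
    urls = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        try:
            text = part.get_content()
        except (LookupError, UnicodeError):
            continue  # unknown or broken charset: skip this part
        for url in URL_RE.findall(text):
            if url not in urls:
                urls.append(url)
    return urls


def learn_as_spam(raw):
    """Feed the raw message to the Bayesian learner on stdin."""
    subprocess.run(["sa-learn", "--spam"], input=raw, check=True)


def fetch_urls(urls):
    """Fetch each URL and discard the body, to trigger web bugs."""
    for url in urls:
        subprocess.run(["wget", "-q", "-O", "/dev/null", url], check=False)


def handle_trap_message(raw):
    """Hypothetical entry point: learn the message, then hit its URLs."""
    learn_as_spam(raw)
    fetch_urls(extract_urls(raw))
```

To use it as a delivery pipe for the trap address, a small wrapper would
call handle_trap_message(sys.stdin.buffer.read()).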

I actually did not go through all this effort for myself, but rather for
my dad, who enjoys a quantity of spam an order of magnitude larger than
mine.

As for the Bayesian engine - I've always been meaning to port the
algorithm CRM114 uses, but (oh gee, I wonder why) I haven't gotten to it
yet. Maybe if you have some free time....


 ()  Yuval Kogman <nothingmuch at woobling.org> 0xEBD27418  perl hacker &
 /\  kung foo master: /me supports the ASCII Ribbon Campaign: neeyah!!!
