Spam Statistics: The Impact Of Koffler-Linfoot Enhanced Rules
Starting about three weeks ago, I finally began to pay some new attention to anti-spam measures on my servers. I haven't been particularly aggressive about it for the past few years. I was doing presentations on anti-spam technology at Lotusphere and other conferences, and I wanted to remain "pure" in the sense that while I was evaluating feature sets of many anti-spam offerings, I didn't want to play favorites by running with any one in particular. Furthermore, with Lotus pushing anti-spam features as part of the ND6 and 6.5 story, I wanted to experience what customers would experience if they relied purely on the built-in functionality. Plus, I have a fairly small user population, so the complaints could be easily handled :-)

For a while, the volume of spam delivered to user inboxes was acceptable. I was using just four anti-spam measures: a static list of IP addresses to block, DNS Blacklists, server mail rules, and user mail rules. About two months ago, however, I started seeing more and more spam reaching inboxes, and my efforts to tune the rules weren't helping, so I decided to make some changes. I downloaded Chris Linfoot's enhanced mail rules design elements, which are an extension of the technique first demonstrated by Daniel Koffler. The enhanced rules let you check the $DNSBLSite field, HELO data, X-Mailer header, and several other headers for markers that help identify probable spam. For the most part, I followed the anti-spam recipe that Chris posted. I also changed the DNSBL configuration on my server to tag and log messages instead of rejecting them, which lets the enhanced rules quarantine them. This will allow me to move to some more aggressive DNSBLs in the future without losing mail to false positives.
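
The real enhanced rules are Notes design elements, and nothing below is taken from them. Purely to illustrate the idea of checking several fields for markers and quarantining on a hit rather than rejecting, here is a minimal Python sketch. Everything except the $DNSBLSite, HELO, and X-Mailer field names is an assumption: the dictionary standing in for a mail document, the rule names, and the marker strings are all made up.

```python
from typing import Callable, Dict, List

# A stand-in for a mail document: field name -> value (hypothetical model).
Message = Dict[str, str]

# Hypothetical rules: rule name -> predicate. The markers are invented examples,
# not the ones used by the actual enhanced rules.
RULES: Dict[str, Callable[[Message], bool]] = {
    # Fires when the DNSBL tagging step left a non-empty $DNSBLSite value.
    "dnsbl-tagged": lambda m: bool(m.get("$DNSBLSite", "").strip()),
    # Fires when the HELO string is just digits and dots (a bare IP address).
    "helo-is-bare-ip": lambda m: m.get("HELO", "").replace(".", "").isdigit(),
    # Fires when the X-Mailer header contains a made-up marker string.
    "suspicious-x-mailer": lambda m: "bulkmailer" in m.get("X-Mailer", "").lower(),
}


def matching_rules(message: Message) -> List[str]:
    """Return the names of every rule that fires on the message."""
    return [name for name, check in RULES.items() if check(message)]


def disposition(message: Message) -> str:
    """Quarantine on any rule hit instead of rejecting outright."""
    return "quarantine" if matching_rules(message) else "deliver"
```

Returning every matching rule name, rather than stopping at the first hit, makes it easier to see afterwards why a given message was quarantined.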

So, let's look at the situation before the changes:

These numbers aren't totally accurate because (a) I don't keep my logs long enough to have been able to go back and recover actual DNSBL rejections during this period, so I held the percentage of good mail vs. spam constant before and after and extrapolated a DNSBL number; and (b) I haven't sat down and figured out how to track rejections due to the static IP block list (and it surely involves analyzing log data that I don't have anyhow). Because those static-list rejections are missing from the total, the 11% of spam attempts that are shown to have been actually delivered to user mailboxes is a little higher than reality. 89% spam rejection with only the built-in anti-spam features of ND 6.5, by the way, is better than I would have expected. The last time I took a serious look at the numbers (close to two years ago), I was seeing only 70% to 80% spam rejection. I guess my hand-tuning of rules was somewhat more effective than I had given myself credit for, and/or the DNSBLs were doing much better than they used to.
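
One plausible reading of the extrapolation in (a) is sketched below in Python. It assumes the ratio of spam to good mail is the same in both periods, scales the "before" good-mail count by that ratio to get an estimated spam total, and attributes whatever is not otherwise accounted for to DNSBL rejections. The function name, parameter names, and the counts in the example call are all placeholders, not figures from this post.

```python
def estimate_dnsbl_rejections(
    good_before: float,
    known_spam_before: float,
    good_after: float,
    spam_after: float,
) -> float:
    """Estimate DNSBL rejections for the 'before' period.

    Assumes the spam-to-good-mail ratio is identical in both periods.
    known_spam_before is the spam already counted by other means
    (caught by rules or delivered), excluding DNSBL rejections.
    """
    spam_per_good = spam_after / good_after
    total_spam_before = spam_per_good * good_before
    # Clamp at zero so a noisy ratio cannot yield a negative count.
    return max(0.0, total_spam_before - known_spam_before)


# Placeholder counts for illustration only.
estimate = estimate_dnsbl_rejections(
    good_before=1000, known_spam_before=500, good_after=2000, spam_after=6000
)
print(estimate)
```

The estimate is only as good as the constant-ratio assumption, which is why the resulting percentages should be read as approximate.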

And now, the situation after the changes:

The numbers show that the Koffler-Linfoot enhanced rules have resulted in a reduction of more than 55% in spam delivered to user inboxes. I'm now getting 95.6% spam rejection. There were 3 false positives during the two-week period that I've had the enhanced rules running -- all three from the same mailing list, and for the life of me I can't see what triggered the rule hit! I've made that mailing list an exception to the enhanced rules, so that case should be taken care of. (I'm thinking, by the way, of writing a tool to execute rules against my quarantine database and mark each message with information that identifies all the rules that would trigger on it. That would be a big help in quantifying rule effectiveness and diagnosing false positive causes. I just need to figure out how to execute the pre-compiled formula items in the rule docs. It's probably going to require some API coding.)
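
As a quick check on how the rejection rates relate to the drop in delivered spam, here is a small Python sketch. It uses only the two rounded percentages quoted in this post (89% before, 95.6% after), so the result is approximate; the function name is just for illustration.

```python
def delivered_reduction(reject_before: float, reject_after: float) -> float:
    """Relative drop in delivered spam, given rejection rates as fractions."""
    delivered_before = 1.0 - reject_before
    delivered_after = 1.0 - reject_after
    return (delivered_before - delivered_after) / delivered_before


# 89% rejection leaves 11% delivered; 95.6% rejection leaves 4.4% delivered.
reduction = delivered_reduction(0.89, 0.956)
print(f"{reduction:.1%}")
```

With these rounded inputs the relative reduction works out to roughly 60%, which is consistent with the "more than 55%" figure above.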

1. Gregg Eldred 05/12/2005 11:37:21 PM

This is extremely cool. After our little conversation about the L-NotesL listserv being blocked, and your comments, I took a closer look at Chris' website. Very good information there. Couple that with what you have done and I feel that I need to re-evaluate my anti-spam measures, too. This is a great post!
Thanks for your insights.

