Mail Filtering
by Michael Stevens

Testing ... 1 ... 2 ... 3 ...

I decided to run an extended comparison of PerlMx and Mail::Audit. As spam filtering is one of the most common applications of mail filtering tools, I set up recent versions of both tools on my personal e-mail (by various nefarious means), ran them for a week, and compared the results on two main criteria:

  • False positives (legitimate email recognized as spam)
  • False negatives (spam not recognized as spam)

Mail::Audit doesn't come with much spam filtering technology by default, so I decided to add SpamAssassin (http://www.spamassassin.org/) to the testing, as it can be used as a Mail::Audit extension.

I used procmail to copy all my incoming e-mail to two POP3 mailboxes set up for the purposes of testing - one would contain mail to be processed by Mail::Audit, the other mail to be processed by PerlMx's spam filtering. fetchmail was used to pull the mail down into the domain of Mail::Audit and PerlMx.

Once I had Mail::Audit and SpamAssassin set up, I started feeding mail into the test box with fetchmail, and was reminded that the Mail::Audit approach of running a Perl program from a .forward file has ... unpleasant effects if you receive more than a few e-mails in quick succession. As my test mail-server collapsed under the load, I checked the PerlMx machine, which I had started at roughly the same time, and found that while it was working through the e-mail more slowly, it hadn't put any serious load on the machine.

Due to a PerlMx configuration error on my part, of the first 171 messages processed, 10 were quarantined as spam AND delivered to the inbox of my test user. PerlMx runs by default in 'training mode' when processing spam - in this mode, mail is spamchecked as normal, but even if it is found to be spam and quarantined, it is also delivered to the user.

I decided to keep track of any mail lost or mislaid during initial setup problems, so I could see what problems could arise from the tools being misconfigured. An important aspect of any software is not only how it behaves when configured right, but how much it punishes you when you get the configuration wrong.

Waking up the next morning, I found I'd bounced several hundred e-mails back to the account from which I was forwarding all the test e-mails, some of which appeared to have gone back and forth, or found their way into the PerlMx test mailbox. Most of the problems appeared to be internal errors from within SpamAssassin. My mail-server still hadn't recovered.

I later found this was because of a compatibility issue between SpamAssassin and Mail::Audit, and that there was a recommended fix in the SpamAssassin FAQ involving the nomime option to Mail::Audit (but not, sadly, in the documentation for the Mail::SpamAssassin module itself).

The SpamAssassin / Mail::Audit script I ended up using was:


  #!/usr/local/bin/perl -w

  use strict;
  use Mail::Audit;
  use Mail::SpamAssassin;

  # create Mail::Audit object, log to /tmp, disable mime processing
  # for SpamAssassin compatibility, and store mail in ~/emergency_mbox
  # if processing fails
  my $mail = Mail::Audit->new(emergency => "~/emergency_mbox",
                              log       => '/tmp/audit.log',
                              loglevel  => 4,
                              nomime    => 1);

  my $spamtest = Mail::SpamAssassin->new;
  
  # check mail with SpamAssassin
  my $status = $spamtest->check($mail);
  
  # if it was spam, rewrite to indicate what the problem was, and 
  # store in the file ass-spam in our home directory
  if ($status->is_spam) {
          $status->rewrite_mail;
          $mail->accept("/home/spam1/ass-spam");
  # if it wasn't spam, accept it as normal mail
  } else {
          $mail->accept;
  }
  
  exit 0;

After clearing down all my mail, and losing two days of testing, I started again. It was only the nature of the testing setup that meant the bounce mail went to me and not the original sender. So, at 23:25 on Tuesday, I had another go. This time I knew enough to limit SpamAssassin to receiving messages in batches of five (using fetchmail) - something I could do in testing, but wouldn't be an easy option in most production setups. This meant my test machine could just about cope with delivering mail using SpamAssassin.

At 10 p.m. Sunday, I declared the testing closed, and examined the accuracy or otherwise of each system.

During the testing between Aug. 6 and 11, Mail::Audit marked 16 pieces of e-mail as spam. Seven of these e-mails proved to be false positives - mail that I had actually solicited and would have liked to have received. Six spam emails were accepted into my Inbox. There were 874 e-mails received in all. Mail::Audit appeared to receive 15 pieces of spam mail in total.

PerlMx marked 14 e-mails as spam. Two of these e-mails proved to be false positives - mail that was not spam. Impressively, it received 886 e-mails in the same period that Mail::Audit received 874 e-mails. I was unable to work out the exact cause of this, although the power-cut in the middle of the testing period will always be a major suspect. Eleven spam messages were incorrectly allowed through into my Inbox. PerlMx appeared to receive 23 pieces of spam mail in total.

The sample was small, as all I had to work with was my own personal e-mail, and I get what I'm told is surprisingly little spam. Even so, it suggests that Mail::Audit / SpamAssassin decides more mail is spam than PerlMx does, but is also wrong more of the time. PerlMx marked slightly fewer e-mails as spam and let more spam through, but when it did claim an e-mail was spam, it was right more of the time.
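To make that comparison concrete, here is a minimal sketch that derives two rates from the counts reported above: the share of flagged mail that really was spam (often called precision) and the share of all spam that was caught (often called recall). Those two terms, the spam_metrics function and the %results layout are illustrative assumptions added for this example; they are not part of either tool.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Counts taken from the test results reported above.
# The hash layout and key names are illustrative only.
my %results = (
    'Mail::Audit + SpamAssassin' => { flagged => 16, false_pos => 7, missed => 6  },
    'PerlMx'                     => { flagged => 14, false_pos => 2, missed => 11 },
);

# Derive precision and recall from the raw counts for one tool.
sub spam_metrics {
    my ($r) = @_;
    my $caught    = $r->{flagged} - $r->{false_pos};   # spam correctly flagged
    my $spam_seen = $caught + $r->{missed};            # all spam received
    return {
        caught    => $caught,
        spam_seen => $spam_seen,
        precision => $caught / $r->{flagged},          # flagged mail that was spam
        recall    => $caught / $spam_seen,             # spam that was flagged
    };
}

for my $tool (sort keys %results) {
    my $m = spam_metrics($results{$tool});
    printf "%-27s caught %2d of %2d spam; precision %.0f%%, recall %.0f%%\n",
        $tool, $m->{caught}, $m->{spam_seen},
        100 * $m->{precision}, 100 * $m->{recall};
}
```

On these figures, about 56% of the mail flagged by Mail::Audit / SpamAssassin was really spam, against about 86% for PerlMx, while Mail::Audit / SpamAssassin caught about 60% of the spam it received against about 52% for PerlMx. With a sample this small, the percentages are only indicative.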

These tests would benefit significantly from being re-run over a longer period of time on a larger mail-server, but I had neither the time nor the mail-server available.

Both tools can be extensively configured in terms of what is considered spam, and are likely to need regular updating to ensure they keep up to date with new tricks of the spammers. Here I only considered the behavior with the default configuration of the latest release at the time I ran my tests.

Feature Comparison

To help you choose, I've summarized the basic characteristics of both systems below. Some of the points are quite subjective and are more my impressions of the tools rather than hard facts - these are marked separately.

| Feature | PerlMx | Mail::Audit |
| --- | --- | --- |
| Scalable | Yes - persistent server | Maybe - depends on config - obvious default configurations scale poorly |
| Ships with wide range of existing filtering functionality | Yes | Limited range, more available from third parties |
| Target use | System-wide mail filtering for mailservers | Per-user mail filtering as a replacement for programs like procmail |
| Extensible? | Yes | Yes |
| Licensing | Commercial | Open-source |
| Mail server compatibility | Sendmail | Almost any mail server |
| Spam filtering | Yes | Third-party extension |
| Virus filtering | Yes | No |
| Easy to set up | Yes | Not so easy, requires custom code |
| Efficient and scalable | Very scalable - easily separated from the mailserver, and no noticeable performance impact during testing | Performance problems during testing in default configuration |

Conclusions

During testing, PerlMx was significantly more reliable than Mail::Audit, both in terms of the amount of mail bounced due to configuration problems (none) and in terms of the load put on the mailserver (minimal). Although Mail::Audit can apparently be set up for good performance, the obvious suggested configuration showed extremely poor scalability during testing. Also, because Mail::Audit requires writing some filtering code, bugs, mostly in that code, resulted in nontrivial quantities of mail being bounced during testing, a problem that simply didn't occur with PerlMx's pre-supplied, configuration-file-based system.

Both PerlMx and Mail::Audit provide good mail filtering solutions using Perl, but are targeted at entirely different markets. PerlMx is a systemwide solution providing drop-in functionality on mailservers, with Perl extensibility as well, whereas Mail::Audit is a more low-level tool, mostly focused on use by individuals, designed to let users build their own mail processing tools more easily.