Mail Filtering

by Michael Stevens
August 27, 2002

There are many ways to filter your e-mail with Perl. Two of the more popular and interesting ways are to use PerlMx or Mail::Audit. I took a long look at both, and this is what I thought of them.

PerlMx

PerlMx is a server product from ActiveState that uses the milter support in recent versions of sendmail to hook in at almost every stage of the mail-handling process.

PerlMx comes with its own copy of Perl and all the supporting modules it needs. It can't run from a normal Perl, because it needs a Perl built with various options such as ithreads support and multiplicity. This means that any module you want to use with PerlMx has to be installed a second time, into PerlMx's own Perl, even if you already have it installed somewhere else on your system.

PerlMx provides a persistent daemon that processes e-mail for an entire mail-server - it avoids the overhead of starting a Perl process to handle each e-mail by running forever, and by using threads to ensure it can service more than one e-mail at a time.
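To illustrate why a persistent, threaded daemon avoids the per-message start-up cost, here is a minimal Python sketch of the general pattern. It is not PerlMx code: the class name, the toy keyword "rules" and the worker count are all hypothetical, and a real milter would receive messages from sendmail over a socket rather than through a method call.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class FilterDaemon:
    """Hypothetical long-lived filter service (not PerlMx's API).

    Set-up work happens once in __init__; after that, a fixed pool of
    threads scores many messages concurrently, so no new process is
    started per message.
    """

    def __init__(self, workers: int = 4) -> None:
        # Toy "rules": phrases whose occurrences add to a spam score.
        self.rules = ("viagra", "free money")
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self.handled = 0

    def _score(self, message: str) -> int:
        text = message.lower()
        score = sum(text.count(phrase) for phrase in self.rules)
        with self._lock:  # the counter is shared between worker threads
            self.handled += 1
        return score

    def submit(self, message: str) -> Future:
        """Queue a message; the returned Future yields its score."""
        return self._pool.submit(self._score, message)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    daemon = FilterDaemon(workers=2)
    futures = [daemon.submit(m) for m in ("hello", "FREE MONEY now", "viagra viagra")]
    print([f.result() for f in futures])  # [0, 1, 2]
    daemon.shutdown()
```

The expensive part (loading rules, or in PerlMx's case starting Perl) is paid once when the daemon starts, and the thread pool is what lets several messages be in progress at the same time.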

PerlMx ships with two main filters - the Spam and Virus filters. The Virus filtering looks interesting, but ultimately I don't receive that many viruses in e-mail, so I was unable to test it beyond establishing that it didn't mangle my e-mail.

The Spam filtering in PerlMx is much more interesting - it seems to be based on Mail::SpamAssassin, a popular spam filtering module often used with Mail::Audit, procmail, or other ways of processing e-mail.

In two weeks of testing PerlMx, using it to process a copy of all my personal e-mail, I found a lot of useful functionality and a few minor problems.

The first hassle was setup. I don't normally use sendmail, but PerlMx requires it for the milter API, so I installed sendmail, configured it, and hooked it into PerlMx.

Once you have sendmail set up and built with milter support (as the default Debian Linux build I used was), it's easy to add a connection to PerlMx with one line in your sendmail.mc file:


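dnl Pass every message to the PerlMx daemon listening on localhost port 3366.
dnl F=T temp-fails mail if PerlMx is unreachable; T= sets the send, read and
dnl end-of-message timeouts used when talking to it.
INPUT_MAIL_FILTER(`PerlMx', `S=inet:3366@localhost, F=T, T=S:3m;R:3m;E:8m')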

PerlMx essentially works out of the box - it asks a number of simple questions when you install and set it up, and assuming you get these right, no further configuration will be required.

The INPUT_MAIL_FILTER line also sets several key options, including the timeouts for communication between sendmail and PerlMx. I had to raise these significantly: PerlMx was taking too long to process spam (it appeared to be doing DNS lookups), so sendmail timed out its connection to PerlMx and refused to accept mail.
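The T= field in the INPUT_MAIL_FILTER line packs several timeouts into one string. As a way of reading it, here is a small Python sketch that decodes such a field into seconds. The letter meanings (C for connecting, S for sending data to the filter, R for reading its reply, E for the overall wait after end-of-message) follow the sendmail milter documentation; accepting only the s, m and h suffixes, and requiring a suffix on every value, are simplifying assumptions of this sketch.

```python
import re

# Keys allowed in the T= field of a sendmail INPUT_MAIL_FILTER definition.
TIMEOUT_NAMES = {
    "C": "connect",         # connecting to the filter
    "S": "send",            # sending information to the filter
    "R": "read",            # reading a reply from the filter
    "E": "end_of_message",  # overall wait after end-of-message is sent
}
# Simplification: only these suffixes are accepted, and one is required.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_milter_timeouts(spec: str) -> dict:
    """Decode e.g. 'S:3m;R:3m;E:8m' into a dict of timeouts in seconds."""
    timeouts = {}
    for item in spec.split(";"):
        key, _, value = item.strip().partition(":")
        match = re.fullmatch(r"(\d+)([smh])", value)
        if key not in TIMEOUT_NAMES or match is None:
            raise ValueError(f"cannot parse timeout {item!r}")
        number, unit = match.groups()
        timeouts[TIMEOUT_NAMES[key]] = int(number) * UNIT_SECONDS[unit]
    return timeouts


if __name__ == "__main__":
    print(parse_milter_timeouts("S:3m;R:3m;E:8m"))
    # {'send': 180, 'read': 180, 'end_of_message': 480}
```

Raising the timeouts just means larger values in this field: E:8m, for instance, allows 480 seconds for the end-of-message stage.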

PerlMx 2.1 even ships with its own sendmail install, preconfigured for use with PerlMx, but you can choose to ignore it and use an existing system sendmail.

Once you've done this, all the mail that passes through your mail server is suddenly spam-filtered and virus-checked. Mail that looks likely to be spam, or that contains a virus, is stopped and held in a quarantine queue; the rest is delivered to the user, possibly with a spam header added that gives a score indicating how likely it is to be spam. The quarantine queue is a systemwide collection of messages that, for one reason or another, weren't appropriate to deliver to the user, normally because they are suspected of containing a virus or of being spam.
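The deliver-or-quarantine decision described above can be sketched in a few lines of Python. This illustrates the idea only and is not PerlMx's code: the 5.0 threshold, the X-Spam-Score header name and the in-memory quarantine list are all assumptions, and the spam score and virus flag are taken as inputs rather than computed.

```python
from email.message import EmailMessage

SPAM_THRESHOLD = 5.0  # assumed cut-off, not necessarily PerlMx's default


def route(message: EmailMessage, spam_score: float, has_virus: bool,
          quarantine: list) -> bool:
    """Quarantine suspect mail; tag and pass through everything else.

    Returns True if the message should be delivered to the user.
    """
    if has_virus or spam_score >= SPAM_THRESHOLD:
        quarantine.append(message)  # held in the systemwide quarantine
        return False
    # Replace any existing score header so it is never duplicated.
    del message["X-Spam-Score"]
    message["X-Spam-Score"] = f"{spam_score:.1f}"
    return True


if __name__ == "__main__":
    held = []
    msg = EmailMessage()
    msg["Subject"] = "Lunch?"
    msg.set_content("See you at noon.")
    print(route(msg, 1.2, False, held), msg["X-Spam-Score"], len(held))
    # True 1.2 0
```

Keeping the score in a header, rather than discarding borderline mail, leaves the final decision for low-scoring messages with the user.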

If the filters supplied with PerlMx aren't to your taste, it also comes with an extension API, along with extensive documentation and samples, so that you can write your own.

While testing PerlMx, I never managed to bounce or accidentally lose my e-mail. I made many configuration errors, which meant that mail sometimes wasn't processed and a lot of valid mail was somewhat over-enthusiastically marked as spam. But as far as I can tell, nothing bounced or disappeared into the system. This is pretty impressive: when configuring most new bits of e-mail software, I usually manage to delete everything I send to it in the first few attempts, or, worse, make myself look stupid by sending errors back to random people unfortunate enough to be on the same mailing list as me.

Mail::Audit

Mail::Audit is very different from PerlMx. For starters, once you've installed it, by default it doesn't do anything. Mail::Audit is just a Perl module: a powerful tool for implementing mail filters, but mostly you have to write them yourself. Where PerlMx ships with spam filtering and virus checking configured by default, Mail::Audit provides duplicate killing, a mailing list processing module (based on Mail::ListDetector), and a few simple spam filtering options based on Realtime Blackhole Lists or Vipul's Razor.
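Two of those built-in options are easy to picture in code. The Python sketch below shows the general ideas rather than Mail::Audit's implementation: duplicate killing by remembering Message-IDs already seen (the in-memory cache and its size limit are assumptions; a filter started once per message would need to keep this list on disk), and the DNS name a Realtime Blackhole List check looks up, formed by reversing an IPv4 address's octets and appending the list's zone. The zone bl.example.org used below is a placeholder, not a real blacklist.

```python
import socket
from collections import OrderedDict
from ipaddress import IPv4Address


class DuplicateKiller:
    """Remember recent Message-IDs; a repeated ID marks a duplicate."""

    def __init__(self, max_ids: int = 1000) -> None:
        self.max_ids = max_ids
        self._seen: OrderedDict = OrderedDict()

    def is_duplicate(self, message_id: str) -> bool:
        if not message_id:
            return False  # no ID to compare, so never discard the message
        if message_id in self._seen:
            return True
        self._seen[message_id] = None
        if len(self._seen) > self.max_ids:
            self._seen.popitem(last=False)  # forget the oldest ID
        return False


def rbl_query_name(ip: str, zone: str) -> str:
    """Name to look up for an IPv4 address in a DNS blacklist zone."""
    octets = str(IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """True if the blacklist publishes an address record for this IP."""
    try:
        socket.gethostbyname(rbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False  # the name does not resolve: the IP is not listed


if __name__ == "__main__":
    print(rbl_query_name("192.0.2.1", "bl.example.org"))
    # 1.2.0.192.bl.example.org
```

The reversal mirrors how reverse DNS names are built, so a list can publish entries for whole address ranges under a common suffix.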

Mail::Audit is not designed to be used with an entire mail-server in the same way as PerlMx. Instead, it allows you to easily write little e-mail filter programs that can be triggered from the .forward file of a particular user. Mail::Audit can be easily configured and used on a per-user basis, whereas PerlMx takes over an entire mail-server and is an all-or-nothing choice.
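To give a feel for the per-user model, here is a small filter of the kind you might trigger from a .forward file with a line such as "|/home/you/bin/filter.py" (the path is hypothetical). It is written in Python rather than with Mail::Audit, so it mirrors the idea, not that module's API: read one message from standard input, pick a folder from its headers, and append it to an mbox file under a lock. The folder names, the List-Id test and the ~/Mail layout are illustrative assumptions.

```python
#!/usr/bin/env python3
import mailbox
import sys
from email import message_from_binary_file, policy
from email.message import EmailMessage
from pathlib import Path

MAIL_DIR = Path.home() / "Mail"  # assumed location of the user's folders


def choose_folder(msg: EmailMessage) -> str:
    """Pick a folder name from the headers, one rule at a time."""
    if "perl5-porters" in msg.get("List-Id", ""):
        return "p5p"
    if msg.get("Subject", "").startswith("[SPAM]"):
        return "spam"
    return "inbox"


def deliver(msg: EmailMessage, folder: str, base: Path = MAIL_DIR) -> None:
    """Append msg to the mbox file base/folder while holding its lock."""
    base.mkdir(parents=True, exist_ok=True)
    box = mailbox.mbox(str(base / folder))
    box.lock()
    try:
        box.add(msg)
    finally:
        box.close()  # flushes the new message, releases the lock, closes


if __name__ == "__main__":
    # The mail server pipes exactly one message to us on standard input.
    incoming = message_from_binary_file(sys.stdin.buffer, policy=policy.default)
    deliver(incoming, choose_folder(incoming))
```

Because the script runs as the user who owns the .forward file, each user can have entirely different rules without touching the server configuration.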

The default Mail::Audit configuration starts one Perl process for each mail handled. Normally this won't be a problem, but if you're processing large volumes of mail, or have a system that is already at or near capacity, it may be enough to tip the balance and cause performance problems. (Translation: long ago I installed Mail::Audit on an old, spare machine I was using as a mail server, received 200 e-mails in less than a minute, and spent quite a while waiting for the system to stop gazing at its navel and start responding to the outside world again.)

If your mail comes to you via POP3, or can be made to (possibly by installing a POP3 daemon if you don't already have one), then popread, a simple script supplied with Mail::Audit, provides a base you can use to feed messages from a POP3 server into Mail::Audit within a single Perl process, improving performance. I didn't do this myself, as I wanted to use what appeared to be the 'recommended' approach to Mail::Audit setup: running a Mail::Audit script from a user's .forward file, which the documentation, if not actively promoting it, most strongly suggests.
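The single-process idea behind popread can be sketched in Python with the standard poplib module. This is not popread itself, and the handle callback stands in for whatever rules a Mail::Audit script would apply; the point is that one process retrieves and filters every waiting message, so start-up costs are paid once rather than per e-mail. The host, credentials and my_rules in the usage comment are placeholders.

```python
from email import message_from_bytes, policy
from typing import Callable


def filter_pop3_mailbox(conn, handle: Callable, delete: bool = False) -> int:
    """Run every message in a POP3 mailbox through one in-process filter.

    conn is a logged-in poplib.POP3 or poplib.POP3_SSL object; handle is
    called once per message with a parsed email message.
    Returns the number of messages processed.
    """
    count, _mailbox_size = conn.stat()
    for number in range(1, count + 1):
        _response, lines, _octets = conn.retr(number)
        msg = message_from_bytes(b"\r\n".join(lines), policy=policy.default)
        handle(msg)
        if delete:
            conn.dele(number)  # only takes effect when the session QUITs


    return count


# Usage (placeholder host, credentials and rule function):
#   import poplib
#   conn = poplib.POP3_SSL("pop.example.com")
#   conn.user("you")
#   conn.pass_("secret")
#   filter_pop3_mailbox(conn, my_rules, delete=True)
#   conn.quit()
```

Marking messages with DELE and only committing on QUIT means that if the filter crashes part-way through, the server keeps every message.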

A popular Mail::Audit addition is SpamAssassin (the codebase on which PerlMx's spam filtering appears to be loosely based). It comes as a Mail::Audit plugin, among other forms.

Mail::Audit makes it easy to write mail filters that work on a per-user basis, whereas PerlMx by default applies to all mail processed on a given mailserver.

If you want to install Mail::Audit systemwide, many mail servers (such as exim) provide a way to configure a custom local delivery agent based on flexible criteria. For example, this article provides some documentation on how to do this with exim.
