Troubleshooters.Com
Presents
Linux Productivity Magazine
Volume 3
Issue 2, February 2004
Spamassassin
Copyright (C) 2004 by Steve Litt. All rights reserved.
Materials from guest authors copyrighted by them and licensed for perpetual
use to Linux Productivity Magazine. All rights reserved to the copyright
holder, except for items specifically marked otherwise (certain free software
source code, GNU/GPL, etc.). All material herein provided "As-Is". User assumes
all risk and responsibility for any outcome.
THERE IS A STRICT JUNE DEADLINE. THE TIME TO START IS NOW!! -- Laurence Canter
(From the 4/12/1994 Canter and Siegel Greencard Lottery Usenet spam)
Editor's Desk
By Steve Litt
Sanford A. Wallace.
How that name takes me back. Back to a simpler time, a more innocent time.
The radio featured the Cranberries, Ace of Base, and Boyz
II Men. Your TV sported a brand new sitcom called "Friends". The year was
1994, I was on Compuserve, and read every email as if it were gold. Even
the ads. It was a time when the phrase "You've got mail" made you happy.
Many of those ancient ads came from Sanford A. Wallace. Great ads. Well written
ads. Ads you could enjoy.
For a while.
After several months of ads from Sanford A. Wallace, I wrote him a rather
blunt email telling him not to send me any more ads. As I remember (and my
memory is rather vague), he wrote me back.
Sanford A. Wallace was the proprietor of Cyber Promotions, a pioneer in the
field of unsolicited commercial email. He led the way for countless others
to follow.
Not that any of this was important. In 1994 I could go through a day's email
in 5 minutes. Whole days went by without unsolicited advertisements.
2004. Now we have a name for what Sanford A. Wallace did -- spam.
80% of my email is spam. I get at least 400 spams per day -- way too many
to examine closely.
But I can't simply delete casually. Brand new customers contact me through
email. If one of those were deleted, it could cost me ten thousand dollars.
What's a businessperson to do?
Spam filtering is the answer. Create an automated program that flags probable
spam messages in such a way that your email client can place them in a probable
spam folder. When going through that folder, you quickly glance to see whether
a message looks like something needed, and if not, delete it. I often delete
8 or 10 probable spams at a time, based only on their subjects and senders.
For messages that aren't probable spam, you use more consideration in making
the deletion decision.
When a non-spam message is flagged as probable spam, that's called a false
positive. It serves as a warning that your spam marking criteria are too loose.
Put the address in the whitelist, and use that message to train Spamassassin
not to flag similar messages.
Today spammers make war on the public. We maintain blacklists -- they send
out emails designed specifically to poison spam detectors into creating false
positives so that people will dump their spam detectors. How I long for the
innocent days of Sanford A. Wallace.
This issue of Linux Productivity Magazine details Spamassassin: how to install
it, how to configure it, and how to use it. No two SpamAssassin installations
are alike because of how differently email is handled in different situations.
But this issue will guide you through a few of the most common scenarios.
Using Spamassassin, you can put back the genie that Sanford A. Wallace released
in a year when Newt Gingrich briefly became the most powerful politician
on the planet.
So kick back, relax, and read this month's Linux Productivity Magazine. And
remember, if you're a free software user, contributor, or evangelist, this
is your magazine. Enjoy!
Help Publicize
Linux Productivity Magazine
By Steve Litt
Loyal readers, I need your help.
For months I've publicized Linux Productivity Magazine, expanding it from
a new magazine to a mainstay read by thousands. There's a limit to what I
can do alone, but if you take one minute to help, the possibilities are boundless.
If you like this magazine, please report it to one of the Linux magazines.
Tell them the URL, why you like it, and ask them to link to it.
I report it to them, but they don't take it very seriously when an author
blows his own horn. When a hundred readers report the magazine, they'll sit
up and take notice.
Reporting is simple enough. Just click on one of these links, and report
the magazine. It will take less than 5 minutes.
If you really like this magazine, please take 5 minutes to help bring it
to a wider audience. Submit it to one of the preceding sites.
GNU/Linux, open source and free software
By Steve Litt
Linux is a kernel. The operating system often described as "Linux" is that
kernel combined with software from many different sources. One of the most
prominent and oldest of those sources is the GNU project.
"GNU/Linux" is probably the most accurate moniker one can give to this
operating system. Please be aware that in all of Troubleshooters.Com,
when I say "Linux" I really mean "GNU/Linux". I completely believe that without
the GNU project, without the GNU Manifesto and the GNU/GPL license it spawned,
the operating system the press calls "Linux" never would have happened.
I'm part of the press and there are times when it's easier to say "Linux"
than explain to certain audiences that "GNU/Linux" is the same as what the
press calls "Linux". So I abbreviate. Additionally, I abbreviate in the same
way one might abbreviate the name of a multi-partner law firm. But make no
mistake about it. In any article in Troubleshooting Professional Magazine,
in the whole of Troubleshooters.Com, and even in the technical books I write,
when I say "Linux", I mean "GNU/Linux".
There are those who think FSF is making too big a deal of this. Nothing
could be further from the truth. Richard Stallman's GNU Manifesto, and the
GNU General Public License it spawned, are the only reason we can enjoy this
wonderful alternative to proprietary operating systems, and the only reason
proprietary operating systems aren't even more flaky than they are now.
For practical purposes, the license requirements of "free software" and "open
source" are almost identical. Generally speaking, a license that complies
with one complies with the other. The difference between these two is a difference
in philosophy. The "free software" crowd believes the most important aspect
is freedom. The "open source" crowd believes the most important aspect is
the practical marketplace advantage that freedom produces.
I think they're both right. I wouldn't use the software without the freedom
guaranteeing me the right to improve the software, and the guarantee that
my improvements will not later be withheld from me. Freedom is essential.
And so are the practical benefits. Because tens of thousands of programmers
feel the way I do, a huge amount of free software/open source is available,
and its quality exceeds that of most proprietary software.
In summary, I use the terms "Linux" and "GNU/Linux" interchangeably, with
the former being an abbreviation for the latter. I usually use the terms
"free software" and "open source" interchangeably, as from a licensing perspective
they're very similar. Occasionally I'll prefer one or the other depending
on whether I'm writing about freedom or business advantage.
Obligatory Abbreviations
By Steve Litt
I wish I didn't have to write this article. In my opinion the abbreviations
MTA, MDA and MUA are so similar sounding as to be utterly confusing. But
when you hear someone glibly rattle off these abbreviations, perhaps it's
best to know them. I try not to use them. Everyone knows what an "email client"
is. They might not know it as an "MUA".
Abbrv | Stands for | Function | Examples | Label on preceding diagram
MTA | Mail Transport Agent | Moves email from one host to another via the SMTP protocol. | Sendmail, qmail, Exim, Postfix, Exchange | SMTP
MDA | Mail Delivery Agent | Delivers email to the user's mail queue. | Procmail | Procmail
MUA | Mail User Agent | AKA: email client. This is what you use to read and compose email. Most modern email clients (MUA, if you must) can grab mail from a POP3 or IMAP server. | Kmail, mutt, pine, Evolution, Eudora, Outlook | Email Client
Pop server | Pop server | Just to make things more difficult, the agent serving up data from the server's user queue to email clients doesn't have a cutesy name, but instead is called the Pop server. Ughh! | ipop3d (Red Hat, Mandrake) | POP3
IMAP server | IMAP server | Analogous to the Pop server, but uses the IMAP protocol instead. | imapd (Red Hat, Mandrake) | n/a
I'll try not to use the abbreviations MTA, MDA and MUA in this document.
They're just too similar, and therefore confusing. Whenever possible, I'll
use diagrams.
Email Basics
By Steve Litt
NOTE
The following documentation is Sendmail based and Redhat/Mandrake centric.
When this documentation talks about SMTP, it's referring to Sendmail's implementation
of SMTP. When this documentation refers to Procmail, it's referring to the
program packaged with Redhat to drop email into local mail queues.
That being said, the principles are sound. If you use qmail, Postfix, exim
or whatever, just substitute your SMTP server's components.
Before understanding Spamassassin, you must understand the basics of email
transmission.
Definition
Email client: A computer program to compose
and read email messages. Kmail, pine, mutt, and Outlook are examples of email
clients. Most modern email clients have the ability to send a composed email
to an SMTP server, and to retrieve an email from a POP3 server. This document
assumes your email client has those abilities.
The Email client is how users interface with email. For many users, it's the
only visible component of email.
Looking a little deeper, when you send an email after composing it, what you
are really doing is pushing the email, as a file, to an SMTP server located
at your ISP.
Definition
SMTP server: Simple Mail Transfer Protocol server. A computer
program that runs continuously, transferring email. When an email client
pushes an email onto the SMTP server, the SMTP server reads the recipient
address and pushes the email to the SMTP server on the recipient's ISP, which
then drops the email in the correct mailbox. SMTP is described fully
in RFC 821.
Sendmail, qmail, Postfix and exim are all examples of SMTP servers.
This document is Sendmail-centric, but the principles can be applied universally.
NOTE
In all diagrams in this document, blocks labeled SMTP refer to SMTP servers,
not to the SMTP protocol. If one really wanted to get picky, one could make
it look like this:
.--------.          .--------.          .--------.    .--------.     .--------.
| Email  |   SMTP   |  SMTP  |   SMTP   |  SMTP  |    |  proc  |     \ Email  \
| client |--------->| server |--------->| server |--->|  mail  |---->/ queue  /
`--------' protocol `--------' protocol `--------'    `--------'     '--------'
In the preceding diagram, on Sendmail systems the "SMTP server"
is sendmail.
For simplicity, in this document we leave out the protocol indicator, and
abbreviate "SMTP server" as "SMTP":
.--------.          .--------.          .--------.    .--------.     .--------.
| Email  |          |  SMTP  |          |  SMTP  |    |  proc  |     \ Email  \
| client |--------->|        |--------->|        |--->|  mail  |---->/ queue  /
`--------'          `--------'          `--------'    `--------'     '--------'
The following diagram shows the route an email takes from the time Chandler
sends it to the time Monica opens it.
Chandler composes his email on his email client (kmail, Evolution, mutt, Eudora,
Outlook), and sends it. It's sent to the SMTP server on the ISP that Chandler
uses. Chandler's SMTP evaluates the message, notes that it's not destined
for anyone local, and retransmits it, this time to the SMTP server at Monica's
ISP. Monica's SMTP evaluates the message and deduces it IS for someone local,
namely, Monica. So Monica's SMTP passes Chandler's email message to the procmail
program, which deposits the email in Monica's mail queue on her ISP's
server.
This queue would typically be /var/spool/mail/monica, but this is
configurable. Some Sendmail configurations deposit email directly in the
user's home directory tree. For the remainder of this document we'll assume
that incoming mail is stored in a file whose name is the same as the receiving
user, and that this file is kept in directory /var/spool/mail. Note
that on many systems, symlink /var/mail points to directory /var/spool/mail.
For brevity's sake, this document often refers to the shorter symlink.
At this point, Monica could read Chandler's message using her ISP's webmail
program. But Monica wants to read and store her email locally, so she runs
her email client, and clicks the "check mail" icon. Her email client then
reaches out on port 110 to contact the POP3 server at her ISP and retrieve
the email stored in her folder at the ISP. The email is placed into a folder
within the $HOME/Mail tree according to the configuration and filters set
up in Monica's email client.
Perhaps Monica wants more control over her email. If so, she could choose
not to have her email client retrieve mail from the POP3 server directly.
Perhaps she would instead use fetchmail and procmail between
the POP3 server and her email client. This gives her many opportunities, including
the opportunity to insert spamassassin:
In the preceding diagram, Monica has chosen to use fetchmail to retrieve
email from her ISP's POP3 server, and has chosen to use fetchmail's
default behavior of passing the email on to the procmail program.
The procmail program's purpose is to deposit incoming email into
the user's email queue file, in this case /var/mail/monica, after
calling any necessary filtering programs. Monica chooses to use spamassassin
as a filtering program called by procmail.
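To make procmail's job concrete, here is a minimal Python sketch of what a delivery agent does in the simplest case: hand the message to each filter in turn, then append the result to the user's mbox queue file. This is not procmail and not how procmail is written; the `deliver` and `tag_checked` names are hypothetical, and real procmail adds recipe matching, error handling and more careful locking.

```python
import mailbox
from email.message import Message
from typing import Callable, Iterable

# A "filter" here is any function that takes a message and returns a
# (possibly modified) message -- a stand-in for a program like spamassassin.
Filter = Callable[[Message], Message]


def deliver(msg: Message, filters: Iterable[Filter], mbox_path: str) -> None:
    """Pass msg through each filter in order, then append it to an mbox file."""
    for f in filters:
        msg = f(msg)
    box = mailbox.mbox(mbox_path)  # creates the file if it doesn't exist
    box.lock()                     # exclusive lock while we append
    try:
        box.add(msg)
        box.flush()
    finally:
        box.unlock()
        box.close()


def tag_checked(msg: Message) -> Message:
    """Hypothetical filter: add a header recording that the message was seen."""
    msg["X-Filtered-By"] = "example-filter"
    return msg
```

Calling `deliver(msg, [tag_checked], "/var/mail/monica")` would mimic the last hop in the preceding diagram, with `tag_checked` standing in for spamassassin.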
Monica now modifies her email client (kmail for example) so instead of retrieving
mail from the ISP's POP3 server, it retrieves it directly from the email
queue on her local Linux box.
NOTE
In real life, Monica's fetchmail program would probably send
email to some sort of local SMTP server listening on port 25, instead of
sending it directly to procmail as shown in the preceding diagram.
However, fetchmail can be configured to output directly to the procmail
executable, and that is what we have chosen to show throughout this document.
If, on Monica's machine, there were nothing listening on port 25 (in other
words, no sort of SMTP server was running), Monica could run her fetchmail
like this:
fetchmail -d60 -m "/usr/bin/procmail -d %T"
The preceding command causes fetchmail to dump email directly to
the procmail executable, bypassing port 25.
For the purposes of understanding SpamAssassin, the conceptual model of fetchmail
handing mail directly to procmail is the simplest.
Notice that an ISP could implement Spamassassin in exactly the same way, except
that the SMTP server would replace fetchmail. Such a configuration
provides Spamassassin filtering to all the ISP's POP3 users. Let's say that
Rachel, Ross, Phoebe and Joey have a different ISP than Chandler. Watch the
message flow:
NOTE
Mandrake and RedHat use symbolic link /var/mail to point to the
real directory, /var/spool/mail. The preceding diagram uses the
shorter names (/var/mail/ross etc) to save space. If your distro
doesn't have this handy symlink, use /var/spool/mail/ross for Ross's
mail queue.
The preceding diagram is just one of many ways to incorporate spamassassin
into a mail server. It has the advantage of not requiring any change to the
server's sendmail configuration. On the other hand, it might not
be the most effective use of Spamassassin, and certainly for performance's
sake you'd substitute spamc for spamassassin.
Notice the preceding configuration filters all local email through Spamassassin,
while leaving email "just passing through" unchanged. Unless you want to police
the net, that's OK. The preceding setup filters email before it reaches user
queues (/var/mail/username) on the server, so any web email system
can also take advantage of Spamassassin.
Servers and Protocols
The following is a list of protocols and their usual port numbers:
Protocol | Port | Typical Server
SMTP | 25 | Sendmail
POP3 | 110 | ipop3d
IMAP3 | 220 | imapd
SMTP over SSL | 465 |
POP3 over SSL | 995 |
IMAP over SSL | 993 |
One way to ascertain whether a server is listening on a port is to run nmap
on localhost:
Starting nmap V. 3.00 ( www.insecure.org/nmap/ )
Interesting ports on obscured.obscure.fyi (127.0.0.1):
(The 1015 ports scanned but not shown below are in state: closed)
Port       State       Service
22/tcp     open        ssh
25/tcp     open        smtp
80/tcp     open        http
110/tcp    open        pop-3
111/tcp    open        sunrpc
443/tcp    open        https
631/tcp    open        ipp
783/tcp    open        hp-alarm-mgr
1011/tcp   open        unknown

Nmap run completed -- 1 IP address (1 host up) scanned in 1 second
[root@newbox root]#
The preceding shows the SMTP server running on port 25, and the POP3 server
running on port 110.
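If nmap isn't installed, a few lines of Python can make the same basic check. This sketch simply tries a TCP connection to each port from the preceding table; it assumes the services listen on 127.0.0.1 and, unlike nmap, tells you nothing about which program answered. The `is_listening` and `PORTS` names are made up for the example.

```python
import socket

# Ports taken from the preceding table.
PORTS = {
    "SMTP": 25,
    "POP3": 110,
    "IMAP3": 220,
    "SMTP over SSL": 465,
    "IMAP over SSL": 993,
    "POP3 over SSL": 995,
}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable...
        return False


if __name__ == "__main__":
    for name, port in PORTS.items():
        state = "open" if is_listening(port) else "closed"
        print(f"{port:>5}/tcp  {state:<7} {name}")
```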
SpamAssassin Basics
By Steve Litt
At its simplest, SpamAssassin is an executable file -- to be specific, a Perl
script that acts as a UNIX-style filter. It takes the email file coming in through
stdin, parses and evaluates it every way from Sunday, and sends it to stdout,
adding a few lines to the mail header describing where it was filtered, what
tests were performed, and most importantly, a spam score that correlates fairly
well with the likelihood of the email being spam. That spam score header looks
like this:
X-Spam-Level: ****
The number of stars represents the likelihood that it's spam. Typically, a
plain text email, to a single recipient, not containing words like "mortgage",
"Viagra", offers of 15% of a Ugandan fortune, mention of various provocative
body parts and the like, will have no stars. An email with more than
10 stars is extremely likely to be spam. With 20 stars, I often throw it away
sight unseen, although this runs the risk of throwing away good email if
SpamAssassin somehow goes bad.
In your email client, you could use the following filters in the following
order:
X-Spam-Level: ********************
X-Spam-Level: **********
X-Spam-Level: *****
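The first one could send everything with 20 or more stars to the bitbucket.
The second one might send everything with 10 to 19 stars to a folder called
"probablyspam". The third could send everything with 5 to 9 stars to a folder
called "likelyspam". The order matters: a header with 20 stars also contains
10 and 5 stars, so the filter with the most stars must be checked first.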
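The same first-match logic can be sketched with Python's standard email module. The thresholds and folder names come from the preceding paragraph; `route` and `spam_stars` are hypothetical helpers, and returning None stands in for the bitbucket. Your email client's own filter syntax will of course differ.

```python
from email import message_from_string
from typing import Optional

# (minimum stars, destination folder), checked highest threshold first,
# because a 20-star message also satisfies the 10- and 5-star rules.
RULES = [
    (20, None),            # None = discard (the bitbucket)
    (10, "probablyspam"),
    (5, "likelyspam"),
]


def spam_stars(raw_message: str) -> int:
    """Count the '*' characters in the X-Spam-Level header (0 if absent)."""
    msg = message_from_string(raw_message)
    return msg.get("X-Spam-Level", "").count("*")


def route(raw_message: str, default: str = "inbox") -> Optional[str]:
    """Return the folder for a message, or None if it should be discarded."""
    stars = spam_stars(raw_message)
    for minimum, folder in RULES:
        if stars >= minimum:
            return folder
    return default
```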
To foster a better understanding, let's start with some terminology:
Email message: An electronically transmitted message consisting of a header
and a body, and possibly one or more attachments.
Email header: The part of the email describing the email itself
-- addressees, subject, priority and the like.
Email body: The part of the email containing the message from the sender
to the receiver.
Email attachment: A file that piggybacks along with the email.
Spam: An unsolicited, non-personalized commercial email message. Often
concerns size of body parts, sexual performance, or mortgage rates.
File: A disk-based chunk of data. Email messages are often stored as
individual files (Maildir), or as parts of a larger file (mbox, and inside
mail queues).
Data repository: A container for data. A data repository does not alter or
act on the data.
Process: A computer program or system that receives data from a data
repository or another process, and gives it to a data repository or process,
usually after altering or acting on the data.
Push: When a process initiates a transfer of data from itself to another
process or a data repository. SMTP (Simple Mail Transfer Protocol) servers
push email, but receive email passively.
Pull: When a process sucks data out of another process or a data repository.
Note that in any single data transfer between two processes, one process
either pushes or pulls, and the other is passive. POP and IMAP servers
passively wait for email clients to pull email data from them, and then
pull that data from the user's mail queue.
Filter (Unix terminology): A process receiving data from one process or data
repository and transferring it to another process or repository. The filter
usually either alters the data, takes action on the data, or both.
Filter (Email terminology): The act of directing an email to a certain
mailbox, or to the garbage can (/dev/null), based on the attributes of the
email. As a noun, it refers to a single configuration to route a certain
type of email to a certain mailbox.
Spamassassin: A (Unix style) filter that is passed an email file, analyzes
that email file, determines the likelihood that the email file is spam,
records that determination in the header of the email file, and then pushes
that email file to another process. The receiving process, or a process
downstream from that one, typically routes that email file to a specific
mailbox, depending on Spamassassin's determination.
SMTP: Simple Mail Transfer Protocol. A method of transferring emails between
two servers (called SMTP servers). Your email client pushes an outgoing email
message to an SMTP server, which then pushes the email to the SMTP server local
to the recipient. Along the way, the message may pass through one or more relay
SMTP servers. SMTP servers sit passively until another process pushes email
to them, and then either relay the email to another SMTP server, or store
the email for later pickup. You can read the details in RFC 821. Chances
are the SMTP server you use is located at your ISP.
This document is based on the Sendmail implementation of SMTP. If you have
a different SMTP implementation (Postfix, qmail, or exim), the specifics
of this document must be changed to fit your situation, but the principles
are valid.
POP, POP3: POP stands for Post Office Protocol. POP3 refers to version 3 of
that protocol. RFC 1939 describes version 3. Your email client pulls your
email from a POP3 server, which in turn obtains your email from the mail queue
(typically /var/spool/mail/username) where the SMTP server deposited your
email. Chances are the POP server is located at your ISP.
IMAP: IMAP is similar to POP, but is more versatile than POP. For instance,
IMAP sends to the email client only the header information (Subject, sender,
size and the like). From there, the user decides, for each email, whether to
delete it, or whether to download the body. This is a huge plus for a person
checking email from multiple computers.
Web Mail: A web app enabling a user with a web browser to directly view his
or her email messages in his or her mail queue on the ISP's server.
As mentioned, the spamassassin executable is a filter, meaning it
can be inserted anywhere between two processes that pipe email. For my desktop
system it looks something like this:
In the preceding diagram, fetchmail pulls your mail from your ISP's
pop server, and pushes it on to procmail. Procmail deposits it in
the user's mail queue (this is a file) after sending it through a pipeline
of various filters. One of those filters is spamassassin, which inserts
several spam related headers, including the X-Spam-Level header. This
header contains a number of stars corresponding to the likelihood that the
email is spam. Kmail, or whatever email client you use, can then deposit email
in its own mailboxes based partially on the headers inserted by SpamAssassin.
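The filter contract itself (read a whole message, add headers, write the whole message back out) fits in a few lines. The Python toy below illustrates only that plumbing. Its keyword list and point values are invented for the example and have nothing to do with SpamAssassin's real rule set; only the 5.0 threshold mirrors the required=5.0 seen in SpamAssassin's output. All names here are hypothetical.

```python
from email import message_from_string
from typing import TextIO

# Hypothetical keyword weights, for illustration only.
TOY_RULES = {"viagra": 3.0, "mortgage": 2.0, "guarantee": 1.5}
THRESHOLD = 5.0  # mirrors the required=5.0 in SpamAssassin's headers


def toy_score(text: str) -> float:
    """Sum the points of every toy keyword that appears in the text."""
    lowered = text.lower()
    return sum(points for word, points in TOY_RULES.items() if word in lowered)


def filter_message(raw: str) -> str:
    """Add X-Spam-Level / X-Spam-Status headers and return the message text."""
    msg = message_from_string(raw)
    payload = msg.get_payload()
    body = payload if isinstance(payload, str) else ""  # toy: skip MIME parts
    score = toy_score(msg.get("Subject", "") + "\n" + body)
    msg["X-Spam-Level"] = "*" * int(score)
    verdict = "Yes" if score >= THRESHOLD else "No"
    msg["X-Spam-Status"] = f"{verdict}, hits={score:.1f} required={THRESHOLD:.1f}"
    return msg.as_string()


def run(instream: TextIO, outstream: TextIO) -> None:
    """The Unix filter contract: read the whole message, write the modified one."""
    outstream.write(filter_message(instream.read()))
```

Ending a script with `run(sys.stdin, sys.stdout)` (after `import sys`) would let you drop it into a pipe the same way as spamassassin, e.g. `cat ~/Mail/sa_test | python3 toyfilter.py`, where toyfilter.py is a hypothetical file name.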
Quick and Dirty Spamassassin
By Steve Litt
Time is money. This article gets you up and running with Spamassassin in
record time. Here are the steps:
1. Download, compile and install Spamassassin
2. Test the Spamassassin program with a file containing a single email
message
3. Pipe each email through the spamassassin command by inserting it somewhere
along the path email travels. For instance, on my box, email goes through fetchmail
to procmail to spamassassin to kmail.
4. Once step 3 is running well, improve performance by substituting the
spamd daemon for the pipe through spamassassin. You'll also
need the client side, spamc.
Download, compile and install Spamassassin
Some modern Linux distributions come with Spamassassin. If so, just use your
package manager to install it. Otherwise, download it from http://www.spamassassin.org.
The file will probably be called something like Mail-SpamAssassin-2.61.tar.bz2.
Logged in as an ordinary user, put that file in your home directory and execute
the following command:
tar xjvf Mail-SpamAssassin-2.61.tar.bz2
The preceding command creates a directory tree called Mail-SpamAssassin-2.61
inside your home directory.
Compiling is pretty easy. Do the following:
- export LANG=en_US
- perl Makefile.PL
- make
- make test
- make install (as root, since it installs into system directories)
If you've done it correctly, typing spamassassin will run a program
that appears to do nothing but hang. Type Ctrl+D to send an EOF to its
stdin, and after a few seconds the program outputs some text. Here's what
it did on my computer:
[slitt@newbox slitt]$ spamassassin
X-Spam-Checker-Version: SpamAssassin 2.61 (1.212.2.1-2003-12-09-exp) on newbox.domain.cxm
X-Spam-Level: **
X-Spam-Status: No, hits=2.9 required=5.0 tests=DATE_MISSING,FROM_NO_LOWER autolearn=no version=2.61

[slitt@newbox slitt]$
If you get that message, you know you've done something right.
Test the Spamassassin program with a single email file
There are a million ways to get a single email message, but in case you cannot
find a way, the following works. This assumes you have the mutt email client
-- most people do. In the following procedure, text between angle brackets
is a comment, and you do not type it. Here's the procedure:
- mutt
- m <to compose an email>
- yourname@localhost <at the prompt, to address the email>
- Test Email <at the prompt, this is the subject>
- Test Email <in the VI environment, to create the message body>
- :wq <to exit the body editing>
- y <to send>
- ?i <go in and out of help to refresh the screen, revealing reception
of the new email>
- s <to save the email>
- sa_test <in response to the prompt for which mailbox to save to.
Note that you'll need to backspace over the default mailbox>
- y <when asked if you want to create the sa_test mailbox>
- q <to exit mutt>
- n <when asked if you want to purge the deleted message>
- The file /home/yourname/Mail/sa_test now contains exactly one email
Now test that file with the following command:
cat /home/yourname/Mail/sa_test | spamassassin
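If you'd rather run this test from a script, Python's subprocess module can feed the file to spamassassin's stdin and capture its stdout, which is exactly what the shell pipe does. This sketch assumes spamassassin is on your PATH; the command is a parameter (a choice made for this example) so the plumbing can be checked with any other filter, such as cat. `run_filter` is a hypothetical name.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def run_filter(message_file: str, command: Sequence[str] = ("spamassassin",)) -> str:
    """Feed a message file to a filter's stdin and return what it writes to stdout."""
    data = Path(message_file).read_bytes()
    result = subprocess.run(list(command), input=data, capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")
```

`print(run_filter("/home/yourname/Mail/sa_test"))` should print the same headers as the shell command.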
Here's the output I got:
[slitt@newbox slitt]$ cat /home/slitt/Mail/sa_test | spamassassin
From slitt@newbox.domain.cxm Sat Jan 31 20:14:38 2004
Return-Path: <slitt@newbox.domain.cxm>
Received: from newbox.domain.cxm (newbox.domain.cxm [127.0.0.1]) by newbox.domain.cxm (8.12.8/8.12.8) with ESMTP id i111EcBq021319 for <slitt@newbox.domain.cxm>; Sat, 31 Jan 2004 20:14:38 -0500
Received: (from slitt@localhost) by newbox.domain.cxm (8.12.8/8.12.8/Submit) id i111Eck6021317 for slitt@localhost; Sat, 31 Jan 2004 20:14:38 -0500
Date: Sat, 31 Jan 2004 20:14:38 -0500
From: slitt@newbox.domain.cxm
To: slitt@newbox.domain.cxm
Subject: Test
Message-ID: <20040201011438.GA21311@newbox.domain.cxm>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4i
Content-Length: 11
Lines: 1
X-Spam-Checker-Version: SpamAssassin 2.61 (1.212.2.1-2003-12-09-exp) on newbox.domain.cxm
X-Spam-Level:
X-Spam-Status: No, hits=0.3 required=5.0 tests=NO_REAL_NAME autolearn=no version=2.61

Test email
[slitt@newbox slitt]$
Note that the X-Spam-Level header has no stars, and the hits value is 0.3.
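The X-Spam-Status header packs the verdict, the score (hits), the threshold (required) and the names of the rules that fired into a single, possibly folded, header. The following Python parser is a sketch written against the 2.61 output shown in this article; it also accepts score= in place of hits=, since later SpamAssassin releases use that spelling. `parse_spam_status` and `SpamStatus` are hypothetical names.

```python
import re
from typing import List, NamedTuple


class SpamStatus(NamedTuple):
    is_spam: bool
    hits: float
    required: float
    tests: List[str]


def _number(field: str, text: str) -> float:
    """Return the number following 'field=' (field may be a regex)."""
    match = re.search(field + r"=(-?[\d.]+)", text)
    if match is None:
        raise ValueError(f"missing {field}= in {text!r}")
    return float(match.group(1))


def parse_spam_status(value: str) -> SpamStatus:
    """Parse e.g. 'No, hits=0.3 required=5.0 tests=NO_REAL_NAME autolearn=no'."""
    # Unfold continuation lines, then drop spaces after commas so a folded
    # tests= list becomes a single comma-separated token.
    compact = re.sub(r",\s+", ",", " ".join(value.split()))
    verdict = compact.split(",", 1)[0].strip().lower()
    tests_match = re.search(r"tests=(\S*)", compact)
    tests = [t for t in tests_match.group(1).split(",") if t] if tests_match else []
    return SpamStatus(
        is_spam=(verdict == "yes"),
        hits=_number(r"(?:hits|score)", compact),
        required=_number("required", compact),
        tests=tests,
    )
```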
Now use mutt to create the following spam-like email to yourself:
To: you@localhost
Subject: ADV: As seen on TV, Free Instant GUARANTEED H.O.T B.A.B.E.S for Your Family. Lose Pounds
Body: This is not spam! We strongly
oppose the use of spam email too. This email conforms with House Bill 4176,
HR 3113, the UCE-Mail Act. We GUARANTEE it.
There is no catch! You can make lots of money. Nobody's perfect, but this
is a cure for impotence. Order our report right now, subject to credit approval.
This is a free investment with no credit check, and we do accept credit cards.
Use our program to consolidate your bills, and stop those creditors from calling.
No inventory, just invaluable marketing information with huge potential earnings.
It even reverses aging while you sleep. Stop snoring, lose body fat, and
get paid for hidden assets with an insurance policy from our affiliate partners.
The preceding is a caricature of a spam. It has it all: snoring, fat,
money, insurance, affiliate partners, non-spam protests, and G.A.P.P.Y T.E.X.T.
Following the previous instructions, save this one as /home/yourself/Mail/sa_test_spam.
Then run it through SpamAssassin (cat /home/yourself/Mail/sa_test_spam | spamassassin),
and see what happens:
[slitt@newbox slitt]$ cat /home/slitt/Mail/sa_test | spamassassin
From slitt@newbox.domain.cxm Sat Jan 31 21:10:48 2004
Received: from localhost [127.0.0.1] by newbox.domain.cxm with SpamAssassin (2.61 1.212.2.1-2003-12-09-exp); Sat, 31 Jan 2004 21:14:26 -0500
From: slitt@newbox.domain.cxm
To: slitt@newbox.domain.cxm
Subject: ADV: As seen on TV, Free Instant GUARANTEED H.O.T B.A.B.E.S for Your Family. Lose Pounds
Date: Sat, 31 Jan 2004 21:10:48 -0500
Message-Id: <20040201021048.GA21383@newbox.domain.cxm>
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 2.61 (1.212.2.1-2003-12-09-exp) on newbox.domain.cxm
X-Spam-Level: **************************************************
X-Spam-Status: Yes, hits=66.1 required=5.0 tests=ACCEPT_CREDIT_CARDS,
        ADVERT_CODE,AS_SEEN_ON,BAD_CREDIT,CONSOLIDATE_DEBT,EARNINGS,EXCUSE_15,
        FREE_INVESTMENT,GAPPY_SUBJECT,GUARANTEE,HIDDEN_ASSETS,
        INVALUABLE_MARKETING,LOSEBODYFAT,LOSE_POUNDS,NO_CATCH,NO_CREDIT_CHECK,
        NO_INVENTORY,NO_REAL_NAME,OUR_AFFILIATE_PARTNERS,REVERSE_AGING,
        STOP_SNORING,SUBJ_2_CREDIT,SUBJ_AS_SEEN,SUBJ_FREE_INSTANT,
        SUBJ_GUARANTEED,SUBJ_YOUR_FAMILY,THIS_AINT_SPAM,WE_HATE_SPAM,
        WHILE_YOU_SLEEP autolearn=no version=2.61
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----------=_401C6102.5F49A468"

This is a multi-part message in MIME format.

------------=_401C6102.5F49A468
Content-Type: text/plain
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

Spam detection software, running on the system "newbox.domain.cxm", has identified this incoming email as possible spam. The original message has been attached to this so you can view it (if it isn't spam) or block similar future email. If you have any questions, see the administrator of that system for details.

Content preview: This is not spam! We strongly oppose the use of spam email too. This email conforms with House Bill 4176, HR 3113, the UCE-Mail Act. We GUARANTEE it. There is no catch! You can make lots of money. Nobody's perfect, but this is a cure for impotence. Order our report right now, subject to credit approval. This is a free investment with no credit check, and we do accept credit cards. Use our program to consolidate your bills, and stop those creditors from calling. No inventory, just invaluable marketing information with huge potential earnings. It even reverses aging while you sleep. Stop snoring, lose body fat, and get paid for hidden assets with an insurance policy from our affiliate partners. [...]

Content analysis details: (66.1 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 2.8 SUBJ_FREE_INSTANT      Subject contains "Free Instant"
 0.3 NO_REAL_NAME           From: does not include a real name
 2.6 SUBJ_AS_SEEN           Subject contains "As Seen"
[slitt@newbox slitt]$
The preceding scored 66.1 points. You've proven the concept. Now you're ready
to put Spamassassin to work.
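Comparing the three runs suggests how the star count relates to the score: hits=0.3 gave no stars, hits=2.9 gave two, and hits=66.1 gave a long row that stops well short of 66. That is consistent with one star per whole point, capped at 50 (the cap SpamAssassin's documentation gives for its star template). The helper below approximates that relationship; it's a sketch of the observed output, not SpamAssassin's own code, and `spam_level` is a made-up name.

```python
MAX_STARS = 50  # cap on the star string, per SpamAssassin's documentation


def spam_level(hits: float, max_stars: int = MAX_STARS) -> str:
    """Approximate X-Spam-Level: one star per whole point, none if negative."""
    return "*" * min(max_stars, max(0, int(hits)))
```

It also explains the earlier filter thresholds: a 20-star filter catches anything scoring 20 points or more.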
Insert Spamassassin in the email processing chain
Email travels in a series of transfers. Spamassassin is an executable file.
It's a filter that modifies the data passing through it. It is implemented
by inserting it between two of the transfer points.
Personal Protection
If you're a typical user, you can implement Spamassassin by inserting it
as a filter called by the Procmail program. In this scenario, Fetchmail pulls
the mail from your ISP's POP3 server, and sends the mail to Procmail. Procmail
sends the email through Spamassassin before depositing it in your Linux box's
mail queue file /var/spool/mail/yourname. Then, at a later time, your
kmail email client (or whatever email client you use) pulls the mail out
of /var/spool/mail/yourname.
In your email client, set a filter so that if the X-Spam-Level header contains
more than a certain number of stars, the mail is sent to a special folder
(usually Trash). In the Trash folder, you can quickly scan to reassure yourself
there are no false positives, and then delete the mail entirely.
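Email clients such as KMail let you build this rule by pointing and clicking. To make the rule concrete, here is a minimal shell sketch of the same test. It assumes the message arrives on standard input with its headers intact; the function names spam_stars and belongs_in_trash are hypothetical, not part of Spamassassin or any email client.

```shell
#!/bin/sh
# Hypothetical helper: print how many '*' characters appear in the
# X-Spam-Level header of the message read on standard input.
spam_stars() {
    # Stop at the blank line that ends the headers; print the first
    # X-Spam-Level header found (nothing if there is none), then keep
    # only the stars and count them.
    awk '/^$/ { exit } /^X-Spam-Level:/ { print; exit }' |
        tr -cd '*' | wc -c | tr -d ' '
}

# Hypothetical rule: succeed (exit status 0) if the message on standard
# input carries at least $1 stars, meaning it belongs in the Trash folder.
belongs_in_trash() {
    [ "$(spam_stars)" -ge "$1" ]
}
```

Because only the headers are examined, stars that happen to appear in the message body do not count toward the total.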
Configure Fetchmail with the following ~/.fetchmailrc:
~/.fetchmailrc
|
# Configuration created Mon Jun 2 09:16:09 2003 by fetchmailconf
set postmaster "yourname"
set bouncemail
set no spambounce
set properties ""
set daemon 600
poll pop.myisp.com with proto POP3
       user 'yourname@yourdomain.com' there is 'yourname' here
       limit 50000000 warnings 3200 expunge 60
|
In the preceding, Fetchmail queries every 600 seconds (10 minutes). It grabs
email from POP server pop.myisp.com using the POP3 protocol, logging in as
user 'yourname@yourdomain.com', and sends the mail on to user yourname
on the local box. It does not pull emails over 50000000 bytes (about 50 MB),
it repeats its warning about such an oversized message no more than once
every 3200 seconds, and after each set of 60 emails downloaded it deletes
those 60 on the server. Expunging at intervals of 60 minimizes those horrible
situations where you time out or otherwise bomb while downloading 3000
messages after a week's vacation, and each time it bombs you need to
redownload everything.
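One practical detail: Fetchmail refuses to use a ~/.fetchmailrc that other users can access, because the file may hold your password. The following sketch tightens the permissions and reports the result. The function name secure_fetchmailrc is hypothetical, and the stat -c syntax assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper: make a fetchmail run-control file private to its
# owner. Defaults to ~/.fetchmailrc when no path is given.
secure_fetchmailrc() {
    rc=${1:-$HOME/.fetchmailrc}
    if [ ! -f "$rc" ]; then
        echo "no such file: $rc" >&2
        return 1
    fi
    # Owner read/write only; group and others get nothing.
    chmod 600 "$rc" || return 1
    # Print the resulting permission bits (GNU stat syntax assumed).
    stat -c '%a' "$rc"
}
```

Running it once after creating the file is enough; fetchmailconf normally writes the file with safe permissions already.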
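You run Fetchmail as a daemon by having your system run this command (at
boot time, for instance):
fetchmail -d60
That command runs it as a daemon and awakens it every 60 seconds. A -d on
the command line overrides the set daemon 600 line in ~/.fetchmailrc.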
Unless Fetchmail is configured to do otherwise, it passes the pulled mail to
the SMTP listener on the local machine (port 25), and on many distributions,
Red Hat among them, Sendmail then delivers it through Procmail. Procmail
itself is a filter program (/usr/bin/procmail), configured site-wide with
/etc/procmailrc. Here's an example:
/etc/procmailrc
|
LOGFILE=/var/log/procmail.log
VERBOSE=ON

# send to spamassassin
:0 fw
* < 256000
|/usr/bin/spamassassin
# |/usr/bin/spamc -f
|
The preceding ~/.fetchmailrc and /etc/procmailrc are sufficient to send the
mail through Spamassassin, which adds its X-Spam headers to every message
under 256000 bytes (that's what the * < 256000 condition restricts it to).
The next step is to configure Spamassassin. That is done with
~/.spamassassin/user_prefs:
~/.spamassassin/user_prefs
|
# SpamAssassin user preferences file.  See 'perldoc Mail::SpamAssassin::Conf'
# for details of what can be tweaked.
###########################################################################

# How many hits before a mail is considered spam.
required_hits 40

spam_level_stars 1
score MICROSOFT_EXECUTABLE 15
score PYZOR_CHECK 5
score RAZOR2_CHECK 5
score HTML_WEB_BUGS 15
|
In the preceding, required_hits is the number of hits required for Spamassassin
to declare the email spam. The vast majority of emails with a score of 10
are pure spam, so why set this so high? Because when Spamassassin declares
an email spam, it converts the original message into an attachment, and
checking emails for false positives would be VERY slow if I had to open an
attachment for each one. So I crank it up to 40. If something comes in
higher than 40, I feel confident deleting it without further investigation.
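To get a feel for how a threshold plays out, you can add up the pts column of a content analysis report like the one earlier in this article and compare the total with required_hits. Spamassassin does this arithmetic itself; the sketch below only helps you check your intuition. It assumes each score line of the report begins with a number, and score_report is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: sum the "pts" column of a Spamassassin content
# analysis table read on standard input, and compare the total with the
# required_hits value given as $1.
score_report() {
    awk -v required="$1" '
        # Score lines start with a (possibly negative) number,
        # e.g. " 2.8 SUBJ_FREE_INSTANT ...". Header and ruler lines do not.
        $1 ~ /^-?[0-9]+(\.[0-9]+)?$/ { total += $1 }
        END {
            printf "%.1f points, %s required: %s\n", total, required,
                   (total >= required + 0 ? "spam" : "not spam")
        }'
}
```

Fed just the three rules shown in the sample report, for instance, it totals 5.7 points: over the default of 5.0, but nowhere near 40.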
The spam_level_stars setting is a boolean declaring whether the X-Spam-Level
header should contain stars (one per point of score). It should, because a
row of stars is the easiest thing for an email filter to parse.
The remainder of the preceding file changes the weights of certain tests.
Once Spamassassin has imprinted the email with an X-Spam-Level header whose
number of stars corresponds to the number of hits, the final step is to
configure a filter in your email client to send such emails directly to the
trash can. From there, you can quickly scan those messages to ascertain there
are no false positives, and then delete the bunch of them.
You've now set up Spamassassin for personal protection. But what
about protection for everyone? That comes later, but first, let's discuss
throughput...
Improve performance by substituting the spamd daemon for
the pipe through spamassassin.
Can you imagine starting and stopping spamassassin 150 times when you download
150 emails? I've done it while watching my handy-dandy IceWM CPU monitor.
When I download, the constant starting and stopping of the spamassassin
filter pegs both CPUs. Spamassassin is a huge Perl program that takes bigtime
resources just to load.
What's needed is a Spamassassin that runs constantly. That's what spamd
is. But how do you filter emails through a constantly running program? It's
simple. spamd is run as a server, and a tiny program called spamc
acts as the client. You filter through tiny spamc, and spamc
sends the information, via a socket, to spamd. The spamd
server filters the email and sends it, via the socket, to the spamc
client that sent it.
So all you do is substitute spamc -f for spamassassin in
your procmailrc file:
/etc/procmailrc
|
LOGFILE=/var/log/procmail.log
VERBOSE=ON

# send to spamassassin
:0 fw
* < 256000
# |/usr/bin/spamassassin
|/usr/bin/spamc -f
|
Here's a diagram:
The result is much lighter CPU usage. But there are two little tricks:
- spamd must be running at all times.
- Configuration cannot be done in ~/.spamassassin/user_prefs,
but instead must be done site-wide in /etc/mail/spamassassin/local.cf.
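Once spamd is running, you can measure the difference for yourself by filtering a directory of saved messages both ways and timing each run. The sketch below assumes one message per file; filter_dir and the SA_FILTER variable are hypothetical names, and the default filter, spamc -f, only works while spamd is running.

```shell
#!/bin/sh
# Hypothetical helper: run every message file in directory $1 through a
# filter and write the filtered copies into directory $2.
# SA_FILTER defaults to "spamc -f", the client that hands each message to
# the already-running spamd; set SA_FILTER=spamassassin to pay the cost of
# loading the full program once per message instead.
filter_dir() {
    src=$1
    dest=$2
    filter=${SA_FILTER:-spamc -f}
    mkdir -p "$dest" || return 1
    count=0
    for msg in "$src"/*; do
        [ -f "$msg" ] || continue
        # $filter is deliberately unquoted so "spamc -f" splits into a
        # command and its option.
        $filter < "$msg" > "$dest/$(basename "$msg")" || return 1
        count=$((count + 1))
    done
    echo "$count messages filtered"
}
```

For example, run time filter_dir ~/testmsgs /tmp/out1, then set SA_FILTER=spamassassin and run time filter_dir ~/testmsgs /tmp/out2 (~/testmsgs being whatever directory of sample messages you have) and compare the two timings.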
To ensure spamd runs all the time, create or obtain a startup
file. Here's a quick and dirty /etc/rc.d/init.d/spamd I created:
|
#!/bin/sh
#########################################################################
#
# chkconfig: 2345 99 99
# description: "spamd" is the spamassassin daemon
#
# Start / stop script for spamd
#
# In order to be distribution independent, the script knows only a few
# commands:
#   start
#   stop
#   restart
#   status
#
#########################################################################

# Special options, adapt this
NAME=spamd
PIDFILE=/var/run/spamd.pid

# where the program is located
PROG=/usr/bin/spamd

RETVAL=0
case "$1" in
  start)
    echo -n "Starting $NAME "
    # -d: run as a daemon; -r: write the process ID to $PIDFILE
    $PROG -d -r $PIDFILE
    RETVAL=$?
    if [ $RETVAL -eq 0 ]; then echo "[ OK ]"; else echo "[FAILED]"; fi
    ;;
  stop)
    echo -n "Stopping $NAME "
    if [ -f $PIDFILE ] && kill `cat $PIDFILE`; then
      rm -f $PIDFILE
      echo "[ OK ]"
    else
      RETVAL=1
      echo "[FAILED]"
    fi
    ;;
  restart)
    $0 stop
    sleep 2
    $0 start
    RETVAL=$?
    ;;
  status)
    # spamd has no "status" command, so check the PID file instead
    if [ -f $PIDFILE ] && kill -0 `cat $PIDFILE` 2>/dev/null; then
      echo "$NAME is running"
    else
      echo "$NAME is stopped"
      RETVAL=3
    fi
    ;;
  *)
    echo "Syntax: `basename $0` start|stop|status|restart"
    RETVAL=1
    ;;
esac

exit $RETVAL
|
Perhaps a better idea would be to copy pld-rc-script.sh,
redhat-rc-script.sh, debian-rc-script.sh, netbsd-rc-script.sh, or whichever
script in the spamd directory of the Spamassassin source distribution matches
your distribution, to /etc/rc.d/init.d/spamd. Either way, after
creating that file (and you might be able to find a much better one elsewhere),
make sure it runs at boot time with the following command:
chkconfig spamd on
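If you go the copy route, installation boils down to copying the script into place and making it executable. Here is a small sketch of that step; install_spamd_script is a hypothetical name, and the default destination assumes the Red Hat-style /etc/rc.d/init.d layout used above.

```shell
#!/bin/sh
# Hypothetical helper: copy a spamd start/stop script ($1) into the init
# directory ($2, default /etc/rc.d/init.d/spamd) and make it executable.
install_spamd_script() {
    src=$1
    dest=${2:-/etc/rc.d/init.d/spamd}
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    cp "$src" "$dest" || return 1
    # Init scripts must be executable by root.
    chmod 755 "$dest"
}
```

On a Red Hat-style system you would then, as root, run chkconfig --add spamd so chkconfig registers the new script, followed by chkconfig spamd on and /etc/rc.d/init.d/spamd start.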
That brings us to modifying /etc/mail/spamassassin/local.cf.
Copy all your special configuration lines from ~/.spamassassin/user_prefs
into that file. Also, if you will be doing site-wide Bayes filtering, insert
the following two lines:
bayes_path /var/sa_bayes
bayes_file_mode 0666
In the preceding case, bayes_path is a filename prefix rather than a
directory: Spamassassin creates its Bayes database files as
/var/sa_bayes_toks, /var/sa_bayes_seen and so on, and those files must be
world readable and writable (ugh!), which is what bayes_file_mode 0666
requests. Alternatively, let users maintain their own Bayes filters
and leave those statements out of /etc/mail/spamassassin/local.cf.
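Putting those pieces together, a site-wide local.cf might look like the one this sketch writes. It assumes the settings from the user_prefs example earlier in this issue plus the shared Bayes lines; write_local_cf is a hypothetical helper, and you can pass it a scratch path to inspect the result before touching /etc/mail/spamassassin/local.cf.

```shell
#!/bin/sh
# Hypothetical helper: write a site-wide local.cf to $1
# (default /etc/mail/spamassassin/local.cf, which requires root).
write_local_cf() {
    dest=${1:-/etc/mail/spamassassin/local.cf}
    cat > "$dest" <<'EOF'
# Site-wide settings, copied from ~/.spamassassin/user_prefs
required_hits 40
spam_level_stars 1
score MICROSOFT_EXECUTABLE 15
score PYZOR_CHECK 5
score RAZOR2_CHECK 5
score HTML_WEB_BUGS 15

# Shared Bayes database: files are created from this prefix as
# /var/sa_bayes_toks, /var/sa_bayes_seen and so on, with mode 0666
bayes_path /var/sa_bayes
bayes_file_mode 0666
EOF
}
```

spamd reads its configuration when it starts, so restart it (for instance with the startup script's restart command) after changing local.cf.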
Spamassassin and Sendmail
By Steve Litt
There are many ways to integrate Spamassassin with Sendmail. Most are beyond
the scope of this magazine, but one was discussed in an earlier article:
simply have Procmail run Spamassassin. Once again, here is the diagram:
Remember, to save space the preceding diagram shows the mail queues as /var/mail/username,
which is a symlink on some distros. The real location is /var/spool/mail/username.
Life After Windows: Who's the Boss?
Life After Windows is a regular Linux Productivity Magazine column,
by Steve Litt, bringing you observations and tips subsequent to Troubleshooters.Com's
Windows to Linux conversion.
By Steve Litt
My interest in Spamassassin started when two different web hosts could not
keep their Spamassassin correctly configured. After researching Spamassassin,
I understand the challenge in applying it to huge numbers of remote users.
It would be difficult for an ISP to consistently maintain Spamassassin such
that most users are pleased most of the time.
As I lost trust in my hosts' Spamassassin setups, the solution became obvious:
install my own. Now I'm the boss. When the spam marking criteria
need changing, there's no need to go through a restrictive web interface or
beg tech support to please, please, please help me. I just fire up an editor,
make the change, and I'm done.
When a bug appears, there's no begging and pleading. I just troubleshoot it
with logs and other techniques, and, if absolutely necessary, by going into
the source code.
If Spamassassin needs updating, no waiting is necessary. Just download, back
up, run perl Makefile.PL, make and make install, and you're done.
Having your own Spamassassin makes you more portable. Twice in my life as
a webmaster it's been necessary to very quickly transfer my websites to a
different web host. The more plain-vanilla your website, email, and other
accoutrements, the easier it is to move to a new web host. By moving your
Spamassassin to your local machine, you can depend on your web host for your
core need -- bandwidth.
Life after Windows is being the boss. If you want control over your spam filtering,
you download Spamassassin.
Letters to the Editor
All letters become the property of the publisher (Steve Litt), and may
be edited for clarity or brevity. We especially welcome additions, clarifications,
corrections or flames from vendors whose products have been reviewed in this
magazine. We reserve the right to not publish letters we deem in bad
taste (bad language, obscenity, hate, lewd, violence, etc.).
Submit letters to the editor to Steve Litt's email address, and be sure
the subject reads "Letter to the Editor". We regret that we cannot return
your letter, so please make a copy of it for future reference.
How to Submit an Article
We anticipate two to five articles per issue, with issues coming out monthly.
We look for articles that pertain to GNU/Linux or open source. This can
be done as an essay, with humor, with a case study, or some other literary
device. A Troubleshooting poem would be nice. Submissions may mention a specific
product, but must be useful without the purchase of that product. Content
must greatly overpower advertising. Submissions should be between 250 and
2000 words long.
Any article submitted to Linux Productivity Magazine must be licensed
with the Open Publication License, which you can view at http://opencontent.org/openpub/.
At your option you may elect to prohibit substantive modifications.
However, in order to publish your article in Linux Productivity Magazine,
you must decline the option to prohibit commercial use, because Linux Productivity
Magazine is a commercial publication.
Obviously, you must be the copyright holder and must be legally able to
so license the article. We do not currently pay for articles.
Troubleshooters.Com reserves the right to edit any submission for clarity
or brevity, within the scope of the Open Publication License. If you elect
to prohibit substantive modifications, we may elect to place editor's notes
outside of your material, or reject the submission, or send it back for modification.
Any published article will include a two sentence description of the author,
a hypertext link to his or her email, and a phone number if desired. Upon
request, we will include a hypertext link, at the end of the magazine issue,
to the author's website, providing that website meets the Troubleshooters.Com
criteria for links and that the author's
website first links to Troubleshooters.Com. Authors: please understand we
can't place hyperlinks inside articles. If we did, only the first article
would be read, and we can't place every article first.
Submissions should be emailed to Steve Litt's email address, with subject
line Article Submission. The first paragraph of your message should read
as follows (unless other arrangements are previously made in writing):
Copyright (c) 2003 by <your name>. This material
may be distributed only subject to the terms and conditions set forth in
the Open Publication License, version Draft v1.0, 8 June 1999 (Available
at http://www.troubleshooters.com/openpub04.txt/ (wordwrapped for readability
at http://www.troubleshooters.com/openpub04_wrapped.txt). The latest version
is presently available at http://www.opencontent.org/openpub/).
Open Publication License Option A [ is | is not] elected,
so this document [may | may not] be modified. Option B is not elected, so
this material may be published for commercial purposes.
After that paragraph, write the title, text of the article, and a two
sentence description of the author.
Why not Draft v1.0, 8 June 1999 OR LATER
The Open Publication License recommends using the words "or later" to describe
the version of the license. That is unacceptable for Troubleshooting Professional
Magazine because we do not know the provisions of that newer version, so
it makes no sense to commit to it. We all hope later versions will be better,
but there's always a chance that leadership will change. We cannot take the
chance that the disclaimer of warranty will be dropped in a later version.
Trademarks
All trademarks are the property of their respective owners. Troubleshooters.Com(R)
is a registered trademark of Steve Litt.