Spam filtering

How Cambridge spam filtering works

The majority of email to @cam.ac.uk and @SOMETHING.cam.ac.uk addresses (including DAMTP, DPMMS and Statslab email) passes through the central email server ppsw.cam.ac.uk. This server puts a "tag" on each mail message, indicating how likely it is to be spam (that is, unsolicited and unwanted mail). The university does not block spam; instead, users can easily arrange for appropriate action to be taken automatically, depending on the tag. Viruses and other malware are blocked: see the UCS page on the central email scanner for details.

The tag is added to the message by the insertion of extra header lines. Most mail programs have an option to allow you to see the header lines in a message; for example, in pine, pressing the "h" key makes them visible. Here is an example of lines that you might see:

X-Cam-ScannerAdmin: mail-scanner-support@ucs.cam.ac.uk
X-Cam-AntiVirus: Not scanned
X-Cam-SpamDetails: not spam, SpamAssassin (score=5.1, required 10,
    GAPPY_SUBJECT, HTML_60_70, HTML_IMAGE_ONLY_02, HTML_MESSAGE,
    MIME_HTML_ONLY, PLING_PLING, REMOVE_PAGE, SUBJ_FREE_CAP)
X-Cam-SpamScore: sssss


The crucial headers are the third and fourth: X-Cam-SpamDetails (which wraps over three lines here) and X-Cam-SpamScore. The third says the message scored 5.1 and lists the tests that contributed to this score (e.g. the message was written in HTML, a typical spam feature). The phrase "not spam" means only that the score was less than 10; this message is still very likely to be spam. The fourth is the important one for filtering purposes: it gives a string of the letter "s" whose length matches the score (here five esses for a score of 5.1). Filtering programs can check this line easily and then take appropriate action. Usually this involves saving email that looks like junk in a folder called "spam".
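As an illustration of how a program might read the tag, here is a minimal Python sketch that counts the esses in the X-Cam-SpamScore header. The sample message reuses the headers shown above; the Subject line, the body and the function name spam_score are made up for the example, and this is not how the central scanner or any particular mail program does it.

```python
from email import message_from_string

# Sample message built from the example headers above.
# The Subject line and body are invented for illustration.
RAW_MESSAGE = """\
X-Cam-ScannerAdmin: mail-scanner-support@ucs.cam.ac.uk
X-Cam-AntiVirus: Not scanned
X-Cam-SpamDetails: not spam, SpamAssassin (score=5.1, required 10,
 GAPPY_SUBJECT, HTML_60_70, HTML_IMAGE_ONLY_02, HTML_MESSAGE,
 MIME_HTML_ONLY, PLING_PLING, REMOVE_PAGE, SUBJ_FREE_CAP)
X-Cam-SpamScore: sssss
Subject: example

Body text.
"""


def spam_score(raw_message: str) -> int:
    """Return the number of esses in X-Cam-SpamScore (0 if the header is absent)."""
    message = message_from_string(raw_message)
    tag = message.get("X-Cam-SpamScore", "")
    return tag.strip().lower().count("s")


print(spam_score(RAW_MESSAGE))  # prints 5
```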

How to filter spam on Hermes

This is set up via Hermes webmail.

• Select Settings from the Application Bar, then Mail Processing in the left-hand column.
• Select Junk Email and specify the threshold (score) above which you wish to have mail filtered. Filtering is enabled by default, but the default threshold is 10, which lets a lot of spam through. Try setting it to 5 initially, or 4 if you are still getting too much spam.
• Select Update.

Much more detail about spam filtering on Hermes

How to filter spam on Statslab / DPMMS / DAMTP email

This is done using Exim filters, which sort email based on the special header X-Cam-SpamScore. For example, create a file called .forward in your home directory containing the following. Alternatively, download and edit a longer .forward file with more filtering options (useful if you forward your mail off site but would like to forward just the real email and not the spam).

# Exim filter
# Save anything tagged with six or more esses in mail/spam, then stop.
if $h_X-Cam-SpamScore contains ssssss then
  save mail/spam
  seen finish
endif
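
The "contains" test in the filter is a substring match, so six esses in the filter catch any message tagged with six or more. The Python sketch below mimics that test to make the threshold behaviour explicit. It is only a simulation: the function name filter_would_save is invented here, and Exim itself is not involved. Lowercase "contains" in an Exim filter ignores case, which the sketch imitates by lowercasing both strings.

```python
def filter_would_save(score_header: str, pattern: str = "ssssss") -> bool:
    """Mimic `if $h_X-Cam-SpamScore contains <pattern>` (case-insensitive)."""
    return pattern.lower() in score_header.lower()


# Try a few hypothetical tag lengths against the six-ess pattern.
for esses in (3, 5, 6, 9):
    header = "s" * esses
    print(esses, filter_would_save(header))
# prints:
# 3 False
# 5 False
# 6 True
# 9 True
```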


Some people prefer to keep their spam email outside of their mail folder to avoid it counting towards their quota. In this case, first create a directory in a store space in which to store your spam. (Scratch space won't do as it is only accessible from one computer.)

store-space create DAMTP
cd /store/DAMTP/ab999 # Replace ab999 with your CRSid
mkdir -p tmp


Edit your .forward file, replacing the line

	save mail/spam


with the destination for your spam mail, e.g.

	save /store/DAMTP/ab999/tmp/spam


and create a symlink from your mail folder to this destination, e.g.

	ln -s /store/DAMTP/ab999/tmp/spam  ~/mail/spam


Note: Statslab users have historically been told to filter their spam via procmail and have been provided with a .procmailrc file which filters anything with a score of 6 or more into the spam folder. However, as far as Eva is aware, the DAMTP and DPMMS mail servers do not support procmail, while Exim filters will work anywhere that runs Exim. Please don't try to use procmail and Exim filters at the same time, as the combination may not work as you expect.

Scoring - when to reject?

It is impossible for any computer program to be 100% reliable at distinguishing spam from genuine email, since spammers are constantly searching for new ways to sneak past the filters. The higher a message's score, the more likely it is to be spam.

At a threshold of 5 or 6 (the Exim filter above uses six esses), real email is very seldom misclassified as spam but quite a lot of spam is missed. Reduce the number of esses in the Exim filter, or the numeric threshold on Hermes, if you want to catch more spam; this will increase the risk of real mail going into the spam folder. Four esses is fairly safe and catches most spam. Note, though, that genuine email is likely to score at least one if it is in HTML format (as is typical with, say, mail sent from a Hotmail account). Likewise, mail from a mailing list can score positively under some circumstances.
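
If you want to experiment with different thresholds, the following Python sketch writes out the text of a .forward file for a chosen number of esses. It simply reproduces the Exim filter shown earlier with a different pattern; the function name exim_spam_filter and its parameters are invented for this example, and you should check the output before installing it.

```python
def exim_spam_filter(threshold: int, folder: str = "mail/spam") -> str:
    """Return Exim filter text that saves mail tagged with `threshold` or more esses."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    lines = [
        "# Exim filter",
        f"if $h_X-Cam-SpamScore contains {'s' * threshold} then",
        f"save {folder}",
        "seen finish",
        "endif",
    ]
    return "\n".join(lines) + "\n"


# Four esses: fairly safe, catches most spam.
print(exim_spam_filter(4), end="")
```

To store spam outside your mail folder, pass the destination as the second argument, e.g. exim_spam_filter(4, "/store/DAMTP/ab999/tmp/spam").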