[unisog] anti-spam question

Christopher A Bongaarts cab at tc.umn.edu
Tue Oct 29 21:15:26 GMT 2002


[Warning: long, but hopefully worth it.]

As Jerome M Berkman once put it so eloquently:

> We have noticed a huge increase in spam this year.  We are trying to
> figure out what to do about it, and we are wondering what others are
> doing, and especially what has proved successful.
[...]
> The server I help administer, uclink.berkeley.edu, has 40,000+ accounts
> (students, staff, and faculty).  Users access their mail via POP and IMAP
> clients such as Eudora, Netscape, and Outlook Express.

At the University of Minnesota, we have a set of servers with about
120,000 accounts for students, staff, faculty, departments, student
organizations, and alumni.  We offer POP and IMAP service (with and
without SSL) and support pretty much the same client set that you
folks do.

At the beginning of October, we put a new spam-blocking system into
operation.  It is completely homebrew, and is not suitable for general 
distribution, as it uses hooks into our commercial mail routing
software (Syntegra's Mail*Hub).  But the ideas behind it may prove
useful to others.

Two aspects of our system are not found in most solutions I've heard
of: (1) we make decisions based on several criteria, i.e. we use MAPS
RSS, but don't necessarily block you unless other criteria based on IP
address or other blackhole lists also match; and (2) it is
configurable on a per-user basis.

> Please let me know if you have tried any of the following (or anything
> else) and whether it was useful, especially on a large scale system:
> 
> - blackhole lists, RBLs, DULs, etc.  Which do you use or did you 
> create your own?

We use several lists as part of our criteria, but as noted above, we
do not use any as a sole reason for blocking.  So if a local system
gets listed on one of these, our policy for our IP space will prevent
any messages from being blocked.  Currently we use MAPS RSS, MAPS RBL, 
SBL, ORDB, RSL, and DSBL as inputs to our decision-making process, as
well as a dialup-line list (I forget who runs it - MAPS DUL?).
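For readers who haven't queried one of these lists directly: a DNS
blackhole list is consulted by reversing the octets of the connecting
IP address, appending the list's zone, and doing an ordinary A-record
lookup.  A listed address resolves (by convention to something in
127.0.0.0/8); an unlisted one does not.  The Python sketch below
assumes made-up zone names (the real zones for the lists above have
changed or gone away) and, in keeping with the policy described, only
reports which lists matched rather than deciding anything.

```python
import ipaddress
import socket

# Hypothetical zone names standing in for the real lists' zones.
ZONES = {
    "listA": "bl-a.example.org",
    "listB": "bl-b.example.org",
}

def dnsbl_query_name(ip, zone):
    """Reverse the IPv4 octets and append the zone: 192.0.2.1 -> 1.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # raises ValueError on bad input
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip, zone):
    """A listed address resolves (conventionally to 127.0.0.x); anything else is 'not listed'."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:  # NXDOMAIN, timeouts, etc. are treated as "not listed"
        return False
    return answer.startswith("127.")

def lists_matching(ip):
    """Names of every configured list that currently lists this IP.

    The result is only one input to the per-range policy, never a verdict by itself.
    """
    return [name for name, zone in ZONES.items() if is_listed(ip, zone)]
```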

> - if you block, do you block on content, on IP address, or on "from "
> address?

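We do not block on content.  Our blocking operates at the SMTP RCPT TO
level, because that is the point where the recipient is known and we
can apply each user's own settings (the message content only arrives
later, in the DATA phase, once for all recipients).  We are planning
to add (user-selectable) SpamAssassin filtering to our blocking
policy, so users can add their own spam filtering criteria to their
email clients.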

The basis of our criteria for blocking starts with the IP address of
the connecting MTA.  We have a DNS-based database that tells what
policies apply to various IP ranges (so we can say "block all mail
from 211.* that does not have a reverse DNS name, except for 211.x.y.z 
which we know is good", or "block all mail in 24.* that appears in
MAPS RSS, ORDB, RSL, or DSBL").
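A minimal sketch of how such a per-range policy table might be
evaluated, assuming an in-memory table instead of our DNS-based one
and a "most specific prefix wins" rule (both assumptions).  The class
and field names are hypothetical, and 211.0.113.7 is an invented
stand-in for the known-good host.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class RangePolicy:
    network: str                          # CIDR block the policy covers
    require_rdns: bool = False            # block if the MTA has no reverse DNS name
    block_lists: frozenset = frozenset()  # block if listed on any of these lists
    always_allow: bool = False            # known-good host (or local address space)

# Hypothetical table mirroring the two examples in the text.
POLICIES = [
    RangePolicy("211.0.0.0/8", require_rdns=True),
    RangePolicy("211.0.113.7/32", always_allow=True),
    RangePolicy("24.0.0.0/8", block_lists=frozenset({"RSS", "ORDB", "RSL", "DSBL"})),
]

def find_policy(ip):
    """Return the most specific (longest-prefix) policy covering ip, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [p for p in POLICIES if addr in ipaddress.ip_network(p.network)]
    return max(matches, key=lambda p: ipaddress.ip_network(p.network).prefixlen,
               default=None)

def block_reasons(ip, has_rdns, listed_on):
    """Reasons to block mail from ip; an empty list means accept.

    A blackhole-list hit only counts if this range's policy names that list.
    """
    policy = find_policy(ip)
    if policy is None or policy.always_allow:
        return []
    reasons = []
    if policy.require_rdns and not has_rdns:
        reasons.append("no reverse DNS")
    hits = sorted(policy.block_lists & set(listed_on))
    if hits:
        reasons.append("listed on " + ", ".join(hits))
    return reasons
```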

> - tar pits to slow up the spam arrival rate 

Heck no.  Wastes our time more than theirs.

> - open source applications such as spamassassin or spamcop

SpamAssassin is in testing right now.  Our plan is to have a centrally
administered set of rules/whitelist/blacklist and to simply set a
header or rewrite the subject line for mail identified as spam.
We also add a header with a URL so that if mail is classified as spam
when it is not, the user can report that fact to us and we can adjust
our rules accordingly.
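A rough sketch of the tagging step, assuming the SpamAssassin score
has already been computed and is simply passed in.  The header names,
the report URL, and the 5.0 threshold (SpamAssassin's usual default
required score) are illustrative assumptions.  This version does both
of the things mentioned (flag header and subject rewrite) and adds the
false-positive report URL only to tagged mail.

```python
from email import message_from_string
from urllib.parse import quote

REPORT_BASE = "https://example.edu/not-spam"  # hypothetical report URL
THRESHOLD = 5.0                               # assumed cutoff score

def tag_message(raw, score, msg_id):
    """Mark (never delete) a message whose externally computed score is over THRESHOLD."""
    msg = message_from_string(raw)
    if score >= THRESHOLD:
        subject = msg.get("Subject", "")
        del msg["Subject"]  # no error if the header is absent
        msg["Subject"] = "[SPAM] " + subject
        msg["X-Spam-Flag"] = "YES"
        # URL the user can visit if this was classified as spam by mistake
        msg["X-Not-Spam-Report"] = REPORT_BASE + "?id=" + quote(msg_id)
    return msg
```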

> - commercial solutions, such as those offered by BrightMail,
> TrendMicro, and Sendmail, Inc.

No, except to the extent that we continue to use a commercial sendmail 
replacement (as we have for the last 10 years) and our system
interfaces directly with it.

> Is your system opt-in or opt-out?

We spammed our users (no, the irony is not lost on me) a couple weeks
before we implemented the system, giving them the option to adjust
their mail control settings before it went live.  We have three
options: (1) allow mail from "well-behaved" MTA's only, the default;
(2) allow all mail from everywhere, for the diehard freedom types, and 
(3) allow only mail from umn.edu MTA's, i.e. local only.

Since the default setting is "block from non-well-behaved", it's
technically opt-out.
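The per-user decision at RCPT TO time could look roughly like this.
The mode names are hypothetical labels for the three options above,
`block_reasons` stands for whatever the IP-range policy produced, and
550 is an assumed choice of 5xx code (250 is the standard SMTP success
reply to RCPT TO).

```python
from enum import Enum

class Mode(Enum):
    WELL_BEHAVED = "well-behaved"  # default: block mail from non-well-behaved MTAs
    ALLOW_ALL = "allow-all"        # accept everything
    LOCAL_ONLY = "local-only"      # accept only mail from local (umn.edu) MTAs

def rcpt_reply(mode, from_local_mta, block_reasons, sender_excepted):
    """Return the (code, text) SMTP reply for one RCPT TO."""
    if mode is Mode.ALLOW_ALL or sender_excepted:
        return 250, "OK"  # user opted out, or allowed this MAIL FROM address
    if mode is Mode.LOCAL_ONLY and not from_local_mta:
        return 550, "Recipient accepts mail from local MTAs only"
    if mode is Mode.WELL_BEHAVED and block_reasons:
        return 550, "Blocked: " + "; ".join(block_reasons)
    return 250, "OK"
```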

Further, every user can set up to 20 exceptions (based on the SMTP
MAIL FROM address) for email that would otherwise be blocked.  This
operates on a per-user basis as well.
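A sketch of a per-user exception list with the 20-entry cap.
Normalizing addresses by stripping angle brackets and lowercasing is
an assumption of this sketch (strictly, the local part of an address
may be case-sensitive), as are the class and method names.

```python
MAX_EXCEPTIONS = 20

def normalize(mail_from):
    """'<Foo@Example.COM>' -> 'foo@example.com' (assumed normalization)."""
    return mail_from.strip().strip("<>").lower()

class ExceptionList:
    def __init__(self):
        self._senders = set()

    def add(self, mail_from):
        addr = normalize(mail_from)
        if addr in self._senders:
            return  # already allowed; does not use up a slot
        if len(self._senders) >= MAX_EXCEPTIONS:
            raise ValueError("exception list is full (max %d)" % MAX_EXCEPTIONS)
        self._senders.add(addr)

    def remove(self, mail_from):
        self._senders.discard(normalize(mail_from))

    def allows(self, mail_from):
        """True if mail with this SMTP MAIL FROM should bypass blocking."""
        return normalize(mail_from) in self._senders
```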

How does a user know 

> Do you use spamtraps?  What types?

We plan to use dormant accounts as a means of detecting compliance
with opt-in/opt-out policies.  If an account that has been inactive
for 6 months suddenly gets 

> Do you block spam or put it in a "grey" folder for the user to decide
> what to do with?  If you block, do you block during the SMTP protocol
> or bounce it later or just delete it without notice?

We block at the SMTP RCPT TO level (returning 5xx errors for users
that have blocking enabled).  Then we take the information we have (IP
address/DNS name of MTA, SMTP MAIL FROM address, date/time, and
reason(s) for blocking) and put it in a database that keeps the
previous two weeks of block activity.
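A minimal sketch of the block log and its two-week retention, using
SQLite purely for illustration; the table layout and function names
are assumptions, not the actual schema.  Timestamps are stored as
ISO-8601 strings so that string comparison matches time order.

```python
import sqlite3
from datetime import datetime, timedelta

RETENTION = timedelta(days=14)  # keep the previous two weeks of block activity

def open_log(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS blocks ("
        " recipient TEXT, mta_ip TEXT, mta_name TEXT,"
        " mail_from TEXT, blocked_at TEXT, reasons TEXT)"
    )
    return db

def record_block(db, recipient, mta_ip, mta_name, mail_from, reasons, when):
    """Store one RCPT TO rejection (mta_name may be None if there was no reverse DNS)."""
    db.execute(
        "INSERT INTO blocks VALUES (?, ?, ?, ?, ?, ?)",
        (recipient, mta_ip, mta_name, mail_from,
         when.isoformat(timespec="seconds"), "; ".join(reasons)),
    )

def prune(db, now):
    """Drop everything older than the retention window."""
    cutoff = (now - RETENTION).isoformat(timespec="seconds")
    db.execute("DELETE FROM blocks WHERE blocked_at < ?", (cutoff,))

def blocked_for(db, recipient):
    """What a user sees when querying their own blocked mail."""
    return db.execute(
        "SELECT mail_from, mta_ip, mta_name, blocked_at, reasons FROM blocks"
        " WHERE recipient = ? ORDER BY blocked_at",
        (recipient,),
    ).fetchall()
```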

Users can query the database to see what messages have been blocked,
and can select messages to add an exception for the sender (and thus
allow future email from them).  This is very useful from a marketing
standpoint, too.  One of the complaints from users early on (while we
were slowly turning up the blocking criteria) was that it wasn't
doing anything, or "made it worse": the optprofessionals spam started
shortly after we turned it on, and we didn't block it directly right
away, although reverse DNS checks stopped some of it.  The query page
gives us a tangible way to say to users "*you* *personally* had N
messages blocked in the last two weeks" and show them that it's
working.

Exceptions are especially effective for users who have selected "allow
umn.edu MTA's only", as they can allow the few outside users who send
them mail and block everything else.  One user doing this had over
2,000 messages blocked in the first two weeks after implementation;
all of them were spam.  He is down to about 800 for the most recent
two weeks.  My other favorite benchmark user has about 500.  I
personally have 229.

Currently, there are 2,944,629 blocked emails in the database (for the
last two weeks), for 54,258 different accounts.  That's a lot of saved
mail spool disk space and email server cycles.  812 users have granted
1,187 exceptions.

The 5xx error that is returned to the blocked sender contains a URL
that, if viewed, shows the reason for the block and allows the sender
to request an exception from the recipient (this involves a step where
we send mail to the originating address to verify that it is valid).
The intended recipient can then allow the exception, actively reject
the request, or ignore it (and thus silently reject the request).  The
URLs involved are encrypted to prevent forgery.  We have had about 150
hits on the initial URL, and about half of those visitors actually
requested an exception.

There is also a header in every message with a URL that allows users
to report a particular message as spam.  The page shown at that URL
varies based on our classification of the sending MTA.  For
unclassified MTA's, it tells the user "We Will Investigate"; for MTA's
that we know have a working abuse@ address, it tells the user to
report it there; for known or suspected "good" bulk mail sites, it
tells the user to try to unsubscribe from the list.  (The old advice
"don't unsub, the spammers will know your address is good" is bogus.
The spammers know your address is good as soon as they get the 250
response to the RCPT TO, and most of them don't care; they just keep
blasting away.  In fact, failure to accept a delivery status
notification is one of the reasons we use for blocking mail.)

Regardless of what we tell the user, we log the fact that they
reported it as spam, and we can use that information to make decisions
on our blocking and classification.  For example, if we have
classified a site as "try to unsub", and we get repeated spam reports
from the same user for messages from that site, that may indicate that
the site is not actually unsubscribing the user, and maybe we should
block them.
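A sketch of the report handling: the advice shown depends on the
site's classification, every report is counted per (user, site)
regardless of the advice, and "try to unsub" sites that the same user
keeps reporting get flagged for review.  The class names, advice
strings, and the threshold of 3 repeat reports are assumptions for
illustration.

```python
from collections import Counter
from enum import Enum

class MtaClass(Enum):
    UNCLASSIFIED = "unclassified"
    HAS_ABUSE_CONTACT = "abuse"  # known working abuse@ address
    TRY_UNSUB = "unsub"          # known or suspected "good" bulk mailer

ADVICE = {
    MtaClass.UNCLASSIFIED: "We will investigate.",
    MtaClass.HAS_ABUSE_CONTACT: "Please report this message to {abuse}.",
    MtaClass.TRY_UNSUB: "Please try to unsubscribe from this list.",
}
REPEAT_THRESHOLD = 3  # hypothetical: reports from one user before we take a look

class SpamReports:
    def __init__(self):
        self.counts = Counter()  # (user, site) -> number of reports
        self.site_class = {}     # site -> current MtaClass

    def report(self, user, site, mta_class, abuse_address=None):
        """Log the report whatever we tell the user, then return the advice text."""
        self.counts[(user, site)] += 1
        self.site_class[site] = mta_class
        return ADVICE[mta_class].format(abuse=abuse_address or "the site's abuse@ address")

    def unsub_sites_to_review(self):
        """'Try to unsub' sites one user keeps reporting (maybe not honoring unsubscribes)."""
        return sorted({
            site for (user, site), n in self.counts.items()
            if n >= REPEAT_THRESHOLD and self.site_class[site] is MtaClass.TRY_UNSUB
        })
```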

In the past 11 days, we have had about 38,750 spam reports.

The future SpamAssassin option will allow for users to implement their 
own "grey" folder.

> How do you prevent harvesting of addresses in your University's web
> pages?  In departmental directories on the web?  From ldap
> directories?

Currently we have nothing preventing this other than manual detection
and nullrouting.  In the future, I plan to add an authentication
option to our directory lookup web page, where unauthenticated users
get a very small number of searches per time period, and authenticated 
users get unlimited searches.
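A minimal sketch of the planned lookup limit as a sliding-window
counter per client (for example, per IP address).  The limit of 10
searches per hour is a hypothetical default, and state is kept in
memory only.

```python
import time
from collections import defaultdict, deque

class LookupLimiter:
    """Sliding-window limit on directory searches for unauthenticated clients."""

    def __init__(self, limit=10, window_seconds=3600.0):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # client key -> timestamps of recent searches

    def allow(self, client, authenticated, now=None):
        if authenticated:
            return True  # authenticated users get unlimited searches
        now = time.monotonic() if now is None else now
        hits = self._hits[client]
        while hits and now - hits[0] >= self.window:
            hits.popleft()  # forget searches that fell out of the window
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```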

----

As you might have guessed, we spent a lot of time thinking about this,
and we continue to assimilate user feedback and observe patterns in
blocked mail to refine things.  The goal is not so much to eliminate
100% of spam as it is to reduce the amount of spam we have to process
and store, and to avoid wasting end users' time.  We try to err on the
side of permissiveness, but we can feel somewhat safe knowing that
there is a way for users to allow mail that gets blocked as
"collateral damage".

Hopefully other universities can take some of these ideas and
implement them in ways that make sense at their institutions.

I may try to convince my cohorts to get together and write something
up for a future USENIX conference or similar.  But we will definitely
wait until the system has been running for a while, so we have more
experience with it.

%%  Christopher A. Bongaarts  %%  cab at tc.umn.edu       %%
%%  Internet Services         %%  http://umn.edu/~cab  %%
%%  University of Minnesota   %%  +1 (612) 625-1809    %%
