[Dshield] New email spam

Coxe, John B. JOHN.B.COXE at saic.com
Mon Dec 22 16:40:30 GMT 2003


This gibberish serves various purposes.  The most important one to them is
that each message has a unique subject.  So anyone writing filters for the
most prevalent subjects (by count) entering their MTAs will miss these,
because the entire campaign consists of many messages, each with a subject
count of one.

If you want to see broken spam programs, look for subjects that arrive with
literal, unexpanded template macros such as "%RND_UC_CHAR[2-8]" or
"%RANDOM_WORD".  Those are pretty easy to filter.
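A filter for those leftovers can be a single regular expression.  This is a
minimal Python sketch; the pattern (a "%" followed by an uppercase name
containing an underscore, optionally with a "[min-max]" range) is an
assumption generalized from the two examples above, not a list of every
spamware macro.

```python
import re

# Assumed pattern: "%" + uppercase macro name with at least one underscore
# (e.g. %RANDOM_WORD), optionally followed by a "[min-max]" length range
# (e.g. %RND_UC_CHAR[2-8]).  A bare "50%" does not match.
UNEXPANDED_MACRO = re.compile(r"%[A-Z]+(?:_[A-Z]+)+(?:\[\d+-\d+\])?")


def has_unexpanded_macro(subject: str) -> bool:
    """Return True if the subject still contains a raw template macro."""
    return UNEXPANDED_MACRO.search(subject) is not None
```

For example, has_unexpanded_macro("Hello %RANDOM_WORD") is True, while
has_unexpanded_macro("Save 50% today") is False.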

The hardest subjects to filter are those that use character encoding in the
subject line (RFC 2047 "encoded-words").  Quoted-printable is easy enough to
handle.  However, base64 encoding requires a decoder as part of the filter.
See RFC 1345 for character set names.  The most prevalent charset used is
ISO-8859-1; in fact, it accounts for practically all of the encoded spam.
Funny that the spammers haven't jumped over to "latin1", which is exactly
the same (just an alias for ISO-8859-1), to bypass folks who put in a
general ISO-8859-1 filter.  There is sure to be a lot of growth in this
area.  Base64 takes every three bytes and transforms them into four other
characters, so the entire encoded subject is completely unreadable.  Yet MS
Outlook displays it decoded, and it is rendered in decoded form when
forwarded.
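Inline decoding of encoded-word subjects does not need a hand-written
base64 decoder; Python's standard email.header module already handles both
the B (base64) and Q (quoted-printable) forms.  A minimal sketch; the helper
names decode_subject and same_codec are illustrative.

```python
import codecs
from email.header import decode_header, make_header


def decode_subject(raw: str) -> str:
    """Decode RFC 2047 encoded-words (B = base64, Q = quoted-printable)
    into a plain Unicode string suitable for filtering."""
    return str(make_header(decode_header(raw)))


def same_codec(a: str, b: str) -> bool:
    """True if two charset labels resolve to the same Python codec."""
    return codecs.lookup(a).name == codecs.lookup(b).name
```

With this, "=?iso-8859-1?B?V2FudCBhIEJJR0dFUiBQRU5JUz8=?=" comes back as
readable text before any subject rule runs, and same_codec("latin1",
"ISO-8859-1") confirms that the two labels name one character set, so a
filter should normalize the label rather than match it literally.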

An example might be a Subject like "V2FudCBhIEJJR0dFUiBQRU5JUz8=", which
decoded has the "P" word in it.  (See, for example,
http://makcoder.sourceforge.net/demo/base64.php to decode this or your own
subjects.)  One can defend against this without an inline decoder to some
extent, by filtering on the encoding for " PE" followed by "NIS" (IFBFTklT)
and "PEN" followed by "IS?" (UEVOSVM/), to take advantage of two of the
three offsets.  Even then, it only catches this particular uppercase form,
and only with a question mark after it or a space before it.  The third
offset would be "ENI" followed by "S?" (RU5JUz8=), which does catch this
one.  As long as you are comfortable taking the chance that no real mail
will arrive with a subject ending in uppercase "ENIS?", it is all fine.  (I
cannot think of any such words.  However, a shorter pattern such as "PENI"
would also match "PENINSULA", something to be careful about.)  Anyway,
spammers will mix case, change punctuation, insert a star, dash, or space
between each pair of letters, as in "P*E*N*I*S", put a grave or acute
accent or an umlaut over the "E", etc., using all of the tricks they use in
unencoded subjects.  The bottom line is that the only effective defense is
to run inline decoding before any filtering.
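The "offsets" trick works because base64 encodes each aligned 3-byte group
independently, so a plaintext word yields one of three fixed fragments
depending on where it falls relative to the group boundaries.  A minimal
Python sketch, assuming the base64 text starts on a group boundary (as it
does inside an encoded-word); the function names are hypothetical.  Unlike
the hand-picked patterns above, it uses only the word's own bytes, so its
fragments are shorter (4 characters for a 5-letter word) and more prone to
false positives, and it is still case- and punctuation-sensitive.

```python
import base64


def offset_signatures(needle: bytes) -> list[str]:
    """Base64 fragments, one per 3-byte alignment; if `needle` occurs in
    the plaintext, at least one of them occurs in the encoded text.

    For each shift s in (0, 1, 2), keep only the full 3-byte groups of
    needle[s:], since those are the groups that encode independently of
    the unknown surrounding bytes.
    """
    sigs = []
    for shift in range(3):
        tail = needle[shift:]
        usable = len(tail) - len(tail) % 3  # drop the trailing partial group
        if usable:
            sigs.append(base64.b64encode(tail[:usable]).decode("ascii"))
    return sigs


def matches_encoded(b64_text: str, needle: bytes) -> bool:
    """Cheap pre-decode check: does any alignment's fragment appear?"""
    return any(sig in b64_text for sig in offset_signatures(needle))
```

For b"PENIS" the fragments are "UEVO", "RU5J", and "TklT"; the example
subject above contains "RU5J".  Extending a fragment with known neighboring
characters, as the hand-picked patterns do, makes it longer and more
specific at the cost of missing other punctuation.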

The same goes for HTML in the message body: it should be rendered before
filtering.  Spammers obfuscate the content by inserting nonsense tags,
comments, empty tag pairs, or font changes every letter or two within
commonly filtered words and expressions.  They also hide text by setting its
color to the background color, or to one or two bits off from it, which is
visually equivalent and therefore invisible.  A pseudo-rendering is needed
to see through these tricks.  However, the mere presence of these
techniques in a message is a pretty solid indicator that it is spam in the
first place.
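A crude pseudo-rendering can be built on Python's standard html.parser:
keep text content, drop tags and comments, and the split-up words rejoin.
This is a minimal sketch, not a real renderer; it ignores CSS, treats
script/style contents as text, and adds no spacing for block tags.  The
color check assumes six-digit "#rrggbb" values and a per-channel tolerance
of 4, both assumptions standing in for "one or two bits off".

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Keeps character data; tags and comments are simply not emitted,
    so obfuscating markup inside a word disappears."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def visible_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


def near_color(fg: str, bg: str, tolerance: int = 4) -> bool:
    """True if two "#rrggbb" colors differ by at most `tolerance` per
    channel, i.e. the text is effectively invisible on that background."""
    f = [int(fg.lstrip("#")[i:i + 2], 16) for i in (0, 2, 4)]
    b = [int(bg.lstrip("#")[i:i + 2], 16) for i in (0, 2, 4)]
    return all(abs(x - y) <= tolerance for x, y in zip(f, b))
```

For instance, visible_text("fr<b></b>ee<!-- x --> m<font size=2>on</font>ey")
returns "free money", and a hit from near_color can itself be counted as a
spam signal, as noted above.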

It is ugly out there, and the spammers are doing anything they can come up
with to ram their spam past defenses.  One thing I am surprised they have
not (apparently) done yet is exploit custom whitelists.  Suppose they simply
autocrawled the target domain's public website, parsed out all of the words
from all of the pages, discarded dictionary words, and then used the words
appearing at least a few times as an effective corporate lexicon for the
target domain.  They could then insert these words randomly into spam
subjects in the hope of escaping filtering through the gateway whitelists
used to mitigate the impact of false positives.  That could easily become a
feature of the top spam list databases.  For lists compiled by trawling
Usenet (etc.), such a lexicon could be built from the content of each post.
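The lexicon step described above is only a frequency count with a
stop-list.  A minimal Python sketch, assuming the page text has already been
fetched and that `dictionary` is some plain word list; build_lexicon and
min_count are hypothetical names, and the threshold of 3 stands in for "at
least a few times".

```python
import re
from collections import Counter


def build_lexicon(pages: list[str], dictionary: set[str],
                  min_count: int = 3) -> list[str]:
    """Non-dictionary words appearing at least `min_count` times."""
    counts = Counter(
        word
        for page in pages
        for word in re.findall(r"[a-z0-9]+", page.lower())
    )
    return sorted(w for w, n in counts.items()
                  if n >= min_count and w not in dictionary)
```

Run over one's own public pages, the same count shows which terms such a
lexicon would contain, and therefore which whitelist entries it could hit.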


-----Original Message-----
From: list-bounces at dshield.org [mailto:list-bounces at dshield.org] On Behalf
Of Kenneth Coney
Sent: Monday, December 22, 2003 7:39 AM
To: list at dshield.org
Subject: Re: [Dshield] New email spam

I too have noticed a lot of those.  I also see a lot of spam junk with
wrptu6 or w-8t5 dnb or similar gibberish in the email subject line.  I
doubt that any human's English is that bad, so I have to assume it is
autogenerated.  I suspect it is what we used to call padding.  I have no
idea why or what purpose it is supposed to serve.  Possibly a spam writing
program is damaged?


Subject: [Dshield] New email spam
From: "Barton L. Phillips" <barton at bartonphillips.com>
Date: Sat, 20 Dec 2003 10:04:59 -0800
To: list at dshield.org

I have been getting 15 to 20 of these emails a day. They all seem to have
bad links, and the plain text and part of the HTML contain a bunch of random
words. I thought maybe this was an attempt to confuse Bayesian filters.
Any thoughts?

EMAIL TEXT ****


_______________________________________________
list mailing list
list at dshield.org
To change your subscription options (or unsubscribe), see:
http://www.dshield.org/mailman/listinfo/list



