[Dshield] New email spam

Coxe, John B. JOHN.B.COXE at saic.com
Wed Dec 24 16:22:33 GMT 2003


I don't want to carry this thread too far, as Johannes did point out that
these tangents are outside the subject area (security) of the list.  But I
will briefly address this.  I was not advocating rejecting all incoming HTML
email as spam.  Yes, an overwhelming fraction of spam is HTML-encoded.
However, many folks set their default email format (for G-d knows what
reason) to HTML, and a lot of notification messages come in HTML format.
You might be 90% effective with an HTML filter, but the false-positive rate
would be unacceptable.  The point was that the obfuscation spammers employ
within the HTML is itself a spam signature.

<html>
<head></head>
<body bgcolor=x000000 text=xffffff>
<H1>Want a BIGGER P<!394838>&#x0114;<b></b>N<gwb>I</gwb><font color=x000100
size=1>N</font>S<font color=x020000 size=1>ULA</font>?
</H1></body></html>

As an example, this can get by a lot of filters.  But the very fact that it
is trying to hide text and break up a key word is a signature that can tag
it as spam if the filter is smart enough.
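
To make that concrete, here is a minimal sketch in Python of how a filter
might score this kind of obfuscation.  The function names, the three signals
counted, and the threshold of 3 are illustrative assumptions, not taken from
any particular filter.  It only uses the standard library.

```python
import html
import re
import unicodedata

# Markup wedged directly between two letters, splitting one rendered word.
SPLIT_WORD = re.compile(r"(?<=[A-Za-z])(?:<[^>]*>)+(?=[A-Za-z])")
# Empty tag pairs such as <b></b> that render nothing.
EMPTY_PAIR = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^>]*></\1\s*>", re.IGNORECASE)
# Numeric character references such as &#x0114; standing in for a letter.
NUMERIC_REF = re.compile(r"&#(?:[xX][0-9A-Fa-f]+|[0-9]+);")
# Anything tag-shaped, including comment-like <!...> junk.
ANY_TAG = re.compile(r"<[^>]*>")


def obfuscation_signals(markup):
    """Count the word-splitting tricks present in an HTML fragment."""
    return {
        "split_words": len(SPLIT_WORD.findall(markup)),
        "empty_pairs": len(EMPTY_PAIR.findall(markup)),
        "numeric_refs": len(NUMERIC_REF.findall(markup)),
    }


def pseudo_render(markup):
    """Crude rendering: drop tags, decode entities, fold accents, tidy spaces."""
    text = html.unescape(ANY_TAG.sub("", markup))
    decomposed = unicodedata.normalize("NFKD", text)
    folded = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(folded.split())


def looks_obfuscated(markup, threshold=3):
    """Hypothetical rule: flag the message once enough signals add up."""
    return sum(obfuscation_signals(markup).values()) >= threshold


# The heading from the example message above.
SAMPLE = (
    "<H1>Want a BIGGER P<!394838>&#x0114;<b></b>N<gwb>I</gwb>"
    "<font color=x000100\nsize=1>N</font>S"
    "<font color=x020000 size=1>ULA</font>?\n</H1>"
)

if __name__ == "__main__":
    print(obfuscation_signals(SAMPLE))
    print(pseudo_render(SAMPLE))
    print(looks_obfuscated(SAMPLE))
```

These patterns are heuristics.  Legitimate markup (a bolded first letter, for
instance) can trip the split-word pattern once or twice, so a count like this
is better used as one input to a score than as a verdict on its own.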



-----Original Message-----
From: list-bounces at dshield.org [mailto:list-bounces at dshield.org] On Behalf
Of Kenneth Coney
Sent: Wednesday, December 24, 2003 4:06 AM
To: list at dshield.org
Subject: Re: RE: [Dshield] New email spam

So what you are saying is email containing HTML or other coding should 
simply be refused.  That would end 90% of the Spam.  Then a simple 
dictionary filter on the Subject line would eliminate any messages with 
padding or coding.  I like it.  Back to .txt only we go.



Subject: RE: [Dshield] New email spam
From: "Coxe, John B." <JOHN.B.COXE at saic.com>
Date: Mon, 22 Dec 2003 08:40:30 -0800
To: "'General DShield Discussion List'" <list at dshield.org>

<SNIP>
The same goes for HTML in the message body.  It should be rendered.
Spammers are obfuscating the content by adding nonsense tags, comments, tag
pairs, or font mods on every letter or two of commonly filtered words and
expressions to bypass filters.  Also, hidden text is set to the background
color or one or two bits off from it, which is visually equivalent to
invisible.  A pseudo-rendering needs to be done to effectively cancel out
their content.  However, detecting these techniques present in the mail is a
pretty solid consideration for determining it is spam in the first place.
<SNIP>
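
The hidden-color trick described in that excerpt could be checked for roughly
as follows.  This is only a sketch: the class and function names are made up
for illustration, the per-channel tolerance of 2 is an assumed reading of
"one or two bits off", and the white default background is likewise an
assumption.  It relies on Python's standard html.parser module and handles
only six-digit hex colors with an optional leading "#" or "x".

```python
from html.parser import HTMLParser


def parse_color(value):
    """Parse 'RRGGBB' (optional leading '#' or 'x') into (r, g, b), else None."""
    digits = value.strip().lstrip("#xX")
    if len(digits) != 6:
        return None
    try:
        return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))
    except ValueError:
        return None


def near_invisible(fg, bg, tolerance=2):
    """True when every color channel is within `tolerance` of the background."""
    if fg is None or bg is None:
        return False
    return all(abs(a - b) <= tolerance for a, b in zip(fg, bg))


class VisibleText(HTMLParser):
    """Separates text a reader would see from text colored to blend in."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.bg = (255, 255, 255)  # assumed default when <body> sets no bgcolor
        self.stack = []            # one hidden/visible flag per open <font>
        self.visible = []
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "body":
            bg = parse_color(attrs.get("bgcolor") or "")
            if bg is not None:
                self.bg = bg
        elif tag == "font":
            color = parse_color(attrs.get("color") or "")
            self.stack.append(near_invisible(color, self.bg))

    def handle_endtag(self, tag):
        if tag == "font" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text inside any near-invisible <font> counts as hidden.
        (self.hidden if any(self.stack) else self.visible).append(data)


def split_visible_hidden(html_text):
    """Return (visible_text, hidden_text) for an HTML message body."""
    parser = VisibleText()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.visible), "".join(parser.hidden)
```

A filter could then run its keyword checks on the visible text only, and
treat a non-empty hidden part as a spam indicator in its own right.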



