Outer Web Thought Log
May 24, 2003
Fooling SpamAssassin
After installing SpamAssassin a long time ago, I was very happy not to see any indecent proposals dropped in my mailbox anymore. For the past couple of weeks, however, some spam mails have been surviving SpamAssassin's tagging and filtering, and I started wondering why. I had upgraded SA to a newer release some weeks ago, which apparently did not help very much. So I started a little investigation myself.

To start with, here's a screenshot of one of these mails in my MUA (Mozilla 1.3). Warning: some explicit wording ahead.
spam.png
For this screenshot to make sense, I should mention that I turned off image loading in Mozilla Mail & News, also to fight spam. I guess you all know that the images in HTML emails are often based on one-time, message-specific URLs, which makes them a great mechanism for validating email addresses. The moment you preview such a message in your 3-pane Outlook Express setup, the image is fetched, and you've silently confirmed your existence to the spammers out there. Most likely, your address will now be distributed on a slightly more expensive CD-ROM, since it can be considered a validated or confirmed one.
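To make the image trick a bit more concrete, here is a minimal Python sketch of the defensive side: listing the remote images an HTML mail would fetch if the MUA were allowed to load them. This is not how Mozilla does it; the class and function names, the `tracker.example` host, and the `id=abc123` token are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class RemoteImageFinder(HTMLParser):
    """Collects <img src=...> URLs that point at a remote host."""

    def __init__(self):
        super().__init__()
        self.remote_images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        if urlparse(src).scheme in ("http", "https"):
            self.remote_images.append(src)


def find_remote_images(html_body):
    """Return every remote image URL referenced in html_body."""
    finder = RemoteImageFinder()
    finder.feed(html_body)
    finder.close()
    return finder.remote_images


# Hypothetical body: the token in the query string identifies the recipient.
sample = '<p>Hi!</p><img border=0 src="http://tracker.example/p.jpg?id=abc123">'
print(find_remote_images(sample))
```

Every URL in that list is a request the sender's server would see, together with whatever recipient-specific token is baked into it.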

Anyway, being the father of three kids already, I'm not particularly worried about my endowment, so I'd rather not receive such mails at all. So why is SA failing me? Let's look at the source of the mail message (my emphasis):
From - Sat May 24 12:37:01 2003
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
Return-Path: <j_mack@fourd.com>
Delivered-To: stevenn-stevenn at outerthought.org
Received: (qmail 7277 invoked from network); 24 May 2003 10:30:18 -0000
Received: from unknown (HELO fdata.com) (218.50.64.171)
 by cocoondev.org with SMTP; 24 May 2003 10:30:18 -0000
Message-ID: <CFOKBCBJFCNLHBCDMICPHKDMBAAA.j_mack@fourd.com>
From: "Joel Mack" <j_mack@fourd.com>
To: stevenn at outerthought.org
Subject: Been busy lately?
Date: Sat, 24 May 2003 03:29:01 +0000
MIME-Version: 1.0
Content-Type: text/html;
 charset="iso-8859-1"
Content-Transfer-Encoding: base64
X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2800.1106
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)
X-Spam-Status: No, hits=-0.8 required=5.0
 tests=BASE64_ENC_TEXT,DATE_IN_PAST_06_12,HTML_60_70,
 HTML_IMAGE_ONLY_02,HTML_MESSAGE,HTTP_EXCESSIVE_ESCAPES,
 MIME_HTML_ONLY,MSGID_GOOD_EXCHANGE
 version=2.53
X-Spam-Level: 
X-Spam-Checker-Version: SpamAssassin 2.53 (1.174.2.15-2003-03-30-exp)

DQogIA0KCQkJPEhUTUw+PEJPRFkgYmdjb2xvcj0iI2ZmZmZmZiI+DQo8cCBh
bGlnbj0iY2VudGVyIj48Zm9udCBmYWNlPSJ2ZXJkYW5hIj4NCmdldCA8WD5s
YTxRRD5yZzxYPmU8Q0JXPnIgbnV0cyBhbmQgcDxDTkk+ZW48WUs+7XM8Q1JQ
Uj4sIA0KICANCiA8UVU+bW9yZQ0KIDxaU0dLPnBsPFlCPmU8V01FRD5hPFhB
Sj5zdTxDRlFaPnI8S1FGUz5lLCAJIG1vPEs+cjxDPmUNCiBzYTxDPnRpc2Zh
YzxXRz50PEtPQT5pb248YnI+DQo8YSBocmVmPSJodHRwOi8vd3dXLk15V0VC
c1BlY0lBTHouQ09NL3BlJTZCL20yJTYzLnBoJTcwPyU2ZCU2MW49c3Q0diU3
MCI+UmVhZCBhYm91dCBpdCBoPFk+ZTxaQUVRPnI8UUU+ZTwvYT48YnI+DQo8
YnI+CTxhIGhyZWY9Imh0dHA6Ly93d3cubXl3ZUJTUGVDaUFMei5Db20vJTcw
ZWsvbSUzMiU2My4lNzAlNjglNzA/JTZEYSU2ZT1zdCUzNHZwIj4JDQogDQog
PGltZyBib3JkZXI9MCBzcmM9Imh0dHA6Ly93V1cuQ0hlYVBvTm5lVC5ORXQv
JTcwLmpwZyI+CQ0KICAgIA0KPC9BPjxicj48YnI+PGJyPg0KPHByZT4NCi0t
LS1PcmlnaW5hbCBNZXNzYWdlLS0tLQ0Kc3RldmVubkBvdXRlcnRob3VnaHQu
b3JnIHdyb3RlOg0KPiBUaGF0cyB3aGF0IGkgaGVhcmQNCjxDR0xLPg0KPC9w
cmU+DQo8YSBocmVmPSJodHRwOi8vd1dXLm15V2VCU1BFQ0lBbHouQ29tLyU3
MiU2NSU2ZCU2ZiU3NiU2NS8iPk5vIG1vcjxLT1A+ZSBwbGVhc2U8L2E+DQo8
YnI+LT0zZmtnMWsyd3QzZzl0cz0tPC9mb250PjwvcD4JCQ0KIA0KIDwvQk9E
WT4gICANCgk8L0hUTUw+DQoJDQoNCg==

Hm... looking at the tests that fired in SA's spam rating system, none of them were considered serious enough to warrant a decent score. So I initially thought SA didn't bother parsing the message text for known spam patterns once it discovered the message was base64-encoded. Getting curious, I manually decoded the encoded part and found the following:

  
      <HTML><BODY bgcolor="#ffffff">
<p align="center"><font face="verdana">
get <X>la<QD>rg<X>e<CBW>r nuts and p<CNI>en<YK>ís<CRPR>, 
  
 <QU>more
 <ZSGK>pl<YB>e<WMED>a<XAJ>su<CFQZ>r<KQFS>e,    mo<K>r<C>e
 sa<C>tisfac<WG>t<KOA>ion<br>
<a href="http://wwW.MyWEBsPecIALz.COM/pe%6B/m2%63.ph%70?%6d%61n=st4v%70">Read about it h<Y>e<ZAEQ>r<QE>e</a><br>
<br>  <a href="http://www.myweBSPeCiALz.Com/%70ek/m%32%63.%70%68%70?%6Da%6e=st%34vp"> 
 
 <img border=0 src="http://wWW.CHeaPoNneT.NEt/%70.jpg"> 
    
</A><br><br><br>
<pre>
----Original Message----
stevenn@outerthought.org wrote:
> Thats what i heard
<CGLK>
</pre>
<a href="http://wWW.myWeBSPECIAlz.Com/%72%65%6d%6f%76%65/">No mor<KOP>e please</a>
<br>-=3fkg1k2wt3g9ts=-</font></p>   
 
 </BODY>   
  </HTML>
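
For anyone wanting to repeat the manual decoding step, here is a minimal sketch using Python's standard `email` package. The `decode_body` helper is my own name, and the sample message is cut down to a few headers plus only the first line of the base64 body shown above; it is an illustration, not the tool used for the dump.

```python
from email import message_from_string

# Cut-down sample: a few headers plus the first base64 line of the body.
raw = (
    "Subject: Been busy lately?\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/html; charset="iso-8859-1"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "DQogIA0KCQkJPEhUTUw+PEJPRFkgYmdjb2xvcj0iI2ZmZmZmZiI+DQo8cCBh\n"
)


def decode_body(raw_message):
    """Undo the transfer encoding and return the body as text."""
    msg = message_from_string(raw_message)
    payload = msg.get_payload(decode=True)  # bytes, base64 already undone
    charset = msg.get_content_charset() or "iso-8859-1"
    return payload.decode(charset, errors="replace")


decoded = decode_body(raw)
print(decoded)
```

Decoding with the charset declared in the headers matters here: the body contains an accented character that only survives as ISO-8859-1.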
  
So apparently, SA does effectively decode and check encoded mails, since I see plenty of excessively escaped URLs and one big image link, which triggered the appropriate rules in SA. So I dug a bit deeper, and found a number of nonsense HTML-like tags, carefully distributed across the offending text, to fool SA's filters. Ah. That must be it. While my MUA doesn't care and hides unknown tags, SA cannot detect 'nasty' words when they are decorated with nonsensical markup. Guh. Clever trick.
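
To see how little it takes to undo both tricks, here is a rough Python sketch that drops every tag whose name is not on a small allow-list and unescapes the percent-encoded URLs. It is not what SA does internally; `KNOWN_TAGS`, `TAG_RE`, and `strip_bogus_tags` are names I made up, and the allow-list only covers the tags that appear in this message.

```python
import re
from urllib.parse import unquote

# A small, deliberately incomplete allow-list of real HTML tag names.
KNOWN_TAGS = {"html", "body", "p", "font", "a", "img", "br", "pre"}

# Matches an opening or closing tag and captures its name.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>")


def strip_bogus_tags(html):
    """Remove tags whose name is not in KNOWN_TAGS; keep everything else."""
    def replace(match):
        return match.group(0) if match.group(1).lower() in KNOWN_TAGS else ""
    return TAG_RE.sub(replace, html)


line = "<ZSGK>pl<YB>e<WMED>a<XAJ>su<CFQZ>r<KQFS>e, mo<K>r<C>e sa<C>tisfac<WG>t<KOA>ion<br>"
print(strip_bogus_tags(line))
# pleasure, more satisfaction<br>

url = "http://wWW.myWeBSPECIAlz.Com/%72%65%6d%6f%76%65/"
print(unquote(url))
# http://wWW.myWeBSPECIAlz.Com/remove/
```

Once the filler tags are gone, the text is ordinary spam wording again, which is exactly what a pattern-based filter would need to see.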

I went looking in the SA mailing list archives and stumbled onto these conversations, so it looks like I should upgrade my SA version (again). Sigh. Some lessons, however: (1) had SA not been an OSS project, I would never have been able to discover the issue at hand so quickly, since the discussion about the product would have been on lists inaccessible to users. (2) Spam detection is a work of art, and the spammers are really reading up on this stuff. I'm pretty sure this particular message was coded in such a way that the spammer could predict its SA score. (3) SA is rock-solid and absolutely unobtrusive. This also means you tend to forget it is there, and consequently forget to upgrade.

Overall, it was an interesting exercise in spam filter avoidance techniques, and it looks like we are unlikely to win this war any time soon.
Posted by stevenn at May 24, 2003 03:11 PM