

Google needs numeric spam scoring

No, having “Google mail” and “addressed” in the same sentence was not an attempt at a pun.

I currently use Google Apps for two of my domain names, more to handle Email than for any other use. I figured that a web-based interface and lots of (redundant) storage would be a good solution for me, which pointed to Google Apps, which is free for small businesses that want up to a hundred 2GB mailboxes. Of course, I also have a regular @gmail.com address (who doesn’t these days?).

Lately, my annoyance with my iandouglas.com Email is this: I miss SpamAssassin.

Don’t get me wrong: Google has identified spam messages about as accurately as I have with SpamAssassin over the years, though with more false positives (non-spam ending up in the spam folder). The problem I’m having is that many of the messages I get are all the same.

On any given day, I get a few dozen spam messages that have to do with seeing Oprah live, dealing with debt/loans, presidential polls, BurgerKing vs McDonalds, Coke vs Pepsi, or about a JC Penney or Kohl’s gift card. Rinse and repeat … some of these messages seem to show up on 2-hour timers. Over the weekend I had 270 pieces of spam in my spam folder at Google, all easily lumped into these categories, and a few others.

“But Ian, why do you care, Google trashes all spam after 30 days”

Yes, but when you get 100-150 pieces of spam every day, detecting false positives (non-spam in the spam folder) becomes increasingly difficult.

I debated setting up filters to automatically (permanently) delete these messages based on keywords/phrases, but (a) that’s a lot of manual work to maintain, and (b) it runs the risk of losing legitimate Emails.

It boils down to this: Google needs to implement some sort of numeric spam scoring system, and let us create filters based on those scores.

Currently, because I train SpamAssassin 2-3 times per week with a lot of ham/spam messages, I have my ‘threshold’ value set at 3.5; anything scoring higher gets flagged as spam. I also have a mail filter that permanently deletes anything scoring over 10.0 points. With proactive training like my SpamAssassin training script, I never have to worry about losing legit Emails.
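To make the two thresholds concrete, here is a minimal sketch in Python of that flag-or-delete logic. It assumes each message carries SpamAssassin's `X-Spam-Status` header in the usual `Yes, score=12.3 required=3.5 ...` form; the function names and the "inbox"/"spam"/"delete" labels are made up for illustration and are not my actual filter.

```python
import re

# Thresholds from the setup described above.
SPAM_THRESHOLD = 3.5     # scoring higher than this gets flagged as spam
DELETE_THRESHOLD = 10.0  # scoring higher than this gets permanently deleted

# Assumes the header value contains "score=<number>".
_SCORE_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)")


def parse_score(header_value):
    """Return the numeric score from an X-Spam-Status value, or None."""
    match = _SCORE_RE.search(header_value)
    return float(match.group(1)) if match else None


def classify(score):
    """Map a score to a hypothetical action: 'inbox', 'spam' or 'delete'."""
    if score is None:
        # No score available: play it safe and keep the message.
        return "inbox"
    if score > DELETE_THRESHOLD:
        return "delete"
    if score > SPAM_THRESHOLD:
        return "spam"
    return "inbox"
```

For example, `classify(parse_score("Yes, score=12.3 required=3.5"))` lands in the "delete" bucket, while a message scoring 4.0 is only flagged as spam and stays available for review.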

However, Google Mail (both as Google Apps and as their base gmail.com setup) has no way of telling me HOW spammy it thinks a message is, so I have no easy way (other than manually reading through sender names and subject lines) to determine whether something needs to be rescued from the spam folder.
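For contrast, this is roughly what a numeric score makes possible. The sketch below assumes the messages in a spam folder have passed through SpamAssassin and carry its `X-Spam-Status` header (Google exposes nothing like this); `rescue_candidates` is a hypothetical helper name. It sorts the folder so the lowest-scoring messages, the ones most likely to be false positives, come first.

```python
import re
from email import message_from_string

# Assumes the header value contains "score=<number>".
_SCORE_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)")


def rescue_candidates(raw_messages, limit=10):
    """Return up to `limit` (score, subject) pairs, lowest score first.

    Messages without a parseable X-Spam-Status score are skipped.
    """
    scored = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        match = _SCORE_RE.search(msg.get("X-Spam-Status", ""))
        if match:
            scored.append((float(match.group(1)), msg.get("Subject", "")))
    # The least spammy messages are the ones worth a second look.
    scored.sort(key=lambda pair: pair[0])
    return scored[:limit]
```

With 270 messages in the folder, checking the ten lowest scores is a lot less work than reading every sender name and subject line.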

I’m currently looking at how best to change my domains’ MX records to point back to a SpamAssassin-enabled hosting provider and build IMAP-based accounts there, and then have Google Apps load up its Email from those IMAP accounts. This way, I can continue to filter/train SpamAssassin as my first line of defense, and the occasional false negative that slips into my Inbox will (hopefully) be caught by Google. Of course, I need to set this up in a way that causes the least amount of interruption.

I’d love to hear any feedback or solutions that you or someone else may have built, or ideas on how to approach the problem.

My thinking so far is that simply pointing my MX records to a SpamAssassin-enabled server (and having those domains hosted at that server for Email only) should send all incoming domain Email to that server, and push the messages into an IMAP hierarchy. From there, I would have Google retrieve the mail from those Inboxes into the appropriate Google Apps accounts, where I can reply to the messages.

The downside is obvious: Google doesn’t poll IMAP folders very often (every hour or so in my experience), so urgent messages would still need a web-based Email application like Horde or SquirrelMail. I’m not sure whether I can force/encourage Google to check for new messages more frequently (or whether the timing is simply based on how often it’s *found* messages there in the past).

Or maybe I just need to suck it up, and find a decent web-based IMAP Email client application and ditch Google Apps until Google *does* implement some sort of spam scoring system.
