Everyone who uses the internet is familiar with spam and the problems it causes. Most people encounter it through email, but since we started running the Nomad site we've also had to deal with a lot of comment spam. It comes in phases, but we receive something like 10 to 20 spam comments per week, all of which get flagged and so don't appear on the site.
Our initial strategy was to delete spam comments as they appeared. A notification email is sent whenever a comment is posted, and while working I'm never too far from a computer, so spam comments wouldn't stay on the site for long. Of course, having to delete spam comments quickly became tedious!
There are a number of common methods for blocking spam comments: forcing the user to register with the site, hiding comments until approved by a moderator, CAPTCHA images, etc. Whatever method I decided on, I knew it had to require as little thought from the end user as possible. I don't want to stop the few legitimate comments I am getting in my effort to stop spam.
That cuts registration from my available options and, I would argue, CAPTCHA too. While CAPTCHA is becoming more familiar to internet users and a lot of good work is being done by reCAPTCHA, it's still an impediment to the commenter. This leaves comment moderation, but that feels too close to my current system.
After some research I came across honey pots as a security concept and found a lot of information in the article "Stopping spambots with hashes and honeypots" by Ned Batchelder. Using programming cunning to prevent spam is a bit more like it! I decided to try implementing the ideas outlined in the article one at a time, starting with a honey pot.
The honey pot idea is very simple: provide something enticing for the spambot but hide it from the user. Spambots vary in their sophistication, but most operate by trying to fill relevant information into the form, e.g. a field named 'email' will be filled with an email address. Hidden fields are left untouched as these often contain information vital to the form submission. The trick to the honey pot is that spambots will fill in all text fields, including one that a human visitor never sees.
With this in mind I created an additional text field in the comment form with the name 'lastname'. It's inserted at a random position in the form each time the form is generated, and it is removed from display using CSS absolute positioning in the stylesheet. As far as a spambot is aware the field is there on the page (you can see it by viewing the source code), but it doesn't appear to the end user. If the field is filled in when the form is submitted, the comment is flagged as spam and not displayed on the site. I can then delete or allow the comment as I see fit. My only concern with this method is accessibility: screen readers will probably still announce the field. Currently the field is labelled 'Input not required', which I hope is enough to stop legitimate users from filling it in.
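To make the mechanics concrete, here is a minimal sketch in Python of the two halves described above: generating the form with the honey pot in a random position, and flagging a submission when the field comes back filled in. It isn't the code running on the site. The field name 'lastname' and the label 'Input not required' come from the description above, but the function names, the 'hp-field' CSS class and the off-screen offset are assumptions made up for illustration.

```python
import html
import random

HONEYPOT_NAME = "lastname"             # field name described in the article
HONEYPOT_LABEL = "Input not required"  # label described in the article

# Assumed stylesheet rule: the class name and offset are hypothetical.
# Absolute positioning moves the field off screen but leaves it in the markup.
HONEYPOT_CSS = ".hp-field { position: absolute; left: -9999px; }"


def render_comment_form(real_fields, rng=random):
    """Build the form HTML with the honey pot at a random position.

    real_fields is a list of (name, label) pairs for the genuine inputs.
    """
    rows = [
        '<p><label>{} <input type="text" name="{}"></label></p>'.format(
            html.escape(label), html.escape(name)
        )
        for name, label in real_fields
    ]
    honeypot = (
        '<p class="hp-field"><label>{} '
        '<input type="text" name="{}"></label></p>'
    ).format(HONEYPOT_LABEL, HONEYPOT_NAME)
    # randint is inclusive at both ends, so the field can land anywhere
    # from before the first real field to after the last one.
    rows.insert(rng.randint(0, len(rows)), honeypot)
    return '<form method="post">\n' + "\n".join(rows) + "\n</form>"


def is_spam(submitted):
    """Return True if the honey pot field was submitted with any content."""
    return bool(submitted.get(HONEYPOT_NAME, "").strip())
```

A comment for which is_spam returns True would be stored with a spam flag rather than thrown away, so it can still be reviewed and allowed later if it turns out to be genuine.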
Over the past 3 months this technique has worked incredibly well: I've had a 100% success rate so far.
This article was originally posted at we-evolve.co.uk on 1st November 2009.