Fighting forum spam

As one of the managers of Ignite Taipei, I’m trying to come up with new ways to let the community communicate and share information, advice and all. A while ago I set up a forum at http://bbs.ignitetaipei.tw/, thinking it would be an interesting experiment. Well, so far it is useless for communication, but it turned out to be a very interesting experience from the sysadmin point of view.

I used FluxBB, because it looked simple enough, seemed to be quite fast (for low traffic volume at least), and was well configurable. Within a very short time, though, I ran into a spam problem: many fake users registered and posted lots of algorithmically generated garbage text with a bit of advertisement here and there.

First I looked into FluxBB’s own solutions, and it looks like FluxBB might not have been a great choice, because many of the spam-fighting plugins are out of date, no longer supported, or just a real pain to set up. The immediate practical step I could take was updating my security questions, rolling my own version of “written with words, how much is 5 + 4?”, the regular low-tech captcha on FluxBB. It looks like the answers to the original questions are already in spammers’ databases everywhere, so I had to write my own set, which seemed to work for a while, cutting down on red-flagged registrations. But it’s not ideal, since I want to make this a dual-language forum (Ignite Taipei has both English and Chinese as official languages).
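To make the idea of a home-made question set concrete, here is a minimal sketch in Python. It is not how FluxBB implements its security questions; the question bank, the Chinese example question, and the function names `normalize` and `check_answer` are all made up for illustration. The point is only that each question carries its own set of accepted answers, so questions in both languages can sit side by side.

```python
import re

# Hypothetical question bank: each question maps to its accepted answers.
# Neither the questions nor the structure come from FluxBB itself.
QUESTIONS = {
    "Written with words, how much is 5 + 4?": {"nine"},
    "用中文數字寫出：三加四等於多少？": {"七"},
}


def normalize(answer: str) -> str:
    """Lowercase the answer and drop whitespace and punctuation."""
    return re.sub(r"[^\w]", "", answer.strip().lower())


def check_answer(question: str, answer: str) -> bool:
    """Return True if the normalized answer is accepted for this question."""
    accepted = QUESTIONS.get(question, set())
    return normalize(answer) in accepted


if __name__ == "__main__":
    q = "Written with words, how much is 5 + 4?"
    print(check_answer(q, " Nine. "))  # answer written out in words
    print(check_answer(q, "9"))        # digits are deliberately not accepted
```

Rejecting the digit form is the whole trick of the “written with words” style of question: a bot that can evaluate the arithmetic still has to produce the word.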

Instead I turned on email confirmation. When someone registers, the password is sent to their email address, and they have to use it to sign in. It was okay for a tiny bit, then a crazy registration boom happened. I think I might be the only real member of the board (I said that it is a failure so far for communication :) and there are 500 other spam members. Looking at their email addresses, it seems all of them are on Hotmail. That kinda suggests a giant failure at Hotmail to restrict automatic registration, which is probably a problem overall. I cannot just throw out Hotmail addresses either, because it’s a popular mail provider here in Taiwan too (my first email was Hotmail too, but that was a looooong time ago, before it was Microsoft property).
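Eyeballing the member list works for a few hundred accounts, but the same check is easy to script. The sketch below assumes the registered email addresses have already been exported into a plain Python list; the sample addresses are invented, and `domain_counts` is a hypothetical helper, not part of FluxBB.

```python
from collections import Counter


def domain_counts(emails):
    """Count registrations per email domain, case-insensitively."""
    domains = []
    for email in emails:
        # Take everything after the last "@"; skip malformed addresses.
        _, sep, domain = email.rpartition("@")
        if sep and domain:
            domains.append(domain.strip().lower())
    return Counter(domains)


if __name__ == "__main__":
    # Made-up sample addresses, for illustration only.
    sample = [
        "a@hotmail.com",
        "b@Hotmail.com",
        "c@example.org",
        "not-an-address",
    ]
    for domain, count in domain_counts(sample).most_common(3):
        print(domain, count)
```

A lopsided count like this only says where the spam accounts were created. As noted above, it is not a good enough reason to ban the whole domain.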

So captchas don’t work, and email confirmation doesn’t work. What to do instead? At the time I was playing around with Cloudflare as an easy-to-use CDN. I had tried it before for our Ignite Taipei blog, which is hosted on Tumblr, and that doesn’t play well with Cloudflare unfortunately. I couldn’t use it for this blog before because of my DNS provider, but now that I have switched, I started playing with it again.

Figure: Cloudflare dashboard stats snapshot (parts of it)

Instead of enabling Cloudflare for the entire ignitetaipei.tw domain, I just turned it on for the forum, since it’s hosted elsewhere. And that totally did it. Spam stopped that very moment and hasn’t returned since. I think what happens is that Cloudflare knows a lot of web/forum/email spam hosts globally, and can challenge them or generally ignore them. I can even see where those spammers are coming from.

Figure: Cloudflare Threats Console, listing captured threats

One weird (but actually not that surprising) thing is that the most active web crawler on the site (Cloudflare gives that info as well) was Baidu by far, so I guess more people knew about the site in China than elsewhere. Why’s that? Are there forums that share lists of vulnerable sites, or something like that? I barely had any Chinese content at that time, so it cannot be that. And since I turned on the threat control part, Baidu’s crawling seems to have dropped quite a bit (I submitted the site to Google, so now that’s the busiest crawler).

All in all, Cloudflare is an interesting experiment. I can really mess up my DNS with it, and I managed to block my own site for several hours, but in general it is worth it. Just have to be careful. For example when testing, use their own name servers to check the information, and maybe instead of the “automatic” time-to-live, set some very short time first. I usually use Google’s 8.8.8.8, and it picks up the first wrong setting really quickly, then takes hours to pick up the correction I made just minutes after the first one.
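The reason a short time-to-live helps is plain arithmetic: a caching resolver that has picked up a record normally keeps serving it until the TTL runs out, no matter how quickly the record is corrected at the source. Here is a rough sketch of that reasoning in Python. The TTL values are hypothetical, not Cloudflare’s actual defaults, and real resolvers do not always honor TTLs exactly.

```python
def seconds_until_refresh(cached_at: float, ttl: int, now: float) -> float:
    """Seconds until a caching resolver drops a record it cached at `cached_at`.

    All arguments are in seconds on the same clock. Returns 0.0 if the
    record has already expired.
    """
    expires_at = cached_at + ttl
    return max(0.0, expires_at - now)


if __name__ == "__main__":
    # Hypothetical scenario: the wrong record is cached at t=0 and the
    # correction is made five minutes (300 s) later.
    long_ttl = 4 * 60 * 60  # assumed 4-hour TTL
    short_ttl = 120         # assumed 2-minute TTL for testing
    print(seconds_until_refresh(0, long_ttl, 300))   # still hours of stale answers
    print(seconds_until_refresh(0, short_ttl, 300))  # already expired
```

With the long TTL the resolver keeps handing out the wrong answer for almost four more hours. With the short one the mistake has already aged out by the time the fix is in place.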

After a bit of playing around, at least I have no spam anymore (keep fingers crossed). Now just have to get people to use the forums. :)

Published by Gergely Imreh

Physicist, hacker. Enjoys avant-garde literature probably a bit too much. Open source advocate and contributor, both for software and hardware.
