How to Protect your Website from Unwanted Visitors, Part 4 – Blocking User Agents

Your website security wouldn’t be worth much if you didn’t block specific user agents. Whatever purpose they serve, some user agents will continuously crawl your website looking for… something. The “80legs” user agent alone was once responsible for making my web server unresponsive for more than an hour, so of course I block it now.

Don’t Worry about Referrers

I’ve purposely skipped anything about blocking referrers. If you ban IP address ranges and block user agents, you’ll nail the bad referrers in the process.

Everything on the Internet is temporary, with some things more temporary than others. Referrers seem to be the most temporary of all the things you’ll notice when you parse your access logs, which is something you should do daily, or weekly at the least.
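
If you don’t already have a tool for parsing your access logs, here is a minimal sketch in Python that counts how often each user agent appears. It assumes your server writes the common “combined” log format, where the user agent is the last quoted field on each line. The function name and the sample log lines are made up for illustration, and a user agent containing escaped quotes would need a more careful parser.

```python
import re
from collections import Counter

# Assumption: "combined" log format, which ends with "referrer" "user agent".
LOG_TAIL = re.compile(r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"\s*$')


def count_user_agents(lines):
    """Return a Counter mapping each user agent string to its hit count."""
    counts = Counter()
    for line in lines:
        match = LOG_TAIL.search(line)
        if match:
            counts[match.group("agent")] += 1
    return counts


# Made-up sample lines; in practice, pass an open log file instead.
sample = [
    '192.0.2.10 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [01/Jan/2024:00:00:02 +0000] "GET /a HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [01/Jan/2024:00:00:03 +0000] "GET / HTTP/1.1" 200 512 "http://example.com/" "Mozilla/5.0"',
]

# Print the busiest user agents first.
for agent, hits in count_user_agents(sample).most_common():
    print(hits, agent)
```

A user agent that sits at the top of this list day after day, and isn’t a search engine you care about, is a good candidate for blocking.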

Blocking User Agents on Apache Web Servers

There are a couple of different ways to block bad user agents (bad bots) using an .htaccess file, but the easiest way is to use the rewrite module (mod_rewrite), and it looks something like this:

  RewriteEngine On
  RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
  RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
  RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
  RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
  RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
  RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
  RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
  RewriteCond %{HTTP_USER_AGENT} ^Zeus
  RewriteRule ^.* - [F,L]

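Those are just examples I lifted from somewhere else. You’ll probably never see those user agents. Each pattern starts with “^”, so it only matches at the beginning of the user agent string, every condition except the last carries the [OR] flag, and the [F] flag on the rule sends back a 403 Forbidden. Remember that you only need to turn the rewrite engine on once in an .htaccess file.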

Blocking User Agents on NginX Web Servers

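It’s actually much, much easier with NginX. The check goes inside a “server” block of the NginX configuration file (the “if” directive is only allowed at the “server” and “location” levels, not directly under “http”):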

  if ($http_user_agent ~* (80legs|VoilaBot|wotbox)) {
      return 444;
  }

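The pipe characters (|) act as “or” operators, and the “~*” operator makes the match case-insensitive. The 444 code is specific to NginX: it closes the connection without sending any response at all. There are many more user agents I block, but I don’t want to list them here, at least not without saying why. I’m doing more than I should by showing three.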
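
If you want to check what a pattern like that will catch before you put it on a live server, you can try it against a few user agent strings first. This is only a rough sketch in Python: it assumes Python’s regular expressions behave the same as the server’s for a simple alternation like this one, and the sample user agent strings and the function name are made up for illustration.

```python
import re

# Same alternation as the server rule; IGNORECASE mirrors the "~*" operator.
BLOCKED = re.compile(r"(80legs|VoilaBot|wotbox)", re.IGNORECASE)


def is_blocked(user_agent):
    """Return True if any blocked name appears anywhere in the user agent."""
    return BLOCKED.search(user_agent) is not None


# Made-up user agent strings, for illustration only.
samples = [
    "Mozilla/5.0 (compatible; 80legs/2.0)",
    "WOTBOX/2.01",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
]

for agent in samples:
    print("blocked" if is_blocked(agent) else "allowed", "-", agent)
```

Because the pattern isn’t anchored, a blocked name matches anywhere in the string, so keep the names specific enough that they won’t catch a real browser by accident.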

Wrapping up the Series

I could go on and on with articles about blocking bad actors when it comes to your website, but what I’ve mentioned are the most important. Basically, if you block or drop the bad actors as fast as they connect, you’ll save on bandwidth and your website will be more responsive for real visitors and good bots (like Googlebot and Bingbot).

I’m going to be providing lists of server ranges, in both CIDR and range format, as soon as I start setting up the pages. I’ll also be providing the user agent strings that should be blocked on other pages. It doesn’t do any good to block ranges from residential IP address blocks because they’re dynamic in nature, and blocking one bot today may mean blocking a real person tomorrow. If you can identify the bot by the user agent, more power to you.
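
If you only have one of the two formats, converting between them is straightforward. Here is a small sketch using Python’s standard ipaddress module. The function names are my own, and the addresses are reserved documentation ranges, not real servers to block.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Convert a first-last address range into the CIDR blocks that cover it."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


def cidr_to_range(cidr):
    """Convert a CIDR block into its (first, last) address pair."""
    net = ipaddress.ip_network(cidr)
    return str(net[0]), str(net[-1])


# A range that lines up with a single block.
print(range_to_cidrs("192.0.2.0", "192.0.2.255"))       # ['192.0.2.0/24']

# A range that needs more than one block to cover it exactly.
print(range_to_cidrs("198.51.100.0", "198.51.100.95"))  # ['198.51.100.0/26', '198.51.100.64/27']

# And back the other way.
print(cidr_to_range("203.0.113.0/24"))                  # ('203.0.113.0', '203.0.113.255')
```

A range doesn’t always map to one tidy CIDR block, which is why both formats are worth having.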

One Response to “How to Protect your Website from Unwanted Visitors, Part 4 – Blocking User Agents”

  1. Katrin says:

    Thanks for sharing your useful tips. One of my blogs has already been hacked, and I don’t want it to happen again. Honestly, I haven’t cared about security before; I thought that my blog wasn’t important to hackers. Although your text sounds a little bit geeky to me, I bet my husband is going to explain it to me :)
