How to Protect your Website from Unwanted Visitors, Part 2 – Parsing your Access Log

Before you can begin to block specific countries, as I mentioned in part 1, you have to figure out which countries are hitting your website the most. If you have a comments section, whether on a blog or a forum, those are usually also the countries that try to spam it the most. The way to find out is to parse your access log. Of course, you need to have access to your access log, if you’ll pardon the expression. Many web servers write their logs in Apache’s Common or Combined Log Format by default (nginx’s default “combined” format matches Apache’s), but the exact format doesn’t matter as long as you can work out what each part of each line means.
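In the Combined Log Format, each line holds the client IP, identity, user, timestamp, request line, status code, response size, referer and user agent, in that order. The rough sketch below labels those fields with a regular expression (preg_match) instead of explode, just to show what each part means. parse_combined_line() is a hypothetical helper, and the pattern assumes there are no escaped quotes inside the quoted fields.

```php
<?php
// Hypothetical helper: label the fields of one Combined Log Format line.
// Assumes no escaped quotes (\") inside the quoted fields.
function parse_combined_line(string $line): ?array
{
    // host ident user [time] "request" status bytes "referer" "user-agent"
    $pattern = '/^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"/';
    if (!preg_match($pattern, $line, $m)) {
        return null; // not a combined-format line
    }
    return [
        'ip'         => $m[1],
        'time'       => $m[4],
        'request'    => $m[5],
        'status'     => (int) $m[6],
        'bytes'      => $m[7], // "-" when no body was sent
        'referer'    => $m[8],
        'user_agent' => $m[9],
    ];
}

// A sample combined-format line for illustration.
$line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 '
      . '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"';
print_r(parse_combined_line($line));
```

If your server uses a different format, the pattern has to change to match it, which is the same caveat that applies to the explode approach.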

Parsing the Access Log, Line by Line

These are the functions I use to parse my own log files in PHP: explode, strstr/stristr and substr. I’ll just be using “explode” this time around.

With explode, I break each line into segments: splitting on spaces gives me the IP address, and splitting on double quotation marks gives me the user agent. I’m not talking about user agents today, so let’s concentrate on the IP addresses.
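Here’s a minimal sketch of those two explode splits on a single sample line. The field indexes assume the Combined Log Format; a different log format will shift them.

```php
<?php
// Sample line, assumed to be in the Combined Log Format.
$line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 '
      . '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"';

// Split on spaces: the client IP address is the first field.
$bySpace = explode(' ', $line);
$ip = $bySpace[0];

// Split on double quotes: index 1 is the request line, index 3 the
// referer and index 5 the user agent (the last quoted field).
$byQuote   = explode('"', $line);
$request   = $byQuote[1];
$userAgent = $byQuote[5];

echo $ip, "\n", $request, "\n", $userAgent, "\n";
```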

Here’s how I pull out the IP addresses. To make it complete, so you can see the countries, I’ll also include each address’s GeoIP country code:

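  <?php
  include '/path/to/geoip.inc';
  // Open the GeoIP country database once, before looping over the log.
  $gi = geoip_open('/path/to/GeoIP.dat', GEOIP_STANDARD);
  // Read the log into an array of lines, without newlines or blank lines.
  $file = file('/path/to/access.log', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
  foreach ($file as $line) {
    // The client IP address is the first space-separated field.
    $a = explode(' ', $line);
    $cc = geoip_country_code_by_addr($gi, $a[0]);
    echo $cc . ' ' . $a[0] . '<br />';
  }
  geoip_close($gi);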

If you want to run this from the console rather than as a website script, change ‘<br />’ to “\n”.
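If you’d rather not edit the script each time, one option is to pick the line break based on how PHP is running. This small sketch relies on the built-in PHP_SAPI constant, which is “cli” when the script runs from the console; line_break() is a hypothetical helper.

```php
<?php
// Hypothetical helper: "\n" on the console, '<br />' when served as a web page.
function line_break(): string
{
    return PHP_SAPI === 'cli' ? "\n" : '<br />';
}

// In the loop above, this would replace the hard-coded '<br />'.
echo 'example output line' . line_break();
```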

You’ll probably find, as I did, that the first countries you need to block are China (CN), South Korea (KR), the Russian Federation (RU), Ukraine (UA) and Turkey (TR). In my logs, those five were the worst offenders among non-English-speaking countries for an English-language website. As you refine your blocking routines, you may end up removing the country blocks one by one.
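To see which countries hit your site the most, you can tally the lookups instead of printing them one per line. Below is a minimal sketch. count_by_country() is hypothetical, and its $lookup callable is an assumption that stands in for geoip_country_code_by_addr(), so the tally can be tried without the GeoIP database.

```php
<?php
// Hypothetical helper: count requests per country code, busiest first.
// $lookup takes an IP string and returns a country code, or false if unknown.
function count_by_country(iterable $lines, callable $lookup): array
{
    $counts = [];
    foreach ($lines as $line) {
        // The client IP address is the first space-separated field.
        $ip = explode(' ', trim($line))[0];
        if ($ip === '') {
            continue; // skip blank lines
        }
        $cc = $lookup($ip) ?: '??'; // '??' when the address isn't in the database
        $counts[$cc] = ($counts[$cc] ?? 0) + 1;
    }
    arsort($counts); // highest counts first
    return $counts;
}

// With geoip.inc loaded and $gi opened as in the script above:
// print_r(count_by_country(file('/path/to/access.log'),
//     fn($ip) => geoip_country_code_by_addr($gi, $ip)));
```

The top few entries of the result are the countries worth looking at first.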

I can’t tell you which countries you should block because it depends on what kind of website you’re running. If you don’t allow comments on your website, you may not need to block comment spammers unless you’re running WordPress or some other popular software package. If you have a homegrown website, you may never see comment spammer IP addresses.

With WordPress, in my experience, it doesn’t matter whether you allow comments or not. The comment spammers’ robots will attempt to post to “wp-comments-post.php” (WordPress’s comment handler) no matter what, and will come back to try almost every single day.
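To see which addresses are going after the WordPress comment handler, you can filter the log for POST requests to wp-comments-post.php. This is a rough sketch that assumes the Combined Log Format; comment_posters() is a hypothetical helper.

```php
<?php
// Hypothetical helper: count POSTs to wp-comments-post.php per IP address.
function comment_posters(iterable $lines): array
{
    $hits = [];
    foreach ($lines as $line) {
        $byQuote = explode('"', $line);
        if (!isset($byQuote[1])) {
            continue; // no quoted request line
        }
        // The request line looks like: POST /wp-comments-post.php HTTP/1.1
        $request = explode(' ', $byQuote[1]);
        if ($request[0] === 'POST' && isset($request[1])
            && strstr($request[1], 'wp-comments-post.php') !== false) {
            $ip = explode(' ', $line)[0];
            $hits[$ip] = ($hits[$ip] ?? 0) + 1;
        }
    }
    arsort($hits); // most persistent posters first
    return $hits;
}

// Usage: print_r(comment_posters(file('/path/to/access.log')));
```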

The Next Step

The next step in the process is to find out which servers, in the countries you haven’t blocked, are doing all the dirty work. To help you do this, you can use PHP’s “gethostbyaddr” function to look up the host name behind each IP address. Be forewarned, however: the function performs a reverse DNS lookup, so it depends on how quickly the DNS servers respond, and it could take longer than you’re willing to wait.
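Since a busy log repeats the same addresses over and over, it helps to look up each distinct IP only once. Here’s a sketch under that assumption. resolve_hosts() and its optional $resolver argument are hypothetical; the argument is only there so a fake resolver can stand in for gethostbyaddr() when testing.

```php
<?php
// Hypothetical helper: reverse-resolve each distinct IP once.
// Maps each IP to its host name, or null when no name was found.
function resolve_hosts(array $ips, ?callable $resolver = null): array
{
    $resolve = $resolver ?? 'gethostbyaddr';
    $hosts = [];
    foreach (array_unique($ips) as $ip) {
        // Blocks until the DNS lookup answers or times out.
        $host = $resolve($ip);
        // gethostbyaddr() returns the IP unchanged when the lookup fails,
        // and false when the address is malformed.
        $hosts[$ip] = ($host === false || $host === $ip) ? null : $host;
    }
    return $hosts;
}

// Usage, with $ips collected from the log as in the script above:
// print_r(resolve_hosts($ips));
```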

Because I’m already doing this dirty work on my websites, I may end up building a web service for a modest fee (something like $2/month or $10/year) that would save you the effort. I would do it for free if I could, but I’m not sure I could keep it within my current bandwidth limitations.


2 Responses to “How to Protect your Website from Unwanted Visitors, Part 2 – Parsing your Access Log”

  1. Cindy says:

    Nice, but what I read online is that they’re using proxy servers located in the US. So I think you may have to rethink your strategy. Sorry :(
