Okay, you can color me mistaken (or maybe not, as I’ll explain). It seems that blocking IP addresses by country isn’t a good idea, except when you’re first setting up your blocking routines. You see, the databases that map IP addresses to country codes always contain mistakes, and even when they don’t, an IP address listed for one country may actually be in use in another, kind of like Google using IP address ranges belonging to China.
User Agents
I’m already blocking by user agent, and I also block specific IP ranges that belong to the entities reporting those user agents, like BAIDU. In fact, if you aren’t blocking the BAIDU search engine, you’d better start. Any entity that masquerades as a regular user as often as it identifies itself as a search engine robot needs to be blocked. It means they’re doing something besides indexing your website.
I have a whole slew of user agent strings I block. Some of them belong to defunct services, so don’t be surprised:
- 360Spider
- 80legs
- Baiduspider
- BlogPulseLive
- BlogScope
- EasouSpider
- EzineArticlesLinkScanner
- ezooms
- libwww
- linkdex
- MJ12bot
- MyNutchSpider
- scrapy
- Sosospider
- Spinn3r
- Voilabot
- Yandex
- YodaoBot
And those are just the ones I can remember off the top of my head without looking at my configuration files. Not all of the user agents that hit my sites are going to hit your sites and vice-versa. You really have to scan your access logs to find out which ones you need to block and which ones you don’t.
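To make that log scan concrete, here is a minimal Python sketch that tallies user agents and flags the ones matching a blocklist. It assumes your access log uses the common "combined" format, where the user agent is the last quoted field on each line. The function names and the sample log lines are made up for illustration, and the blocklist is just a few entries from the list above.

```python
import re
from collections import Counter

# A few substrings taken from the list above; matching is case-insensitive.
BLOCKED_AGENTS = ["360Spider", "80legs", "Baiduspider", "MJ12bot", "scrapy", "Yandex"]

# Assumption: combined log format, so the user agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def extract_user_agent(line):
    """Return the last double-quoted field on the line, or None if there is none."""
    match = UA_PATTERN.search(line)
    return match.group(1) if match else None


def is_blocked(user_agent):
    """True if any blocklist entry appears anywhere in the user agent string."""
    lowered = user_agent.lower()
    return any(name.lower() in lowered for name in BLOCKED_AGENTS)


def count_user_agents(lines):
    """Count how many requests each distinct user agent made."""
    counts = Counter()
    for line in lines:
        user_agent = extract_user_agent(line)
        if user_agent:
            counts[user_agent] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines; in practice you would iterate over your own log file.
    sample = [
        '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0)"',
        '198.51.100.7 - - [10/Oct/2023:13:55:40 +0000] "GET /about HTTP/1.1" 200 900 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    ]
    for agent, hits in count_user_agents(sample).most_common():
        label = "BLOCK" if is_blocked(agent) else "ok"
        print(f"{hits:6d}  {label:5s}  {agent}")
```

Sorting by hit count puts the noisiest agents at the top, which is usually where the candidates for blocking turn up.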
Bad Behavior
That’s probably not the proper term. Unwanted behavior is probably better. I have a script that runs every 15 minutes, looking for specific things: spam attempts, requests for pages that have never existed (probes for exploits), attempts to register and log on when I have registrations turned off, and things of that nature.
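A script along those lines might look something like the following Python sketch. It is not my actual script, just an illustration under some assumptions: the log is in combined format, and the path lists (`PROBE_PATHS`, `REGISTRATION_PATHS`) are hypothetical examples that you would replace with whatever shows up in your own logs.

```python
import re
from collections import defaultdict

# Assumption: combined log format. Captures the client IP, request path and status.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)

# Hypothetical examples of paths that never existed on the site (exploit probes).
PROBE_PATHS = ("/phpmyadmin", "/xmlrpc.php", "/.env")
# Hypothetical registration URLs, assuming registrations are turned off.
REGISTRATION_PATHS = ("/register", "/wp-login.php?action=register")


def classify(path):
    """Return a short reason if the path looks unwanted, otherwise None."""
    lowered = path.lower()
    if any(lowered.startswith(prefix) for prefix in PROBE_PATHS):
        return "probe"
    if any(lowered.startswith(prefix) for prefix in REGISTRATION_PATHS):
        return "registration"
    return None


def find_offenders(lines, threshold=1):
    """Map each offending IP to its list of reasons; lines that do not parse are skipped."""
    hits = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue
        reason = classify(match.group("path"))
        if reason:
            hits[match.group("ip")].append(reason)
    return {ip: reasons for ip, reasons in hits.items() if len(reasons) >= threshold}
```

Run from cron every 15 minutes against the most recent log entries, the returned dictionary gives you the IP addresses to add to a temporary list, along with the reason each one was flagged.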
When an IP address is added to my “temporary forbidden” list, the server answers with status 444 the very next time that address connects. This is an nginx-only, non-standard code that drops the connection without sending a response. Then, each day, any IP address that logged a 444 (which means the same IP address came back) is added back to the list. The only way an address leaves that list is if it stops coming back for a few days or is moved to my “permanent forbidden” list, which I maintain manually.
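The daily renewal logic can be sketched in a few lines of Python. This is a simplified model rather than my real setup: the list is a dictionary of IP address to the date it was last seen, `RETENTION_DAYS` is an assumed value standing in for "a few days", and `refresh_temporary_list` is a made-up name.

```python
from datetime import date, timedelta

# Assumption: "a few days" is modeled as 3 days without a return visit.
RETENTION_DAYS = 3


def refresh_temporary_list(temp_list, returned_ips, permanent_ips, today):
    """Return the new temporary list as a dict of IP -> date last seen.

    temp_list     -- current temporary list (IP -> date last seen)
    returned_ips  -- IPs that logged a 444 today, meaning they came back
    permanent_ips -- IPs on the manually maintained permanent list
    """
    updated = dict(temp_list)
    # Anyone who came back gets a fresh date, which keeps them on the list.
    for ip in returned_ips:
        updated[ip] = today
    cutoff = today - timedelta(days=RETENTION_DAYS)
    # Drop entries that stayed away long enough or were promoted to permanent.
    return {
        ip: seen
        for ip, seen in updated.items()
        if seen >= cutoff and ip not in permanent_ips
    }
```

Keeping the last-seen date, rather than just the address, is what lets the list expire entries on its own without any manual cleanup.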
Time-Consuming?
Not really. It was at first, but now I just scan some lists I generate a couple of times a day. Because bad actors are automatically added to a temporary list that the server loads every time I do a “graceful restart” (after it scans all the logs and such), I can skip my daily routines when necessary and just pick up where I left off later on.
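For the list the server loads on restart, one possible approach (an assumption on my part for this example, not necessarily how you would wire it up) is to write the addresses to a file that an nginx `geo` block includes. The sketch below only renders the entries; the variable name `$temp_forbidden` and the file path in the comments are hypothetical.

```python
import ipaddress


def render_geo_entries(ips):
    """Turn IPs or CIDR ranges into lines suitable for an nginx geo block include.

    Invalid entries are skipped and duplicates are collapsed.
    """
    networks = set()
    for raw in ips:
        try:
            # strict=False accepts host bits, e.g. 192.0.2.5/24 becomes 192.0.2.0/24.
            networks.add(str(ipaddress.ip_network(raw.strip(), strict=False)))
        except ValueError:
            continue
    return [f"{network} 1;" for network in sorted(networks)]


# Hypothetical nginx side, shown here only as a comment:
#   geo $temp_forbidden {
#       default 0;
#       include /etc/nginx/temp_forbidden.conf;   # hypothetical path
#   }
#   ...and inside the server block:
#   if ($temp_forbidden) { return 444; }
```

Write the returned lines to the include file, then reload nginx so it picks up the new list.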
The secret is to automate as much as possible, and not just for things like this. Servers are normally set up with a lot of other housekeeping chores, which generally run after midnight (server time). Scheduling your own jobs to run after those routines finish should keep their impact as small as possible.
Now, I’ve already blocked a huge list of server IP address ranges as well as specific user agents, from all kinds of countries, so removing my country-level blocks isn’t going to have an impact anymore. If you’re just starting out, I would suggest blocking certain countries only until you get a handle on which specific IP address ranges to block.