I just spent more than an entire day trying to track down why clueless search engine crawlers were generating 404 errors on this blog. I went through everything, including the sitemap.xml file. I couldn't find a single reason for Googlebot to tack on suffixes like ".More" (notice the period) or for the Yahoo Slurp bot to tack on the post title. What an incredible waste of time!


Redirections

I could have used the Redirection plugin to take care of this, but I've moved everything to my .htaccess file. Here's the rule I used to get rid of suffixes like ".More" and ".Back":

RewriteRule ^(.*)/\.(.*)$ http://www.untwistedvortex.com/$1 [R=301,L]

Notice the escaped period. The rule wouldn't work right otherwise, since an unescaped period in a regular expression matches any single character.
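If you want to see what the escaped period buys you, here's a minimal sketch in Python. It uses Python's re module as a stand-in for mod_rewrite's regex engine, which should behave the same for a pattern this simple. The post paths and the redirect_target helper are made up for illustration; they aren't from my actual .htaccess file.

```python
import re

# The pattern from the rule above. In .htaccess the leading slash of the
# request path is already stripped before matching, so the paths below
# don't start with "/".
ESCAPED = re.compile(r"^(.*)/\.(.*)$")

# The same pattern with the backslash removed, for comparison only.
UNESCAPED = re.compile(r"^(.*)/.(.*)$")


def redirect_target(pattern, path):
    """Return what $1 would hold if the rule matched, or None if it didn't."""
    match = pattern.match(path)
    return match.group(1) if match else None


# Hypothetical paths: one mangled by a bot, one perfectly legitimate.
bad_path = "2009/01/some-post/.More"
good_path = "2009/01/some-post/"

print(redirect_target(ESCAPED, bad_path))     # 2009/01/some-post
print(redirect_target(ESCAPED, good_path))    # None
print(redirect_target(UNESCAPED, good_path))  # 2009/01
```

With the escaped period, only the mangled URL is redirected. Without it, the legitimate post URL also matches (the period happily matches the "s" in "some-post") and would get redirected to the wrong place.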

Here's the rule I used to get rid of the tacked-on post titles:

RewriteRule ^(.*)/(.*)\s(.*)/$ http://www.untwistedvortex.com/$1 [R=301,L]

The 404 errors were displaying "%20", which is the URL-encoded form of a space character. Since mod_rewrite matches against the decoded path, I used "\s" to isolate the 404 URLs with one or more spaces in them.
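Here's a rough sketch of that decode-then-match behavior, again in Python rather than Apache itself. The urllib.parse.unquote call plays the role of the decoding that happens before the rule sees the path. The example paths and the redirect_target_for_request helper are hypothetical.

```python
import re
from urllib.parse import unquote

# The pattern from the second rule above.
SPACE_RULE = re.compile(r"^(.*)/(.*)\s(.*)/$")


def redirect_target_for_request(raw_path):
    """Decode the raw path, then return what $1 would hold, or None."""
    decoded = unquote(raw_path)  # turns "%20" into a literal space
    match = SPACE_RULE.match(decoded)
    return match.group(1) if match else None


# Hypothetical request paths, as they'd show up in the 404 log.
print(redirect_target_for_request("some-post/Some%20Post%20Title/"))  # some-post
print(redirect_target_for_request("some-post/"))                      # None
print(redirect_target_for_request("some-post/Some%20Post%20Title"))   # None
```

The last case is worth noticing: the pattern ends with a slash, so a mangled URL without a trailing slash would slip past this rule.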

Performance Improvements

Over the past month, I've taken steps to kill off the spambots and other badly behaved crawlers because they've done nothing but slow down the rendering of the blog pages and consume bandwidth. I was even able to fend off a malicious attack one day, although I don't remember which day it was.

The point is, if you take the time to defeat the 404 errors, you'll catch the misbehaving bots with their pants down, so to speak. Don't wait until your site comes to a near stop to do something about it.

It goes without saying, but I'll say it anyway: redirecting erroneous URLs to their correct locations does nothing to hurt your site's ranking in the search engines. It can only make the ranking what it should be or even help to improve it.