As a website owner, you should be worried when someone scrapes all or part of the content of one of your pages. If you're not familiar with it, the Copyscape service is one way to find scrapers and start the elimination process, but Copyscape charges for more than 10 searches per month. There is another procedure that you can repeat over and over again, and it will never cost you anything.


Link to Yourself

Pick a word, any word, on your page and create a link back to that page. As I mentioned in "A Quick Lesson in Website Surreptitiousness", you simply remove any decoration for that link. Don't use the title attribute, though, because you don't want a scraper spotting it when he or she hovers over it. It's not completely hidden; the word is still there and anyone who uses a browser to override or disable CSS can see the link. Not only that, but the link will still appear in the status bar of most browsers.
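To make the idea concrete, here is a minimal sketch of how such a link could be generated. The function name, the example URL, and the exact inline style are my own assumptions for illustration; any CSS that removes the underline and inherits the surrounding text color would do the same job.

```python
from html import escape


def undecorated_self_link(word: str, page_url: str) -> str:
    """Return an anchor that links `word` back to `page_url` with no decoration.

    Hypothetical helper: the inline style is one assumed way to strip decoration.
    """
    # Inherit the surrounding text color and drop the underline so the link
    # looks like ordinary text. No title attribute is emitted, so nothing
    # pops up when someone hovers over the word.
    style = "color: inherit; text-decoration: none;"
    return '<a href="{}" style="{}">{}</a>'.format(
        escape(page_url, quote=True), style, escape(word)
    )


if __name__ == "__main__":
    # example.com stands in for your own page address.
    print(undecorated_self_link("niebu", "https://example.com/my-post/"))
```

The word itself stays in the page text, so the link is undecorated rather than hidden, which is the distinction this section relies on.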

In his article "Hidden links", Matt Cutts states "Now there’s nothing bad about changing the style of a link to some degree, but let me show you an example of going overboard…" and says that the example he gave "…crosses over into deceptiveness and violates our quality guidelines…"

It's clear to me that my suggestion doesn't violate the quality guidelines, especially since I'm telling you to link back to your own specific page. The robots that index websites won't be led anywhere new by the link, obviously, since it only points back to the page it's on.

Use Special Words

You don't want to use common words in this way. If you use something like "and", you'll be kicking yourself later. You can even make up a word. Do a Google search for your made-up word and, if it returns any results, keep trying until you find one that doesn't. Don't use "niebu", okay? (Go ahead and hover over "niebu".)

Set up a link to your special word or words EVERY TIME you publish a page or post. The closer to the start of the content, the better. If you have trouble using nonsensical words, try using a misspelled word on purpose (wrong spelling, transposed characters, etc.). Just make sure the word doesn't yield any search results before you use it. You could even say something like "here at Untwisted Vortex's fern grotto" (with your site, of course) and make the phrase into a link.
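If you get stuck inventing words, a few lines of code can suggest candidates. This is only a rough sketch: the function names and letter sets are my own assumptions, and it does not check Google for you, so every candidate still needs a manual search before you use it.

```python
import random
from typing import Optional

# Assumed letter pools; any consonant/vowel mix works.
CONSONANTS = "bdfgklmnprstvz"
VOWELS = "aeiou"


def made_up_word(syllables: int = 3, rng: Optional[random.Random] = None) -> str:
    """Build a pronounceable nonsense word from consonant+vowel syllables."""
    rng = rng or random.Random()
    return "".join(
        rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(syllables)
    )


def transpose_typo(word: str, rng: Optional[random.Random] = None) -> str:
    """Swap two adjacent, different characters to make a deliberate misspelling."""
    rng = rng or random.Random()
    # Only positions where the swap actually changes the word.
    positions = [i for i in range(len(word) - 1) if word[i] != word[i + 1]]
    if not positions:
        return word  # nothing to swap, e.g. "aaa" or a single letter
    i = rng.choice(positions)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


if __name__ == "__main__":
    for _ in range(5):
        print(made_up_word())
    print(transpose_typo("grotto"))
```

Treat the output as a starting list only. A word that looks unique can still turn out to be a brand name or a word in another language, which is why the search check matters.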

Use Google Alerts

You can set up to 1,000 Google Alerts to be sent to your email address. Specify your special word or phrase and use the "comprehensive" and "as-it-happens" options. Those are the options I use and I haven't been overwhelmed with email yet. You can always change the frequency when managing your alerts.

You have to have a Google account in order to use Google Alerts. If you don't have an account, create one. You'll be glad you did. You'll need it later for access to the Google Webmaster Tools.

Destroy the Value of Duplicate Content

Once you receive a Google Alert showing you that someone has scraped your page, you need to examine the scrape.

Is it more than 150 characters? No? Leave it alone. Sometimes people link to your page with long links. Thankfully, most link anchors are only a few words, like when Sephy wrote "Where did WordPress go wrong?" and linked to one of my articles with just two words.

Is it the size of a paragraph or two, but well under half of your page? If so, does it include a link back to your page at the beginning or the end? If it includes a link, leave it alone. Good scrapers, like feed aggregators, do this all the time.

Is it a full page? Regardless of whether it links back to you or not, you can now report it as spam in Google's index from the Webmaster Tools dashboard. But should you?

What you should be worried about is getting penalized for duplicating content from somewhere else when that somewhere else is duplicating YOUR content. There are three ways to defeat the bad scrapers:

  1. Report it to Google. Make sure you have the information they need or Google won't do anything. If you do, they'll de-index that page or de-index the entire site. I've been very successful with this method.
  2. Rewrite your original page. This can be easy or hard depending on what you wrote about. Frank of OpTempo wrote about duplicate content filtering concepts back in May and explained how it can be made to work for you.
  3. Try to get the website that scraped the content to remove it. Again, this can be easy or hard. It all depends on who's doing the scraping. If it's a mistake by a newbie blogger, it's easy. If it's a professional scraper, then good luck.
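The questions I ask when examining a scrape can be sketched as a small decision helper. This is a rough illustration, not a tool I actually run: the function names, the return labels, and the choice to treat "well under half" as simply "under half" are all assumptions, and cases the questions above don't cover fall through to a manual review.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a piece of HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back(scraped_html: str, page_url: str) -> bool:
    """True if the scraped HTML contains a link to page_url (trailing slash ignored)."""
    collector = _LinkCollector()
    collector.feed(scraped_html)
    target = page_url.rstrip("/")
    return any(href.rstrip("/") == target for href in collector.hrefs)


def triage(copied_chars: int, page_chars: int, has_backlink: bool) -> str:
    """Map the size of a scrape and its backlink status to a suggested action."""
    if copied_chars <= 150:
        return "leave alone"  # short excerpt or a long link anchor
    if copied_chars < page_chars / 2 and has_backlink:
        return "leave alone"  # aggregator-style excerpt with credit
    if copied_chars >= page_chars:
        return "keep notes"  # full page: watch it, report if it outranks you
    return "review manually"  # not covered by the questions above


if __name__ == "__main__":
    html = '<p>Excerpt... <a href="https://example.com/post/">source</a></p>'
    print(triage(800, 5000, links_back(html, "https://example.com/post")))
```

The character counts are whatever you measure by hand or with a diff tool; the helper only encodes the thresholds, it does not find the copied text for you.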

Final Thoughts

If you use my suggestion and you get scraped by a bad scraper, you'll get a free one-way backlink in the process. If you're optimizing your content for search engines and using keywords properly, it's going to happen a lot. Don't let it bother you until it gets beyond 300 characters or so, which is a lot of text.

If you have a complete page scraped, I wouldn't bother with it unless and until you find the other party's page ranking higher in Google or Yahoo. Just keep notes and check up on it periodically. I've seen pages both de-indexed and removed without any effort on my part at all.

The reason I put undecorated links in more than one article is to show you that I'm not afraid of being penalized for it. I'm not hiding them from anyone but the scrapers and people who tend to be click-happy. I would never link to a bad site, hide a paid link, or anything like that. That's asking for trouble and I would never recommend it to anyone.

Now that I've said my 1,000+ word spiel about bad scrapers inspired by Grizzly's article, "Duplicate Content is Google's Weak Link", tell me what you do with them and how you do it. I'm all ears (or eyes).