Defeating Bad Scrapers the Free and Easy Way

As a website owner, you should be worried when someone scrapes all or part of the content of one of your pages. If you're not familiar with it, the Copyscape service is one way to find scrapers and start the elimination process, but Copyscape charges for more than 10 searches per month. There is another procedure that you can repeat as often as you like, and it will never cost you anything.


Link to Yourself

Pick a word, any word, on your page and create a link back to that page. As I mentioned in "A Quick Lesson in Website Surreptitiousness", you simply remove any decoration for that link. Don't use the title attribute, though, because you don't want a scraper spotting it when he or she hovers over it. It's not completely hidden; the word is still there and anyone who uses a browser to override or disable CSS can see the link. Not only that, but the link will still appear in the status bar of most browsers.

In the article by Matt Cutts, "Hidden links", he states "Now there’s nothing bad about changing the style of a link to some degree, but let me show you an example of going overboard…" and the example he gave "…crosses over into deceptiveness and violates our quality guidelines…"

It's clear to me that my suggestion doesn't violate the quality guidelines, especially since I'm telling you to link back to your own specific page. The robots that index websites have nothing new to follow, obviously, since the link only points back to the page it's already on.
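Here is a minimal sketch of what such an undecorated self-link could look like when generated in Python. The function name, the inline style, the placeholder word "florbix", and the example.com URL are all my own assumptions for illustration; the only points taken from the advice above are that the link points back to its own page, has no visible decoration, and has no title attribute.

```python
from html import escape


def undecorated_self_link(word: str, page_url: str) -> str:
    """Return an <a> tag linking `word` back to `page_url` with no visible decoration."""
    # Assumed styling: inherit the surrounding text color and drop the underline.
    style = "color: inherit; text-decoration: none;"
    # Escape the URL so characters like & and quotes are safe inside the attribute.
    href = escape(page_url, quote=True)
    # Deliberately no title attribute, so hovering shows no tooltip.
    return f'<a href="{href}" style="{style}">{escape(word)}</a>'


# Hypothetical word and URL, for illustration only.
print(undecorated_self_link("florbix", "https://example.com/my-post/"))
```

The word stays in the text and the link still shows in the status bar, so nothing is hidden beyond the decoration itself.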

Use Special Words

You don't want to use common words in this way. If you use something like "and", you'll be kicking yourself later. You can even make up a word. Do a Google search for your made-up word; if there are any results, keep trying until you find a word with none. Don't use "niebu", okay? (Go ahead and hover over "niebu".)

Set up a link to your special word or words EVERY TIME you publish a page or post. The closer to the start of the content, the better. If you have trouble using nonsensical words, try using a misspelled word on purpose (wrong spelling, transposed characters, etc.). Just make sure the word doesn't yield any search results before you use it. You could even say something like "here at Untwisted Vortex's fern grotto" (with your site, of course) and make the phrase into a link.
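If you have trouble inventing nonsense words, a few lines of Python can suggest some. This is only a sketch under my own assumptions: the function name, the letter sets, and the consonant-vowel pattern are made up for illustration. It does not check search results, so you still have to search for each candidate yourself before using it.

```python
import random
from typing import List, Optional

# Assumed letter sets; alternating them gives roughly pronounceable nonsense.
CONSONANTS = "bcdfghjklmnprstvz"
VOWELS = "aeiou"


def candidate_words(count: int, syllables: int = 3, seed: Optional[int] = None) -> List[str]:
    """Return `count` distinct made-up words built from consonant-vowel pairs."""
    max_words = (len(CONSONANTS) * len(VOWELS)) ** syllables
    if count > max_words:
        raise ValueError("not enough distinct words for this syllable count")
    rng = random.Random(seed)  # a seed makes the suggestions repeatable
    words = set()
    while len(words) < count:
        word = "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(syllables))
        words.add(word)
    return sorted(words)


# Print five candidates; search for each one before you adopt it.
for word in candidate_words(5):
    print(word)
```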

Use Google Alerts

You can set up to 1,000 Google Alerts to be sent to your email address. Specify your special word or phrase and use the "comprehensive" and "as-it-happens" options. Those are the options I use and I haven't been overwhelmed with email yet. You can always change the frequency when managing your alerts.

You have to have a Google account in order to use Google Alerts. If you don't have an account, create one. You'll be glad you did. You'll need it later for access to the Google Webmaster Tools.

Destroy the Value of Duplicate Content

Once you receive a Google Alert showing you that someone has scraped your page, you need to examine the scrape.

Is it more than 150 characters? No? Leave it alone. Sometimes people link to your page with long links. Thankfully, most link anchors are only a few words, like when Sephy wrote "Where did WordPress go wrong?" and linked to one of my articles with just two words.

Is it the size of a paragraph or two, but well under half of your page? If so, does it include a link back to your page at the beginning or the end? If it includes a link, leave it alone. Good scrapers, like feed aggregators, do this all the time.

Is it a full page? Regardless of whether it links back to you, you can now report it as spam in Google's index from the Webmaster Tools dashboard. But should you?

What you should be worried about is getting penalized for duplicating content from somewhere else when that somewhere else is duplicating YOUR content. There are three ways to defeat the bad scrapers:

  1. Report it to Google. Make sure you have the information they need or Google won't do anything. If you do, they'll de-index that page or de-index the entire site. I've been very successful with this method.
  2. Rewrite your original page. This can be easy or hard depending on what you wrote about. Frank of OpTempo wrote about duplicate content filtering concepts back in May and explained how it can be made to work for you.
  3. Try to get the website that scraped the content to remove it. Again, this can be easy or hard. It all depends on who's doing the scraping. If it's a mistake by a newbie blogger, it's easy. If it's a professional scraper, then good luck.
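The three questions at the start of this section can be sketched as a small triage function. Treat it as an illustration, not a rule book. The 150-character cutoff and the "under half of your page with a link back" test come from the questions above; the function and enum names, the 0.9 ratio that stands in for "a full page", and the "review by hand" fallback for cases the questions don't cover are my own assumptions.

```python
from enum import Enum


class Verdict(Enum):
    LEAVE_ALONE = "leave alone"
    REVIEW = "review by hand"  # assumed fallback for cases the questions don't cover
    FULL_PAGE = "full page: decide whether to report, rewrite, or ask for removal"


def triage_scrape(scraped_chars: int, page_chars: int, links_back: bool,
                  full_page_ratio: float = 0.9) -> Verdict:
    """Classify a scrape by its size relative to the original page."""
    if page_chars <= 0:
        raise ValueError("page_chars must be positive")
    # Not more than 150 characters: probably just a long link anchor.
    if scraped_chars <= 150:
        return Verdict.LEAVE_ALONE
    ratio = scraped_chars / page_chars
    # Assumption: 90% or more of the page counts as a full-page scrape.
    if ratio >= full_page_ratio:
        return Verdict.FULL_PAGE
    # A paragraph or two, under half the page, with a link back: a good scraper.
    if ratio < 0.5 and links_back:
        return Verdict.LEAVE_ALONE
    return Verdict.REVIEW


# Hypothetical numbers: a 1,200-character excerpt of a 5,000-character page with a link back.
print(triage_scrape(1200, 5000, links_back=True).value)
```

A full-page verdict only tells you that the three options above are on the table; it doesn't pick one for you.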

Final Thoughts

If you use my suggestion and you get scraped by a bad scraper, you'll get a free one-way backlink in the process. If you're optimizing your content for search engines and using keywords properly, it's going to happen a lot. Don't let it bother you until it gets beyond 300 characters or so — that's a lot of text.

If you have a complete page scraped, I wouldn't bother with it unless and until you find the other party's page ranking higher in Google or Yahoo. Just keep notes and check up on it periodically. I've seen pages both de-indexed and removed without any effort on my part at all.

The reason I put undecorated links in more than one article is to show you that I'm not afraid of being penalized for it. I'm not hiding them from anyone but the scrapers and people who tend to be click-happy. I would never link to a bad site, hide a paid link, or anything like that. That's asking for trouble and I would never recommend it to anyone.

Now that I've said my 1,000+ word spiel about bad scrapers inspired by Grizzly's article, "Duplicate Content is Google's Weak Link", tell me what you do with them and how you do it. I'm all ears (or eyes).

37 Comments

  1. Stephan Miller from Clickbank Affiliate Blog says:

    If you notice at the bottom of the posts on my blog, I practically invite scrapers. I go through a service that will republish my content on other blogs. But I link from within the content to my tag-based archives and I use a related posts plugin, so I get multiple backlinks whenever one of my articles is used.

    My latest blog post: Affiliate Programs + Your Blog + PPC = Synergy

  2. Tim from Rare Antiques says:

    Those wacky scrapers. What will they get up to next?
    Our Redneck site has been scraped multiple times, and in most cases it is less than a full sentence. So if I understand you correctly, those ones don't hurt us and actually count as a good backlink?

  3. But Niebu is such a perfectly good word! Oh, wait… It's *my* word! Ha!

    Good stuff, good article, and thanks for the link.

  4. [...] Note: RT over at Untwisted Vortex has a good article discussing some of the things you can do with scrapers. See… Defeating Bad Scrapers the Free and Easy Way [...]

  5. hari says:

    One of the best ways to find out which server a site is running on is to use http://www.netcraft.com and find out.

    Reporting to the web host might just work. Of course, if the scraper is using blogspot.com the task becomes easier. Simply notify Google and they'll usually remove the content if it's a spam/scraper blog.

    My latest blog post: Boxi and Panjo – Modern Art

  6. Stevo says:

    Great idea, RT. I'll start doing it immediately. I usually notify Google and the site's host of any scraping activity. It's amazing what an email to a host will do….

    My latest blog post: bbq at the night market

  7. Craig says:

    If you do this, you should be able to handle all those nasty scrapers.
    However if you are a little paranoid another solution would be to set your feed so it only publishes a summary of your posts.
    You can do that in Settings>Reading on a WordPress blog.

    • You're making the assumption that scrapers only go after feeds. They go after much more than that.

      • Craig from Holiday Trips says:

        Well none of these steps are 100% guaranteed to get rid of scrapers. I however consider this the most extreme and effective step, as scraping a post off a web spider is much harder than doing it off a feed. I however don't do this myself because I think it's overkill.

        Something else you can do is to use relative and not absolute addressing when you add a picture to a post.

  8. Don from Making Sales Making Money says:

    Good post, RT. Honestly, I'm worn out from this fight; it's not going to stop.

    My latest blog post: Im Thinking Mark Ress Wont Use This Post

  9. Rhysr says:

    Hi RT!

    Having hidden (undecorated) links has the effect that we get a back link out of the particular act of piracy, even if we don't detect it, and for this reason it is probably important to hide your backlink in the beginning of the article so that it will appear even if someone only quotes the first par.

    My latest blog post: Now You Can Have a Top Money Making Blog

  10. Lin Burress says:

    Hmm, this is very interesting. I've experienced some scraping going on and wondered if there were an easier way of being alerted when it happened.

    Now I have to figure out what word(s) to use in my posts to be successful at defeating bad scrapers and the easier the better.

    Geez RT, now you're making me think harder than I intended to today. :mrgreen:

    My latest blog post: Computer Monitoring Software- Do You Know What Your Kids Are Doing Online?

  11. Leah Marie says:

    Nice Article. It's a big help and has very informative topics that can be useful. Thanks for the links.

  12. Justin from Electric Airsoft Rifles says:

    I have to compliment you on the cleverness of that technique. Linking back to yourself… it's so simple, yet so effective. Nobody copies me though, so I don't have to worry about it :smile:

  13. But scraping can be in the form of stealing ideas and rewording. I confess I had done that myself but I reword, add original content.

    The Scraping issue is such that when you are the scraper and you are being scraped feels completely different.

  14. Chelle from Pittsburgh Junk Removal says:

    Something I've been doing is include my link with keyword anchor text in the first paragraph. Is that good or bad? In some ways, it's good since you get a one way backlink…but if that site gets in trouble then it wouldn't be good. What are your thoughts on that?

    • Linking out to bad sites is a problem, but if a bad site links to you, it's not.

      I don't see any harm in linking the keywords back, but if you're using something like Google Alerts to track it, you may end up getting a lot of alerts every day because of how frequently the keywords are used.

  15. Carrie says:

    You give me so much to think about, it makes my head spin! Awesome tips RT. I've been scraped three times in the past four months. Actually, it's been more than just a scrape. In two cases, they ripped off my entire website. I reported each case to Google and Blogger and followed through with the suggestions provided by Copyscape, but the process took days to get through. It was a complete pain in the butt.

    This sounds very simple and effective. Thanks very much for letting us know about it.

  16. If you can manage to get a link back to your blog in the scraped content then it's actually good for you that they are scraping, right? I don't see how Google could rank their scraped content above your own original content.

  17. Lin Burress says:

    Just as I thought RT, my post about my Staycation has been scraped completely; entire post (not just a few words), but the entire post and the darn thing is listed at number 3 on Google's search results.

    My Friendfeed "shared" link appears as number 7 on the first page, and my original post is kicked back several pages and doesn't even include my post title, which should have given my post much better ranking. Check this out:

    http://www.google.com/search?hl=en&client=firefox-a&rls=org.mozilla:en-US:official&hs=8bV&q=What+I+Did+On+My+Summer+Staycation&start=0&sa=N

    I've notified Google just a few minutes ago and hopefully they'll do something about this, cuz I'm PISSED!

    When I was on Blogger I knew how to quickly report abuse, but now it's a bit different, and I'm hoping I get a quick response. Nothing makes me madder than crap like this.

    My latest blog post: What I Did On My Summer Staycation

  18. Simon from Rugby Coaching says:

    I have found Google Alerts to be very useful for this. I have also found that putting a graphic banner in the content of your articles makes it obvious if someone is scraping your content, and thus is a sort of deterrent. Cheers.

    My latest blog post: Rugby Skill Training: Identify Key Factors and Plan

  19. bleuken from Busby SEO Challenge says:

    Me, I use the feedfooter plugin for WordPress. Scrapers usually hit you through your RSS feeds. I also followed your advice about putting links in your posts, but in my case the scraper simply copied and pasted the item. I just reported him to the blog hosting admins, and his blog was suspended until he deleted the copied article.

    My latest blog post: Behind Google’s Search Interface

  20. Lin says:

    :mrgreen: I just checked my Staycation post and it's back at number 1! The offending scraper link is at number 4 but using the link and searching through the site brings 0 results. I'm happy about that, but I don't like that I never got a response from G at all.

    Now I'm contacting them for some other blogger.com bozo who scraped one of my posts also in its entirety. :evil:

  21. Dave from BBQ techiques says:

    I like the idea of putting a link back to your own site on a page. If you do get scraped, then at least there is some value for your own site: a link back to itself.

    My latest blog post: Delicious BBQ ribs

  22. Justin from How to become a police sheriff says:

    One really good trick for getting use out of scrapers: cloak your links so that they look exactly like regular text, and scrapers who copy and paste won't take them out.

    My latest blog post: Learning about becoming a sheriff

  23. Jeff Turner says:

    Hey, nice to read this. Wow, I am using Google Alerts and believe me, it helps me a lot. Thanks for the wonderful post.

  24. Blogs Right says:

    I have created a blog just for those scrapes. It's for my own use, anyway. You can add a button to your blog to help fight content theft.

    At: http://wearecopyrighted.blogspot.com/
