How To Stop Scrapers
One of my maintenance clients was having an issue with a scraper stealing his content and posting it on their site as original content without attribution. He asked me via email how to stop scrapers stealing his content.
What Is A Scraper?
A scraper is one of the dark denizens of the internet, along with trolls and spammers.
Scrapers steal other people’s content and add it to their own site, passing it off as original content and usually monetising it in some way.
They might sell it as original guest post content, or they may own the target site and wrap it in ads or use it to sell products or services.
Steps To Stopping A Scraper
The first step is to identify that you are being scraped.
One good way is to set up a Google Alert for your content. You can do this at google.com/alerts. With Google Alerts you get an email when certain phrases appear on the web. Add an alert for something in your content, such as the title of your most popular post, and you will be alerted if a scraper takes your content without asking.
Once you have a hit, record the URL of the offending site.
Look at your logs. Your hosting account will have an option to view your web server logs. Search for referrers matching the offending URLs and you will be able to find the IP addresses of the scrapers. Record these for the next step.
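If your host lets you download the raw access log, the referrer search can be scripted rather than done by eye. The following is a minimal sketch, assuming the log is in the common Apache/Nginx "combined" format; the function name ips_referred_by, the sample log lines and the domain scraper.example are all hypothetical and only there for illustration.

```python
import re
from collections import Counter

# Assumption: the server writes the "combined" log format, i.e.
# IP ident user [time] "request" status bytes "referrer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)


def ips_referred_by(lines, offending_domain):
    """Count requests per IP whose referrer mentions the offending domain."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and offending_domain in match.group("referrer"):
            hits[match.group("ip")] += 1
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines; in practice, read them from the log file
    # you downloaded from your hosting account.
    sample = [
        '203.0.113.5 - - [10/Oct/2014:13:55:36 +0000] "GET /feed/ HTTP/1.1" '
        '200 2326 "http://scraper.example/post" "Mozilla/5.0"',
        '198.51.100.7 - - [10/Oct/2014:13:56:01 +0000] "GET / HTTP/1.1" '
        '200 512 "-" "Mozilla/5.0"',
    ]
    for ip, count in ips_referred_by(sample, "scraper.example").most_common():
        print(ip, count)
```

Lines that do not match the assumed format are simply skipped, so if nothing is reported it is worth checking how your server actually formats its logs.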
Plugins To Stop Them
RSS Footer – the majority of scrapers grab your content with one of the autoblogging plugins out there, which pull your RSS feed and add it to their site automatically. I personally use RSS Footer on my site and add an image link advert back to my hire me page, so the scraper becomes my unpaid advertiser. You can also use this plugin to add a footer to your RSS posts so people know the content is from your site.
WP Ban – once you know a scraper’s IP address or referral site, you can add them to WP-Ban, a plugin that locks particular IP addresses out of your site and stops them scraping your content.
.htaccess – if you are more tech savvy you can do exactly the same thing by adding the following statements to your .htaccess file:
order allow,deny
deny from 192.168.44.201
deny from 126.96.36.199
deny from 172.16.7.92
allow from all
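If the list of scraper IP addresses grows, the deny rules above can be generated rather than typed by hand. This is only a sketch: build_deny_block is a hypothetical helper, not part of any plugin, and it assumes your server still accepts the older order/deny/allow directives (on Apache 2.4 these rely on mod_access_compat being enabled).

```python
import ipaddress


def build_deny_block(ips):
    """Return .htaccess lines that block each given IP and allow everyone else."""
    lines = ["order allow,deny"]
    for ip in ips:
        # ip_address() raises ValueError on malformed input, so a typo
        # cannot quietly end up in the .htaccess file.
        lines.append(f"deny from {ipaddress.ip_address(ip)}")
    lines.append("allow from all")
    return "\n".join(lines)


if __name__ == "__main__":
    # The addresses are the example ones from the article.
    print(build_deny_block(["192.168.44.201", "126.96.36.199", "172.16.7.92"]))
```

Paste the printed lines into your .htaccess file, and keep a backup copy of the file first, since a mistake in .htaccess can take the whole site offline.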
Getting Scraped Content Removed From The Internet
Probably the best way to get scraped content removed is to send a DMCA notice to the offender.
If that doesn’t work, approach the hosting company and ask them to remove it. Remember that you hold the copyright to all your content. Here is my hosting company’s policy: https://www.bluehost.com/copyright-claims-policy
Please note that if the offender is outside your jurisdiction, you probably won’t get a remedy.
My Personal View – Let It Go
In the words of Elsa from Disney’s Frozen, I let it go … (an apology to any parents of girls aged 0-10, you’ve heard enough of this already 🙂 )
The effort of monitoring for and chasing down people who are scraping your content is just too much. I’ve already got too many things on my plate, and policing the internet is not one I want to take on.
Adopt the mindset that it’s a compliment: people only steal good content. The massive heads at Google will be able to spot a real authority site rather than a shoddy, pasted-together one. They will know when article X was published and when article X+scrape was indexed, so I’m sure they can see through it.
This kind of email consulting / help is something I offer to all my maintenance clients, so not only are you getting backups, updates, security and monitoring, you also get a WordPress expert on your team for consults via email at a very reasonable fee.
Photo Credit: Tjook via Compfight cc