Google’s so dang nice. I could just hug them all.
Recently, they announced that we no longer have to worry about duplicate content. See, Google will sort it all out for us.
So, if you have the same article on your site at www.mysite.com/article/, www.mysite.com/article/?from=home, and www.mysite.com/article.html, they’ll be all charitable and figure out which one to use. We’re saved, Google says. Go about our business.
I immediately got 30 snippy e-mails from developers who already hate me, telling me I’ve been wasting their time making them clean up duplication issues.
My answer is ‘No’. I could elaborate on ‘No’, but it’d require cursing, so I’m going to stick with ‘No’ and explain why duplicate content still sucks.
Wasted crawl budget
I don’t care if Google can suss out every instance of duplicate content on the web. You’re still forcing them to suss it out.
If you have a 10,000-page web site, and 9,000 of those pages are duplicates, then Googlebot still has to crawl 9,000 pages it doesn’t need. There. Is. No. Way. that is a good thing. Use rel=canonical if you want; Google still has to hit each URL. You’re wasting their time. No one likes having their time wasted.
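The fix is to collapse the variants yourself, before any spider sees them. Here’s a toy sketch of that idea using only the Python standard library; the parameter name `from` and the `.html` alias are borrowed from the example above, and a real site would do this with a 301 redirect rule in its server config rather than application code.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Query parameters that exist only for tracking (hypothetical names)
TRACKING_PARAMS = {"from", "utm_source", "utm_medium"}

def canonicalize(url):
    """Collapse duplicate URL variants into one canonical form."""
    scheme, netloc, path, query, _ = urlsplit(url)
    # Drop tracking-only parameters from the query string
    kept = [(k, v) for k, v in parse_qsl(query) if k not in TRACKING_PARAMS]
    # Treat /article.html and /article/ as the same page
    if path.endswith(".html"):
        path = path[: -len(".html")] + "/"
    return urlunsplit((scheme, netloc, path, urlencode(kept), ""))
```

With this, all three URLs from the example resolve to `http://www.mysite.com/article/`, and the spider crawls one page instead of three.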
This is all about crawl efficiency. Don’t waste a search engine’s time if you don’t have to. Let a visiting spider grab what it needs and go on its way.
Duplicate content still sucks.
There is another search engine
Ever hear of Bing? It’s not so speedy or clever. But it does generate 10-15% of all web traffic. If you think that’s not worth bothering about, you’re in better shape than I am. I’ll take any smidgen of relevant traffic I can get.
Duplicate content will still wreak havoc on Bing, as well as on many vertical search engines, Facebook’s proto-search engine and everything else people use to crawl the web.
Duplicate content still sucks!
Split link authority
I come to your site and find the article to which I want to link at www.site.com/?blah=foo, and then someone else finds the same content at www.site.com/?blah=foo&dir=dem and links to it there. Congratulations! You just split that page’s link authority in half! Nice job.
Except it’s not a nice job. It’s a stupid job. And again, rel=canonical may help sort out the link chaos, but not as well as just doing it right in the first f@#)($* place.
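To make the split concrete, here’s a tiny sketch: four hypothetical inbound links to the same article, tallied first by raw URL (what a crawler sees) and then after stripping the stray `dir=` parameter. The URLs and the parameter name come from the example above; the link counts are made up.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical inbound links, all pointing at one article
links = [
    "http://www.site.com/?blah=foo",
    "http://www.site.com/?blah=foo&dir=dem",
    "http://www.site.com/?blah=foo&dir=dem",
    "http://www.site.com/?blah=foo",
]

# As crawled: the four links are split across two URLs, two apiece
per_url = Counter(links)

def canonical(url):
    """Drop the tracking-ish dir= parameter so variants merge."""
    scheme, netloc, path, query, _ = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(query) if k != "dir"]
    return urlunsplit((scheme, netloc, path, urlencode(kept), ""))

# After canonicalizing: all four links credit one URL
per_page = Counter(canonical(u) for u in links)
```

`per_url` shows two URLs with two links each; `per_page` shows one URL with all four. That’s the difference between half your authority and all of it.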
Duplicate content still sucks!!!
Wasted server resources
The first thing you do to improve server performance is set up some kind of caching. Caching stores a copy of all, or the most-accessed, pages on your site. But most caching schemes are based on page URLs. Say you have the exact same article at three different URLs. Your web server or caching server will have to store three copies of the same page.
That wastes storage, memory and resources on your server. It also means that, until all three versions of the page are cached, you’re still not delivering the performance improvement caching normally generates.
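A minimal sketch of why, assuming a cache keyed on the raw URL (which is how most page caches work); the three URLs are the ones from the earlier example, and `render_article` is a stand-in for whatever generates the page.

```python
# URL-keyed page cache: because the key is the raw URL,
# the same rendered page gets stored once per variant.
cache = {}

def render_article():
    # Stand-in for the expensive render of one article
    return "<html>...the same article, every time...</html>"

def get_page(url):
    if url not in cache:          # cache miss: render and store
        cache[url] = render_article()
    return cache[url]

for url in ("http://www.mysite.com/article/",
            "http://www.mysite.com/article/?from=home",
            "http://www.mysite.com/article.html"):
    get_page(url)
```

After those three requests the cache holds three entries with identical content: three renders, three copies in memory, one actual page.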
Duplicate content still sucks!!!!!!
Muddled analytics
Trying to track the attention a single page on your site gets? Duplicate content turns it into a shell game. Multiple versions of each page mean tracking down each version, averaging time-on-page, averaging bounce rate, and so on.
The irony is that many developers create duplication trying to make analytics easier: They’ll add something like ?from=topnav to all links in the top navigation so that these show up as separate clicks in traffic reports.
Not smart. You can track which clicks come from which areas using tools like ClickTale or CrazyEgg. And you’ve created a total mess for engagement analysis.
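Here’s what that reassembly looks like in practice: three hypothetical analytics rows for the same article (the numbers are invented), which you have to recombine by hand, weighting time-on-page by pageviews so the average isn’t skewed.

```python
# Hypothetical analytics rows: one article shows up as three "pages"
# thanks to a tracking parameter and a .html alias.
rows = [
    # (url, pageviews, avg time on page in seconds)
    ("/article/",             300, 60.0),
    ("/article/?from=topnav", 500, 45.0),
    ("/article.html",         200, 90.0),
]

total_views = sum(views for _, views, _ in rows)

# Weighted average: a plain mean of the three rows would overweight
# the variants with fewer visits.
avg_time = sum(views * t for _, views, t in rows) / total_views
```

Without duplicates, the report would just say: one page, 1,000 views, 58.5 seconds. With them, you get to do this arithmetic for every page you care about.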
Duplicate. Content. Is. The. El. Sucko.
You get my point
Hopefully by now you get the point. Duplicate content is bad for plenty of reasons. Google’s latest questionable claim is another excuse for doing it wrong. Don’t buy it. Build your site right, fix duped content and you’ll have a faster, better-ranking, easier-to-measure site.