Google's so dang nice. I could just hug them all.

Recently, they announced that we no longer have to worry about duplicate content. See, Google will sort it all out for us. So, if you have the same article on your site at www.mysite.com/article/, www.mysite.com/article/?from=home, and www.mysite.com/article.html, they'll be all charitable and figure out which one to use.

We're saved, Google says. Go about our business.

I immediately got 30 snippy e-mails from developers who already hate me, telling me I've been wasting their time making them clean up duplication issues.

No.

I could elaborate on 'No', but it'd require cursing, so I'm going to stick with 'No' and explain why duplicate content still sucks.

Wasted crawl budget

I don't care if Google can suss out every instance of duplicate content on the web. You're still forcing them to suss it out. If you have a 10,000 page web site, and 9,000 of those pages are duplicates, then Googlebot still has to crawl 9,000 pages it doesn't need. There. is. no. way. that that is a good thing. Use rel=canonical if you want; Google still has to hit each URL. You're wasting their time. No one likes having their time wasted.

This is all about crawl efficiency. Don't waste a search engine's time if you don't have to. Let a visiting spider grab what it needs and go on its way.

Duplicate content still sucks.

There is another search engine

Ever hear of Bing? It's not so speedy or clever. But it does generate 10-15% of all web traffic. If you think that's not worth bothering about, you're in better shape than I am. I'll take any smidgen of relevant traffic I can get.

Duplicate content will still wreak havoc on Bing, as well as on many vertical search engines, Facebook's proto-search engine and everything else people use to crawl the web.

Duplicate content still sucks!
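Back to the three URLs from the top of the post for a second. Here is a minimal sketch, in Python, of how a site might collapse them into one address before a crawler ever sees the extras. Everything specific in it is an assumption for illustration: the `canonical_url` helper is hypothetical, the `https://` scheme is made up, and so are the two rules that `from` is a tracking-only parameter and that `/article.html` serves the same page as `/article/`. Your own site's rules will differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these query parameters only record where a click came from
# and never change what the page shows.
TRACKING_PARAMS = {"from"}


def canonical_url(url: str) -> str:
    """Collapse known duplicate variants of a URL into a single form."""
    parts = urlsplit(url)

    # Drop tracking-only parameters and keep the rest in a stable order.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    )

    path = parts.path
    if path.endswith(".html"):
        # Assumption: /article.html is the same page as /article/.
        path = path[: -len(".html")] + "/"
    elif not path.endswith("/") and "." not in path.rsplit("/", 1)[-1]:
        # Add a trailing slash to directory-style paths, but leave
        # file-like paths such as /logo.png alone.
        path += "/"

    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), path, urlencode(query), "")
    )


variants = [
    "https://www.mysite.com/article/",
    "https://www.mysite.com/article/?from=home",
    "https://www.mysite.com/article.html",
]
for url in variants:
    print(url, "->", canonical_url(url))
```

All three variants come out as `https://www.mysite.com/article/`. A function like this is only useful if the site acts on it, for example by redirecting any request whose URL differs from its canonical form, so that crawlers and linkers only ever meet one address.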
Link love

I come to your site and find the article to which I want to link at www.site.com/?blah=foo, and then someone else finds the same content at www.site.com/?blah=foo&dir=dem and links to it there.

Congratulations! You just split your link authority in half for that page! Nice job.

Except it's not a nice job. It's a stupid job. And again, rel=canonical may help sort out the link chaos, but not as well as just doing it right in the first f@#)($* place.

Duplicate content still sucks!!!

Server performance

First thing you do to improve server performance is set up some kind of caching. Caching stores a copy of all, or the most-accessed, pages on your site.

But most caching schemes are based on page URLs. Say you have the same exact article at three different URLs. Your web server or caching server will have to store three copies of the same page. That wastes storage, memory and resources on your server. It also means that, until all three versions of the page are cached, you're still not delivering the performance improvement caching normally generates.

Duplicate content still sucks!!!!!!

Analytics mayhem

Trying to track the attention a single page on your site gets? Duplicate content turns it into a shell game. Multiple versions of each page mean tracking down each version, averaging time-on-page, averaging bounce rate, etc.

The irony is that many developers create duplication trying to make analytics easier: They'll add something like ?from=topnav to all links in the top navigation so that these show up as separate clicks in traffic reports.

Not smart. You can track which clicks come from which areas using tools like ClickTale or CrazyEgg. And you've created a total mess for engagement analysis.

Duplicate. Content. Is. The. El. Sucko.

You get my point

Hopefully by now you get the point. Duplicate content is bad for plenty of reasons. Google's latest questionable claim is another excuse for doing it wrong. Don't buy it.
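If you want to see how much work the analytics shell game really is, here is a rough Python sketch of merging per-URL stats back into per-page stats. The traffic numbers are invented for illustration, and `page_key` and `roll_up` are hypothetical helpers that assume the query string never changes the content. The catch it shows: you cannot simply average the averages, you have to weight each duplicate by its pageviews.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def page_key(url: str) -> str:
    """Hypothetical grouping rule: ignore the query string entirely."""
    parts = urlsplit(url)
    return parts.netloc.lower() + parts.path


def roll_up(rows):
    """Merge per-URL stats into per-page stats.

    rows: iterable of (url, pageviews, avg_seconds_on_page).
    Returns {page_key: (total_pageviews, weighted_avg_seconds)}.
    """
    views = defaultdict(int)
    seconds = defaultdict(float)
    for url, pageviews, avg_seconds in rows:
        key = page_key(url)
        views[key] += pageviews
        # Weight each average by its pageviews; a plain mean of the
        # averages would give low-traffic duplicates too much say.
        seconds[key] += pageviews * avg_seconds
    return {key: (views[key], seconds[key] / views[key])
            for key in views if views[key]}


# Made-up numbers: one article reported under two URLs.
rows = [
    ("https://www.mysite.com/article/", 900, 60.0),
    ("https://www.mysite.com/article/?from=topnav", 100, 20.0),
]
print(roll_up(rows))
# {'www.mysite.com/article/': (1000, 56.0)}
```

A naive average of the two time-on-page figures would say 40 seconds; the weighted figure is 56. Every report needs this kind of cleanup once the same page lives at more than one URL.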
Build your site right, fix duped content and you'll have a faster, better-ranking, easier-to-measure site.

Thoughts?

Ian Lurie
Ian Lurie is Chief Marketing Curmudgeon and President at Portent, an internet marketing company he started in 1995. Portent is a full-service internet marketing company whose services include SEO, SEM and strategic consulting. He started practicing SEO in 1997 and has been addicted ever since. Ian rants and raves, with a little teaching mixed in, on his internet marketing blog, Conversation Marketing. He recently co-published the Web Marketing for Dummies All In One Desk Reference. In it, he wrote the sections on SEO, blogging, social media and web analytics.
Comments
Just like the old arguments about whether validation helps ranking (when the argument should be does good coding help produce good websites) another ill-considered Google quote gives them all the excuse to carry on doing it wrong.
And if one more programmer tells me that "it uses a 302 redirect because that's the default" then I may just turn into Freddie and start slicing and dicing...
It's obvious for many of us, but having too many similar pages is a pain, for crawlers and people alike!
Most forum, blog, cart and CMS software is riddled with these problems. Sure, Google will "clean it up for you" - by taking a guess which URL to use. You can be quite sure that it will not be the one that you would have chosen.
Good post. Maybe this time someone will sit up and take notice, but I am not holding my breath.
Would love to read more about it...
Thanks!
Ana
Watch what they do, not what they say!