Why you should still care about duplicate content
Written by Ian Lurie
Monday, 24 January 2011 14:21

Google's so dang nice. I could just hug them all.

Recently, they announced that we no longer have to worry about duplicate content. See, Google will sort it all out for us.

So, if you have the same article on your site at www.mysite.com/article/, at www.mysite.com/article/?from=home, and at www.mysite.com/article.html, they'll be all charitable and figure out which one to use. We're saved, Google says. Go about our business.
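To make that concrete: those are three distinct URLs pointing at one identical page, and something, somewhere, has to decide which one wins. Here's a minimal sketch of that decision in Python (mine, not Google's; the rule that folds the .html alias back onto the trailing-slash URL is an assumption about this made-up site):

```python
from urllib.parse import urlparse, urlunparse

DUPLICATE_URLS = [
    "http://www.mysite.com/article/",
    "http://www.mysite.com/article/?from=home",
    "http://www.mysite.com/article.html",
]

def naive_canonical(url):
    """Drop the query string and fold the .html alias back onto the
    directory-style URL. A real site needs its own rules."""
    parts = urlparse(url)
    path = parts.path
    if path.endswith(".html"):
        path = path[:-len(".html")] + "/"
    return urlunparse((parts.scheme, parts.netloc, path, "", "", ""))

for url in DUPLICATE_URLS:
    print(url, "->", naive_canonical(url))
# All three collapse to http://www.mysite.com/article/, but only because
# we spelled out the rules. A crawler has to guess them.
```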

I immediately got 30 snippy e-mails from developers who already hate me, telling me I've been wasting their time making them clean up duplication issues.

No.


I could elaborate on 'No', but it'd require cursing, so I'm going to stick with 'No' and explain why duplicate content still sucks.

 

Wasted crawl budget

I don't care if Google can suss out every instance of duplicate content on the web. You're still forcing them to suss it out.

If you have a 10,000-page web site, and 9,000 of those pages are duplicates, then Googlebot still has to crawl 9,000 pages it doesn't need. There. is. no. way. that that is a good thing. Use rel=canonical if you want; Google still has to hit each URL. You're wasting their time. No one likes having their time wasted.

This is all about crawl efficiency. Don't waste a search engine's time if you don't have to. Let a visiting spider grab what it needs and go on its way.
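If you want to see what that looks like in practice, here's a sketch of answering every duplicate variant with a 301 to the canonical address, so the spider only ever spends crawl budget on the real page. Flask and the "from" tracking parameter are my stand-ins for illustration, not anything from Google's announcement:

```python
from urllib.parse import urlencode
from flask import Flask, redirect, request

app = Flask(__name__)

# Query parameters that never change what the page shows.
TRACKING_PARAMS = {"from"}

@app.before_request
def collapse_duplicate_urls():
    if not any(param in request.args for param in TRACKING_PARAMS):
        return None  # already the canonical form of this URL
    kept = [(k, v) for k, v in request.args.items(multi=True)
            if k not in TRACKING_PARAMS]
    query = urlencode(kept)
    target = request.path + ("?" + query if query else "")
    # 301, not 302: tell crawlers the duplicate address is gone for good.
    return redirect(target, code=301)

@app.route("/article/")
def article():
    return "the one and only copy of the article"
```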

Duplicate content still sucks.

 

There is another search engine

Ever hear of Bing? It's not so speedy or clever. But it does generate 10-15% of all web traffic. If you think that's not worth bothering about, you're in better shape than I am. I'll take any smidgen of relevant traffic I can get.

Duplicate content will still wreak havoc on Bing, as well as on many vertical search engines, Facebook's proto-search engine and everything else people use to crawl the web.

Duplicate content still sucks!

 

Link love

I come to your site and find the article to which I want to link at www.site.com/?blah=foo, and then someone else finds the same content at www.site.com/?blah=foo&dir=dem and links to it there. Congratulations! You just split your link authority in half for that page! Nice job.

Except it's not a nice job. It's a stupid job. And again, rel=canonical may help sort out the link chaos, but not as well as just doing it right in the first f@#)($* place.
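When a duplicate URL genuinely has to exist (a tracking link someone already shared, say), rel=canonical is the consolation prize: it tells engines which address should pool the link authority. A rough sketch of emitting the tag, using the two example URLs above; the helper and its arguments are hypothetical:

```python
from urllib.parse import urlsplit

def canonical_link_tag(requested_url, canonical_path):
    """Build a <link rel="canonical"> tag pointing every variant of a URL
    at the one address you actually want to rank."""
    parts = urlsplit(requested_url)
    return f'<link rel="canonical" href="{parts.scheme}://{parts.netloc}{canonical_path}">'

print(canonical_link_tag("http://www.site.com/?blah=foo&dir=dem", "/?blah=foo"))
# <link rel="canonical" href="http://www.site.com/?blah=foo">
```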

Duplicate content still sucks!!!

 

Server performance

The first thing you do to improve server performance is set up some kind of caching. Caching stores a copy of all, or of the most-accessed, pages on your site. But most caching schemes are keyed on page URLs. Say you have the exact same article at three different URLs. Your web server or caching server will have to store three copies of the same page.

That wastes storage, memory and resources on your server. It also means that, until all three versions of the page are cached, you're still not delivering the performance improvement caching normally generates.
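Here's the effect in miniature: a toy cache keyed by URL, the way most page caches are. Three URL variants mean three expensive renders and three stored copies unless the key is normalised first. The URLs and the one-line normaliser are illustrative, not a real caching layer:

```python
from urllib.parse import urlsplit

page_cache = {}

def render(key):
    print("expensive render for", key)
    return "<html>the article</html>"

def get_page(url, normalise=False):
    # Most caches key on the raw URL; normalising the key first is the fix.
    key = urlsplit(url).path if normalise else url
    if key not in page_cache:
        page_cache[key] = render(key)
    return page_cache[key]

variants = ["/article/", "/article/?from=home", "/article/?from=footer"]

for url in variants:
    get_page(url)                  # three misses, three stored copies

page_cache.clear()
for url in variants:
    get_page(url, normalise=True)  # one miss, one stored copy
```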

Duplicate content still sucks!!!!!!

 

Analytics mayhem

Trying to track the attention a single page on your site gets? Duplicate content turns it into a shell game. Multiple versions of each page mean tracking down each version, averaging time-on-page, averaging bounce rate, etc.

The irony is that many developers create duplication trying to make analytics easier: They'll add something like ?from=topnav to all links in the top navigation so that these show up as separate clicks in traffic reports.

Not smart. You can already track which clicks come from which areas using tools like ClickTale or CrazyEgg, and meanwhile you've created a total mess for engagement analysis.
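For example, here's what the ?from= habit does to a traffic report, and the cleanup it forces on whoever reads it; the log data is made up:

```python
from collections import Counter
from urllib.parse import urlsplit

pageviews = [
    "/article/?from=topnav",
    "/article/?from=footer",
    "/article/",
]

raw_report = Counter(pageviews)
# Counter({'/article/?from=topnav': 1, '/article/?from=footer': 1, '/article/': 1})
# Three "pages", each apparently getting a third of the traffic.

clean_report = Counter(urlsplit(url).path for url in pageviews)
# Counter({'/article/': 3}) - one page, which is what actually happened.
```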

Duplicate. Content. Is. The. El. Sucko.

 

You get my point

Hopefully by now you get the point. Duplicate content is bad for plenty of reasons. Google's latest questionable claim is another excuse for doing it wrong. Don't buy it. Build your site right, fix duped content and you'll have a faster, better-ranking, easier-to-measure site.

Thoughts?

Ian Lurie

Ian Lurie is Chief Marketing Curmudgeon and President at Portent, an internet marketing company he started in 1995. Portent is a full-service internet marketing company whose services include SEO, SEM and strategic consulting. He started practicing SEO in 1997 and has been addicted ever since. Ian rants and raves, with a little teaching mixed in, on his internet marketing blog, Conversation Marketing. He recently co-published the Web Marketing for Dummies All In One Desk Reference. In it, he wrote the sections on SEO, blogging, social media and web analytics.

Last Updated on Monday, 24 January 2011 14:23
 

Comments  

 
#1 Bill Marshall 2011-01-24 18:02
Well said sir. It often seems to be a constant battle against developers and their (usually) open source content management systems, which find ever-more ingenious ways of producing half a dozen addresses for each page.

Just like the old arguments about whether validation helps ranking (when the argument should be whether good coding helps produce good websites), another ill-considered Google quote gives them all the excuse to carry on doing it wrong.

And if one more programmer tells me that "it uses a 302 redirect because that's the default" then I may just turn into Freddie and start slicing and dicing...
 
 
#2 Seo manager France 2011-01-24 18:17
Definitely a great post about duplicate content!
It's obvious for many of us, but having too many similar pages is a pain, for crawlers and people alike!
 
 
#3 seo india 2011-01-25 07:30
:D Thanks. I myself have a site of 300 pages and some are similar. Helpful post for me.
 
 
#4 regional seo 2011-01-25 09:07
No doubt this post is very informative, and it shows that there are many issues you can face because of duplicate content apart from the typical "content duplication penalty". Analytics experts should also take note, because their practices often nourish duplicate pages.
 
 
#5 reverse seo services 2011-01-25 09:09
Thanks for this information, as I do tend to use those tags for analytics and I also have a site with around 40k pages.
 
 
#6 Ben 2011-01-25 22:41
Totally agree. Sculpting is just as important now with competition on the rise.
 
 
#7 Doc Sheldon 2011-01-28 04:05
Good article, Ian. I've been somewhat on the fence regarding dup. content, but I've always leaned toward the cautious side. It's just not good practice, IMO, and it certainly does nothing for the user experience, even if Google does catch it all.
 
 
#8 g1smd 2011-01-28 23:38
There's a seemingly endless list of ways to screw up - developers seem to be getting ever more inventive at how to produce sites that are "teh suck".

Most forum, blog, cart and CMS software is riddled with these problems. Sure, Google will "clean it up for you" - by taking a guess which URL to use. You can be quite sure that it will not be the one that you would have chosen.

Good post. Maybe this time someone will sit up and take notice, but I am not holding my breath.
 
 
#9 Ana Hoffman 2011-01-31 02:14
Ian - do you have any resources you could share regarding this announcement from Google?

Would love to read more about it...

Thanks!

Ana
 
 
#10 Bill Marshall 2011-01-31 12:48
Personally I gave up reading Google announcements 'cos they always lack substance and are full of generalities and idealised situations (and often a bit of FUD too). They said 10 years ago that they could easily detect hidden text, yet their SERPs have been full of sites using it blatantly ever since.
Watch what they do, not what they say!
 
 
#11 Link Building 2011-07-27 23:07
I agree that we should remove duplicates, especially when we know they exist -- it is a waste of the search engine's time. While one may think the task is tedious, it makes for a better website.
 
