Solving Duplicate Content Issues Arising From Faceted Navigation
I’m a big fan of faceted navigation on ecommerce websites, also known as layered navigation. With faceted nav, users can find exactly what they’re looking for with just a few clicks, even on websites that contain tens of thousands of products. A good implementation of faceted nav is a user experience dream come true.
Faceted nav also has SEO benefits, in that these facets serve as keyword-rich links and ‘tags’ of sorts that add semantic relevance to the products contained within each facet.
But it’s not all good news: faceted nav can also cause indexation problems, specifically duplicate content issues. Because many different facets often contain nearly identical sets of products, search engine spiders can end up in crawling loops, fetching slightly different versions of the same product lists over and over again.
One example is an ecommerce site under development that I came across recently. Built in Magento, this site uses faceted nav and contains about 1,200 products. But when I unleashed Xenu on it, it kept finding new pages until I finally aborted the crawl after more than 40,000 URLs.
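To see how roughly 1,200 products can turn into tens of thousands of crawlable URLs, it helps to count filter combinations. The sketch below uses hypothetical facets and value counts (not taken from the site above) and assumes each facet is either unset or set to one value. It counts the distinct filter states, and then the distinct URLs you get if every facet click appends a parameter in click order, so that each ordering of the same filters is a different URL.

```python
from itertools import combinations
from math import factorial, prod

# Hypothetical facets on a single category page: facet name -> number of values.
FACETS = {"colour": 8, "size": 6, "brand": 10, "price": 5}

def filter_states(facets: dict[str, int]) -> int:
    """Distinct filter states if each facet is either unset or set to one value."""
    return prod(n + 1 for n in facets.values())

def order_dependent_urls(facets: dict[str, int]) -> int:
    """Distinct URLs if each applied facet appends a parameter in click order.

    A state with k facets set can be reached in k! orders, and each order
    produces a different parameter sequence, hence a different URL.
    """
    names = list(facets)
    total = 0
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            total += prod(facets[name] for name in subset) * factorial(k)
    return total

print(filter_states(FACETS))         # 4158
print(order_dependent_urls(FACETS))  # 66766
```

Under these assumptions, a single category with four modest facets already exposes over 66,000 URLs, far more than the number of products behind them.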
There are several different ways to solve these faceted nav indexation issues:
Block facets with robots.txt
Using robots.txt to block search engines from crawling faceted navigation pages is probably the most brute-force approach to the problem. It will undoubtedly solve the duplicate content issue, as search engines will no longer crawl vast numbers of pages, but it has several side effects that make it a less than ideal solution.
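As a rough illustration of how blunt this is, the sketch below feeds a robots.txt rule to Python’s standard `urllib.robotparser` and checks which URLs a crawler may fetch. It assumes a hypothetical URL scheme in which every faceted page lives under `/catalog/filter/`. The standard-library parser only does simple prefix matching, so rules for parameter-based facet URLs (for example patterns using the `*` wildcard, which Google supports) cannot be checked this way.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block everything under the assumed facet path prefix.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalog/filter/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "https://www.example.com/catalog/shoes/",
    "https://www.example.com/catalog/filter/shoes/colour-red/",
    "https://www.example.com/catalog/filter/shoes/colour-red/size-42/",
):
    # can_fetch() applies the rules for this user agent to the URL's path.
    print(url, parser.can_fetch("Googlebot", url))
```

Only the unfiltered category page stays crawlable; every facet page, and every link on it, disappears from the crawler’s view.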
For one, it severely distorts the flow of PageRank within your site. PageRank should flow naturally through a solid site structure, but blocking faceted nav pages with robots.txt changes that structure as search engine crawlers perceive it, because large parts of your site are effectively blacked out. You also lose any semantic SEO value the faceted navigation has.
One small upside is that if your site has a low PageRank and a large number of products, you’re less likely to run out of crawl budget before all your product pages are indexed.
Verdict: don’t do this unless you really don’t have any other choice.
Nofollow your faceted nav links
You can tag your faceted navigation links with rel=nofollow, which tells search engines not to follow them and so discourages crawling and indexing of your faceted nav pages. A slightly less blunt instrument than robots.txt blocking, this solution nonetheless suffers from similar problems: it distorts the flow of PageRank within your site, as nofollowed links cause PR to evaporate.
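For reference, a nofollowed facet link is simply an anchor carrying `rel="nofollow"`. The helper below is a hypothetical template function (not part of Magento or any other CMS) that renders facet links that way, escaping the URL and label with `html.escape`.

```python
from html import escape

def facet_link(url: str, label: str, nofollow: bool = True) -> str:
    """Render a faceted-navigation link, optionally marked rel="nofollow"."""
    rel = ' rel="nofollow"' if nofollow else ""
    return f'<a href="{escape(url)}"{rel}>{escape(label)}</a>'

print(facet_link("/shoes?colour=red&size=42", "Red, size 42"))
# <a href="/shoes?colour=red&amp;size=42" rel="nofollow">Red, size 42</a>
```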
Verdict: don’t do this.
Use rel=canonical on all faceted nav pages
By adding a canonical tag to every faceted nav page that points to the most relevant or important facet (or to a single ‘view all products’ page), you can keep the duplicate faceted nav pages out of the search engines’ indices. The flow of PageRank is unaffected, and you also preserve the semantic value of your facets.
However, search engines will still crawl all the duplicate content pages, which means your crawl budget could be used up before all product pages are indexed.
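A minimal sketch of the canonical approach: each faceted page emits a `<link rel="canonical">` element in its `<head>` pointing at a chosen target. Here the target is assumed to be the same listing with only a hypothetical set of core facet parameters kept (just `colour` in this example) and all other filters dropped; which facets count as core is a per-site decision.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: the only facet whose filtered pages should be canonical targets.
CORE_FACETS = {"colour"}

def canonical_url(url: str) -> str:
    """Keep only core facet parameters, sorted by name; drop all other parameters."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in CORE_FACETS)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def canonical_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for a faceted page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}">'

print(canonical_tag("https://www.example.com/shoes?size=42&colour=red&sort=price"))
# <link rel="canonical" href="https://www.example.com/shoes?colour=red">
```

A page filtered only by non-core facets canonicalizes to the plain category URL, so all its variations consolidate onto one listing.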
Verdict: best used in conjunction with one of the other preferred solutions.
This is a very solid solution, but it has one caveat: the semantic value of the faceted nav is lost.
Verdict: good solution if your main focus for faceted nav is user experience.
Meta noindex/follow tag on faceted nav pages
To prevent search engine crawlers from indexing all your duplicate content pages, you can tell them to keep these pages out of their indices while still following the links they contain. The meta robots tag with the value noindex,follow does exactly that. Pages carrying this tag will not appear in search results, but crawlers will still find the products listed on them. The flow of PageRank is preserved, and the semantic value of the facets also stays intact.
However, as with some of the other solutions, low-PR sites may run out of crawl budget.
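The tag itself is `<meta name="robots" content="noindex,follow">` in the page’s `<head>`. Below, a hypothetical helper decides per page which robots value to emit: a listing with any active filter outside an assumed set of core facets gets noindex,follow, while the rest stay indexable.

```python
# Hypothetical: facets whose filtered pages are worth having in search results.
CORE_FACETS = {"colour", "brand"}

def robots_meta(active_facets: set[str]) -> str:
    """Return the meta robots tag for a listing page with the given filters applied."""
    if active_facets - CORE_FACETS:
        # Keep the page out of the index, but let crawlers follow its product links.
        content = "noindex,follow"
    else:
        content = "index,follow"
    return f'<meta name="robots" content="{content}">'

print(robots_meta({"brand"}))           # <meta name="robots" content="index,follow">
print(robots_meta({"brand", "price"}))  # <meta name="robots" content="noindex,follow">
```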
Verdict: a very good solution, especially when combined with canonical tags and static URLs.
Static URLs for faceted nav pages
A CMS that supports faceted navigation often uses parameters in its URLs. Every time a facet is used to filter the listed products, another parameter is appended to the URL. Because each URL is different, search engine spiders treat it as a separate webpage, even if it contains exactly the same products.
To prevent duplicate content issues arising from these parameter-driven URLs, you can configure the CMS to use static URLs for predefined facets, regardless of the order in which the facets were applied. So if a user refines a product listing first by price and then by colour, they end up on the same URL as a user who refines first by colour and then by price. This drastically reduces the number of URLs on your site, and thus prevents duplicate content issues.
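The key property is that the URL depends only on which facets are selected, not on the order in which they were clicked. A minimal sketch, assuming a hypothetical path scheme such as `/shoes/colour-red/price-50-100/` with facet segments sorted alphabetically, and contrasting it with query strings built in click order:

```python
from itertools import permutations
from urllib.parse import urlencode

def static_facet_url(category: str, selections: list[tuple[str, str]]) -> str:
    """Build a static URL whose facet segments are sorted by facet name.

    `selections` lists (facet, value) pairs in the order the user applied them;
    sorting makes that order irrelevant.
    """
    segments = [f"{facet}-{value}" for facet, value in sorted(selections)]
    return "/" + "/".join([category, *segments]) + "/"

clicks = [("price", "50-100"), ("colour", "red"), ("size", "42")]

# Parameters appended in click order: each of the 3! orders is a distinct URL.
naive = {"/shoes?" + urlencode(order) for order in permutations(clicks)}
print(len(naive))  # 6

# Static, sorted URLs: every click order collapses to a single URL.
static = {static_facet_url("shoes", list(order)) for order in permutations(clicks)}
print(static)  # {'/shoes/colour-red/price-50-100/size-42/'}
```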
Verdict: if you have faceted nav and you don’t do this, you’re an idiot.
Faceted navigation is a very potent instrument, but you need to implement it the right way. In my opinion, the best approach to preventing duplicate content and indexation issues is to use static URLs for your facets, combined with meta noindex,follow for facets that have no SEO value. Add rel=canonical link tags that point to your core facets, and you get the best of both worlds: a solid user experience and full SEO value.
There are probably some other solutions out there to faceted navigation issues. If you know of other/better approaches, leave them in the comments.