The Search Engines Do Not Like Swiss Cheese
404s happen and they can leave your site looking like Swiss cheese when what you and the search engines really want is a nice, authoritative hunk of Cheddar.
For those who don’t know what a 404 is: it is the HTTP status code a server returns when a page cannot be found. The harmless 404 happens when a website visitor mistypes a URL for a page that never existed. The harmful 404 happens when a page previously existed and has been removed from the site.
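To see the difference from the client side, here is a minimal Python sketch (standard library only) that stands up a toy local server where every page is "missing" and reads back the status code a visitor or crawler would receive. The host, port and path are illustrative, not from any real site:

```python
import http.server
import threading
import urllib.error
import urllib.request

class NotFoundHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Every page on this toy server is "missing".
        self.send_error(404, "Not Found")

    def log_message(self, *args):
        pass  # keep the demo quiet

# Bind to port 0 so the OS picks a free port.
server = http.server.HTTPServer(("127.0.0.1", 0), NotFoundHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

def status_of(url):
    """Return the HTTP status code for a URL (error codes raise HTTPError)."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

code = status_of(f"http://127.0.0.1:{port}/removed-page")
print(code)  # 404

server.shutdown()
```

The same `status_of` helper works against any URL, so it doubles as a quick spot-checker for pages you suspect are gone.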
404s as a Search Signal
Everything you add to, change on and remove from your site is a signal to search engines, and a 404 error code can say many things to crawlers when they come across it. Here are a few:
The first time a search crawler stumbles across a 404 you will often see very little negative impact. Even the search engines know that shit happens (even on the largest of sites that require many levels of approval before changes are made). So nothing really happens, and the search engines will try to recrawl the page in the future.
It will take several crawls before the search engines begin to downgrade the URL that is serving the 404. For smaller sites that don’t have many incoming links to the 404’ing page, the downgrade may appear to be slower, whereas a larger site with a large number of links to the 404’ing page could see the downgrade happen quickly, since the crawlers return to the page much more frequently.
Yeah, we took that down, purposefully.
This signal is the result of many recrawls and no change in the status of the page. The positioning downgrades will be noticeable, as the search engines do not want to continue serving a 404 page to their users.
This can actually be a good signal when you are serious about removing a page or subset of pages. A scenario that might put this in perspective is: An e-commerce site has decided to drop a product, product brand or product type from their site.
They no longer want to be associated with selling any of these items, so removing the content and allowing a 404 to be served will signal to the search engines that the site is no longer relevant for searches of that nature. If this is your goal and you want to speed the process up, consider serving a 410 ‘Gone’ response code instead.
This is a bad signal when you are removing the pages because they have been relocated. This is the case when you want to capture any in-bound links/authority that the old pages may have via 301 redirects and send the value along to the new pages. As an SEO this is probably the most common scenario you run into when changes are happening on a site, so be sure you know your 301 redirects.
Monitoring your 404s
Keeping an eye on unexpected 404s will help you keep your site in shape and send the proper signals to the search engines as they crawl.
Open Site Explorer
SEOmoz really hit it out of the park when they built Open Site Explorer. For monitoring 404s you are going to use the ‘Top Pages’ report. This report lists your site’s top pages in descending order, along with the corresponding Page Authority, the number of in-bound linking domains and, most importantly for 404 monitoring purposes, the HTTP status code of each page. This is what a 404 will look like when you pull this report from OSE:
(URLs have been removed to protect the innocent)
OSE clearly calls out when it has discovered a page that is returning a 404, and with the additional data of linking domains and page authority, you can quickly prioritize the list as to which pages need action first. With a pro membership to SEOmoz you can download up to 10,000 pages for this report.
Google Webmaster Tools
Google Webmaster Tools continues to grow from its early days as a sitemap submission tool into a full set of webmaster tools that provide data on a number of important elements of a website. Like OSE, GWT gives you a report on all of the pages that its crawler has tried to access that return a 404 code. Once in a website profile you can find this report under Diagnostics -> Crawl errors -> Not Found. Here is what that report looks like in GWT:
(URLs have been removed to protect the innocent)
Google provides you with the status code in the detail column, the most recent date the code was returned and the number of links Google has discovered pointing to the page (very helpful). You can quickly grab all the errors Google is reporting and download them to an Excel file (the link is at the bottom of the report in GWT). You can sort this list by the most frequently occurring offenders or by the most recent ones. Both are valuable views of the data.
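Once you have downloaded the export, a few lines of Python can sort the 404’ing URLs by inbound linking domains so the most valuable fixes come first. This is only a sketch: the URLs, link counts and tuple layout below are made up for illustration, not the exact export format.

```python
def prioritize_404s(rows):
    """Sort (url, linking_domains) pairs so the most-linked 404s come first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hypothetical rows pulled from a 404 export.
exported = [
    ("/old-widget", 3),
    ("/retired-brand", 41),
    ("/typo-page", 0),
]

for url, links in prioritize_404s(exported):
    print(f"{links:>3} linking domains -> {url}")
```

Pages with many linking domains are the ones bleeding the most link value, so they should be redirected or restored before the zero-link stragglers.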
Fixes with SEO Intentions
Put the Page Back
This may seem like the ‘no duh’ solution. But you may not have known that there were incoming links to the page, or even that anyone was reading it (shame on you for not having analytics), so give the people what they want and put the page back.
If you have moved the content of this page or have taken it down with a replacement in mind, then you should be redirecting the page using 301 redirects. The redirect will signal to the search engines that some of the value of the old page should be passed on to the new page. This may help you maintain the position within the SERPs for this content, allow you to maintain overall site authority and help a new page of content get crawled and positioned quickly.
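As a rough sketch of the mechanics, here is a toy Python server (standard library only) in which the old URL answers with a 301 pointing at the new one, and the client follows it automatically. The paths and port are hypothetical; on a real site you would configure this in your web server rather than write it by hand:

```python
import http.server
import threading
import urllib.request

class RedirectHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/old-page":
            # 301 tells crawlers the move is permanent, so link value
            # should be passed along to the new location.
            self.send_response(301)
            self.send_header("Location", "/new-page")
            self.end_headers()
        elif self.path == "/new-page":
            body = b"The content lives here now."
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = http.server.HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# urllib follows the 301 automatically and lands on the new page.
with urllib.request.urlopen(f"http://127.0.0.1:{port}/old-page") as resp:
    final_url = resp.geturl()
    status = resp.status

print(status, final_url)  # e.g. "200 http://127.0.0.1:.../new-page"

server.shutdown()
```

The request for /old-page ends with a 200 at /new-page, which is exactly the behavior a search crawler uses to reassign the old page’s value.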
410 Gone Server Code
As I alluded to above, there is another server code that can send an even stronger signal if you want a page removed from the search index. That is the 410 Gone Server Code. This code holds more weight with the search engines when it comes to removing URLs from the index. This will help you to cut ties quickly with content you no longer want on your site and help to cut off inbound linking signals that the search engines are evaluating as well.
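The same toy-server pattern can sketch what a crawler sees when a deliberately removed page answers 410 instead of 404; the product path here is hypothetical:

```python
import http.server
import threading
import urllib.error
import urllib.request

class GoneHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/discontinued-product":
            # 410 is a stronger "remove me from the index" signal than 404.
            self.send_error(410, "Gone")
        else:
            self.send_error(404, "Not Found")

    def log_message(self, *args):
        pass  # keep the demo quiet

server = http.server.HTTPServer(("127.0.0.1", 0), GoneHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

try:
    urllib.request.urlopen(f"http://127.0.0.1:{port}/discontinued-product")
    code = 200
except urllib.error.HTTPError as err:
    code = err.code

print(code)  # 410

server.shutdown()
```

In practice you would return the 410 from your web server or application framework for the retired URLs, not from a standalone script like this.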
Unique Landing Page
A unique landing page can be used to replace the original content page. This landing page is specifically designed to help maintain the current positions held by that URL in the SERPs but funnel incoming link value to another page on the site. Applying paid search landing page techniques, you can funnel much of the value and traffic this page has back to a select couple of locations on your site. Keep these pages as unique and valuable as possible; this will help you avoid being labeled a soft 404 by Google.
Make it Cheddar
Monitoring and repairing the 404 holes in a site is an ongoing task, and it can be a difficult one for both in-house teams and external consultants. But by setting up the proper tools you can become effective and efficient at it. And always remember to deploy tactics that will send proper, strong signals to the search engines whenever they crawl.