The following article was originally posted on Polemic Digital by Barry Adams on 1/17/2018. I was sufficiently impressed with its usefulness that I contacted Barry and asked his permission to syndicate it here – which he graciously granted. So, without further ado…
SEOs love to jump on bandwagons. Since the dawn of the industry, SEO practitioners have found hills to die on – from doorway pages to keyword density to PageRank Sculpting to Google Plus.
One of the latest hypes has been the ‘rendered DOM’: the fully rendered version of a webpage with all client-side code executed. When Google published details about their web rendering service last year, some SEOs were quick to proclaim that only fully rendered pages mattered. In fact, some high-profile SEOs went as far as saying that “view source is dead” and that the rendered DOM is the only thing an SEO needs to look at.
These people would be wrong, of course.
Such proclamations stem from a fundamental ignorance about how search engines work. Yes, the rendered DOM is what Google will eventually use to index a webpage’s content. But the indexer is only part of the search engine. There are other aspects of a search engine that are just as important, and that don’t necessarily look at a webpage’s rendered DOM.
One such element is the crawler. This is the first point of contact between a webpage and a search engine. And, guess what, the crawler doesn’t render pages. I’ve explained the difference between crawling and indexing before, so make sure to read that.
So we know the crawler only sees a page’s raw HTML. And I suspect that Google has a multilayered indexing approach that first uses a webpage’s raw HTML before it gets around to rendering the page and extracting that version’s content. In a nutshell, a webpage’s raw source code still matters. In fact, it matters a lot.
I’ve found it useful to compare a webpage’s raw HTML source code to the fully rendered version. Such a comparison enables me to evaluate the differences and look at any potential issues that might occur with crawling and indexing.
For example, there could be some links to deeper pages that are only visible once the page is completely rendered. These links would not be seen by the crawler, so we can expect a delay to the crawling and indexing of those deeper pages.
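One quick way to check for this is to extract the link targets from both versions of the page and list the ones that only appear after rendering. The sketch below is a minimal example under stated assumptions: it uses Python's built-in html.parser, looks only at `<a href>` attributes, does not resolve relative URLs, and makes no claim to model how Google's crawler actually discovers links. The function names are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href values of all <a> elements in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.add(value)


def extract_links(html: str) -> set[str]:
    """Return the set of <a href> values found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


def links_only_after_rendering(raw_html: str, rendered_html: str) -> set[str]:
    """Links present in the rendered DOM but absent from the raw HTML source."""
    return extract_links(rendered_html) - extract_links(raw_html)
```

Any link this returns is one the crawler would not find in the raw HTML, making it a candidate for the crawling delay described above.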
So let me show you how I quickly compare a webpage’s raw HTML with the fully rendered version.
Getting a webpage’s raw HTML source code is pretty easy: use your browser’s ‘view source’ feature (Ctrl+U in Chrome, or right-click and select ‘View page source’) to look at the page’s source code, then copy & paste the entire code into a new text file.
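Because ‘view source’ shows the HTML as the server sent it, before any JavaScript runs, the same raw HTML can also be fetched with a short script. This is a minimal sketch using only Python's standard library; the User-Agent string, the `raw.html` filename and the function names are placeholders of my own choosing. Be aware that a server may return different HTML to different user agents, so the result is not guaranteed to match what Googlebot receives.

```python
import urllib.request
from pathlib import Path


def fetch_raw_html(url: str, user_agent: str = "raw-html-check/0.1") -> str:
    """Return the HTML exactly as the server sends it; no JavaScript is executed."""
    # The User-Agent value is a hypothetical placeholder, not a Google crawler string.
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=30) as resp:
        # Decode with the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def save_raw_html(url: str, path: str = "raw.html") -> None:
    """Fetch a page's raw HTML and save it to a text file for later comparison."""
    Path(path).write_text(fetch_raw_html(url), encoding="utf-8")
```

For example, `save_raw_html("https://example.com/")` writes the page's source to `raw.html`, ready for the comparison step.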
Extracting the fully rendered version of a webpage’s code is a bit more work. In Chrome, you can open the browser’s DevTools with the Ctrl+Shift+I shortcut, or right-click and select ‘Inspect’.
In this view, make sure you’re on the Elements tab. There, right-click on the opening <html> tag of the code, and select Copy > Copy outerHTML.
You can then paste this into a new text file as well.
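Copying from DevTools gets tedious if you need the rendered version of many pages. Headless Chrome can print the DOM after it has executed the page's scripts, via its `--dump-dom` flag. The sketch below shells out to it from Python under two assumptions: the binary is called `google-chrome` (on your system it may be `chromium`, `chrome` or a full path), and the DOM is captured once Chrome considers the page loaded, so content injected later may be missing and the output may not match what you copy from DevTools exactly.

```python
import subprocess


def dump_dom_command(url: str, chrome: str = "google-chrome") -> list[str]:
    """Build the headless Chrome command line (binary name is an assumption)."""
    # --headless runs Chrome without a window; --dump-dom prints the
    # serialised DOM, after scripts have run, to stdout.
    return [chrome, "--headless", "--dump-dom", url]


def rendered_html(url: str, chrome: str = "google-chrome", timeout: int = 60) -> str:
    """Return the rendered DOM of a page as printed by headless Chrome."""
    result = subprocess.run(
        dump_dom_command(url, chrome),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise if Chrome exits with an error
    )
    return result.stdout
```

The output can be saved to a second text file and compared with the raw HTML in the same way as the DevTools copy.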
Compare Raw HTML to Rendered HTML
To compare the two versions of a webpage’s code, I use Diff Checker. There are other tools available, so use whichever you prefer. I like Diff Checker because it’s free and it visually highlights the differences.
Just copy the two versions into the two Diff Checker fields and click the ‘Find Difference’ button. The tool then highlights the lines that differ between the two versions.
In many cases, you’ll get loads of meaningless differences, such as removed spaces and closing slashes. To clean things up, you can do a find & replace on the text file where you saved the raw HTML, for example replacing all instances of ‘/>’ with just ‘>’. When you then run the comparison again, the output will be much cleaner.
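The find & replace and the comparison can also be scripted. The sketch below is a minimal example using Python's difflib. Its `normalise()` helper applies the same ‘/>’ to ‘>’ replacement described above and additionally ignores indentation and blank lines; treating those as noise is my own assumption. The comparison is line by line, so it works best when each tag sits on its own line.

```python
import difflib


def normalise(html: str) -> list[str]:
    """Strip differences that rarely matter before comparing."""
    # Same clean-up as the manual find & replace: '<br />' and '<br/>' become '<br>'.
    html = html.replace(" />", ">").replace("/>", ">")
    # Ignore indentation and blank lines.
    return [line.strip() for line in html.splitlines() if line.strip()]


def diff_html(raw_html: str, rendered_html: str) -> list[str]:
    """Unified diff of raw vs rendered HTML; lines starting with '+' exist only after rendering."""
    return list(
        difflib.unified_diff(
            normalise(raw_html),
            normalise(rendered_html),
            fromfile="raw.html",
            tofile="rendered.html",
            lineterm="",
        )
    )
```

Identical inputs produce an empty list. Otherwise, lines prefixed with `+` were added by the browser or client-side code, and lines prefixed with `-` were removed.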
Now you can easily spot any meaningful differences between the two versions, and evaluate if these differences could cause problems for crawling and indexing.
Sometimes a webpage’s source code will be minified, which strips out unnecessary whitespace such as spaces, tabs and line breaks to save bytes. The result is a big wall of text that can be very hard, if not impossible, to analyse.
In that case, I use unminify.com to put tabs and spaces back and make it a clearly readable piece of source code. This then helps with identifying problems when you use Diff Checker to compare the two versions.
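If you would rather not paste code into a website, a crude substitute is to put every tag on its own line, which is enough for a line-based diff. This sketch is not what unminify.com does: it adds line breaks but no indentation, and it will also split ‘><’ sequences inside `<script>`, `<style>`, `<pre>` or attribute values. The function name is hypothetical.

```python
import re


def unminify(html: str) -> str:
    """Put each tag on its own line by breaking between adjacent tags."""
    # Replace '>' + optional whitespace + '<' with '>' + newline + '<'.
    # No indentation is added; this only makes a line-by-line diff useful.
    return re.sub(r">\s*<", ">\n<", html.strip())
```

For example, `unminify("<p>a</p>  <p>b</p>")` returns the two paragraphs on separate lines.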
Comparing two neatly formatted pieces of code is much easier and allows you to quickly focus on areas of the code that are genuinely different – which indicates that either the browser or a piece of client-side code has manipulated the page in some way.
Google Fetch & Render
The importance of a webpage’s raw HTML code for SEO is implied by Google itself. Search Console’s ‘Fetch as Google’ feature offers two options for looking at a webpage: ‘Fetch’ and ‘Fetch and Render’.
These two options highlight the different ways in which Google’s systems will interact with a webpage:
- Fetch: how the crawler sees the page
- Fetch and Render: how Google’s indexer will eventually render the page
Because Google’s crawler doesn’t fully render webpages, the raw HTML source code will continue to be an important aspect of any holistic analysis of a webpage’s SEO. Failure to take the source code into account will leave you open to a whole range of rookie mistakes.
What tools do you use to compare raw HTML and rendered code? Share your own tips and tricks in the comments.