
View Source: Why It Still Matters and How to Quickly Compare It to a Rendered DOM

The following article was originally posted on Polemic Digital by Barry Adams on 1/17/2018. I was sufficiently impressed with its usefulness that I contacted Barry and asked his permission to syndicate it here – which he graciously granted. So, without further ado…

***

SEOs love to jump on bandwagons. Since the dawn of the industry, SEO practitioners have found hills to die on – from doorway pages to keyword density to PageRank Sculpting to Google Plus.

One of the latest hypes has been the ‘rendered DOM’: the fully rendered version of a webpage, with all client-side code executed. When Google published details about their web rendering service last year, some SEOs were quick to proclaim that only fully rendered pages mattered. Some high-profile SEOs even went as far as saying that “view source is dead” and that the rendered DOM is the only thing an SEO needs to look at.

These people would be wrong, of course.

Such proclamations stem from a fundamental ignorance about how search engines work. Yes, the rendered DOM is what Google will eventually use to index a webpage’s content. But the indexer is only part of the search engine. There are other aspects of a search engine that are just as important, and that don’t necessarily look at a webpage’s rendered DOM.

One such element is the crawler. This is the first point of contact between a webpage and a search engine. And, guess what, the crawler doesn’t render pages. I’ve explained the difference between crawling and indexing before, so make sure to read that.

Because JavaScript SEO is such a hot topic at the moment, plenty of smart folks are conducting tests to see exactly how putting content into JavaScript affects crawling, indexing, and ranking. So far we’ve learned that JavaScript can hinder crawling, and that indexing of JS-enabled content is often delayed.

So we know the crawler only sees a page’s raw HTML. And I suspect that Google has a multilayered indexing approach that first uses a webpage’s raw HTML before it gets around to rendering the page and extracting that version’s content. In a nutshell, a webpage’s raw source code still matters. In fact, it matters a lot.

View Source

I’ve found it useful to compare a webpage’s raw HTML source code to the fully rendered version. Such a comparison enables me to evaluate the differences and look at any potential issues that might occur with crawling and indexing.

For example, there could be some links to deeper pages that are only visible once the page is completely rendered. These links would not be seen by the crawler, so we can expect a delay to the crawling and indexing of those deeper pages.

Or we could find that a piece of JavaScript manipulates the DOM and changes the page’s content. For example, I’ve seen comment plugins insert new <h1> heading tags onto a page, causing all kinds of on-page issues.
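Both of these checks can also be scripted. The Python sketch below is a minimal illustration, not a tool from this article: it assumes you already have the raw and the rendered HTML as strings, uses the standard library's html.parser to collect every <a href> value and count the <h1> tags in each version, and reports links that only exist after rendering. The names SeoSignals, extract_signals and compare_signals are hypothetical.

```python
from html.parser import HTMLParser


class SeoSignals(HTMLParser):
    """Hypothetical parser: collects <a href> values and counts <h1> tags."""

    def __init__(self):
        super().__init__()
        self.links = set()
        self.h1_count = 0

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names; attrs is a list of (name, value) pairs.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)
        elif tag == "h1":
            self.h1_count += 1


def extract_signals(html: str) -> SeoSignals:
    parser = SeoSignals()
    parser.feed(html)
    parser.close()
    return parser


def compare_signals(raw_html: str, rendered_html: str) -> dict:
    """Hypothetical helper: compare links and <h1> counts between two versions."""
    raw = extract_signals(raw_html)
    rendered = extract_signals(rendered_html)
    return {
        # Links a non-rendering crawler would not find in the raw HTML.
        "links_only_in_rendered": sorted(rendered.links - raw.links),
        "h1_raw": raw.h1_count,
        "h1_rendered": rendered.h1_count,
    }
```

A link that appears only in links_only_in_rendered, or an h1_rendered count higher than h1_raw, is the kind of difference worth a closer look. Note that href values are compared exactly as written, so a relative and an absolute URL for the same page count as different links.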

So let me show you how I quickly compare a webpage’s raw HTML with the fully rendered version.

HTML Source

Getting a webpage’s HTML source code is pretty easy: use your browser’s ‘view source’ feature (Ctrl+U in Chrome, or right-click and select ‘View Source’) to look at the page’s source code, then copy and paste the entire code into a new text file.
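If you'd rather skip the browser, the raw HTML can also be downloaded directly, which is closer to what a non-rendering crawler receives because no JavaScript runs. This is a minimal Python sketch using only the standard library; the function names and the default User-Agent string are assumptions. Some servers return different HTML depending on the user agent, so the result may not match exactly what Googlebot gets.

```python
from pathlib import Path
from urllib.request import Request, urlopen


def fetch_raw_html(url: str, user_agent: str = "Mozilla/5.0") -> str:
    """Hypothetical helper: download HTML exactly as the server sends it."""
    # The User-Agent value is an assumption; servers may vary their response by it.
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        # Fall back to UTF-8 if the response does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def save_text(text: str, path: str) -> None:
    """Write the HTML to a text file so it can be diffed later."""
    Path(path).write_text(text, encoding="utf-8")
```

For example, save_text(fetch_raw_html("https://www.example.com/"), "raw.html") gives you the same starting file as the copy and paste step.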

Rendered Code

Extracting the fully rendered version of a webpage’s code is a bit more work. In Chrome, open the browser’s DevTools with the Ctrl+Shift+I shortcut, or right-click and select ‘Inspect’.

In this view, make sure you’re on the Elements tab. There, right-click on the opening <html> tag of the code, and select Copy > Copy outerHTML.

[Image: Copy outerHTML]

You can then paste this into a new text file as well.
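An alternative, assuming Chrome or Chromium is installed, is to let headless Chrome print the rendered DOM for you. The Python sketch below simply wraps Chrome's --headless and --dump-dom command-line flags; the binary name google-chrome is an assumption (it differs per operating system), and the function names are hypothetical. Because the DOM is dumped once the page has loaded, content that scripts inject later may be missing, so treat this as a convenience rather than a replacement for checking in DevTools.

```python
import subprocess


def dump_dom_command(url: str, chrome: str = "google-chrome") -> list:
    """Hypothetical helper: build a headless-Chrome command that prints the DOM."""
    # "google-chrome" is an assumed binary name; elsewhere it may be
    # "chromium", "chrome", or a full path to the Chrome executable.
    return [chrome, "--headless", "--dump-dom", url]


def fetch_rendered_html(url: str, chrome: str = "google-chrome") -> str:
    """Run headless Chrome and return the serialised DOM it prints to stdout."""
    result = subprocess.run(
        dump_dom_command(url, chrome),
        capture_output=True,
        text=True,
        check=True,   # raise CalledProcessError if Chrome exits with an error
        timeout=60,   # seconds; an arbitrary upper bound for slow pages
    )
    return result.stdout
```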

Compare Raw HTML to Rendered HTML

To compare the two versions of a webpage’s code, I use Diff Checker. There are other tools available, so use whichever you prefer. I like Diff Checker because it’s free and it visually highlights the differences.

Just paste the two versions into the two Diff Checker fields and click the ‘Find Difference’ button. The output will look like this:

[Image: Diff Checker output]
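Diff Checker works well, but a similar line-by-line comparison can also be sketched with Python's built-in difflib module, for example when you want to check pages from a script. The function name diff_html and the file labels are illustrative only. The result is a standard unified diff: lines starting with - appear only in the raw HTML, and lines starting with + appear only in the rendered version.

```python
import difflib


def diff_html(raw_html: str, rendered_html: str) -> list:
    """Hypothetical helper: unified diff of two HTML strings, line by line."""
    return list(
        difflib.unified_diff(
            raw_html.splitlines(),
            rendered_html.splitlines(),
            fromfile="raw.html",      # labels only; no files are read
            tofile="rendered.html",
            lineterm="",              # splitlines() already removed the newlines
        )
    )
```

To use it, read both saved text files into strings and print the returned lines joined with newlines. Identical inputs produce an empty list.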

In many cases, you’ll get loads of meaningless differences, such as removed spaces and closing slashes, because the browser re-serialises the page rather than returning the original markup. To clean things up, do a find & replace on the text file where you saved the raw HTML, for example replacing all instances of ‘/>’ with just ‘>’. When you run the comparison again, you’ll get much cleaner output:

[Image: Diff Checker output after clean-up]

Now you can easily spot any meaningful differences between the two versions, and evaluate if these differences could cause problems for crawling and indexing.
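The find & replace step can be automated as well. This is a minimal Python sketch, assuming most of the noise comes from XHTML-style self-closing slashes and trailing whitespace: it rewrites both ‘/>’ and ‘ />’ to ‘>’ (the way browsers serialise void elements such as <br> and <img>) and strips trailing spaces from every line. The function name is hypothetical, and a blanket replace like this can also touch ‘/>’ inside inline scripts or SVG markup, so review the result.

```python
import re


def normalise_raw_html(html: str) -> str:
    """Hypothetical helper: remove common serialisation noise from raw HTML."""
    # "<br />" and "<br/>" both become "<br>", matching the rendered DOM.
    html = re.sub(r"\s*/>", ">", html)
    # Drop trailing whitespace so it doesn't show up as a difference.
    return "\n".join(line.rstrip() for line in html.splitlines())
```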

Unminify

Sometimes a webpage’s source code will be minified: line breaks, indentation and other unnecessary whitespace are stripped out to save bytes. This leads to big walls of text that can be very hard, if not impossible, to analyse:

[Image: Minified HTML code]

In that case, I use unminify.com to put tabs and spaces back and make it a clearly readable piece of source code. This then helps with identifying problems when you use Diff Checker to compare the two versions.

[Image: Unminified HTML code]

Comparing two neatly formatted pieces of code is much easier and allows you to quickly focus on areas of the code that are genuinely different – which indicates that either the browser or a piece of client-side code has manipulated the page in some way.
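unminify.com does a proper job of re-indenting code. If you only need something diff-friendly, a much cruder Python sketch is to start a new line wherever one tag ends and the next begins, so that a line-based diff has individual lines to compare. It does not indent anything, and it will also split ‘><’ sequences that happen to appear inside scripts or <pre> blocks; the function name is hypothetical.

```python
import re


def one_tag_per_line(html: str) -> str:
    """Hypothetical helper: crude un-minifying by breaking lines between tags."""
    # Replace "><" (and any whitespace between the two tags) with ">\n<".
    return re.sub(r">\s*<", ">\n<", html.strip())
```

Run it on both the raw and the rendered HTML before comparing them, so that both sides are split the same way.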

Google Fetch & Render

The importance of a webpage’s raw HTML code for SEO is implied by Google itself. In Search Console’s ‘Fetch as Google’ feature, there are two options for looking at a webpage:

[Image: Fetch and Fetch and Render options in Google Search Console]

These two options highlight the different ways in which Google’s systems will interact with a webpage:

  • Fetch: how the crawler sees the page
  • Fetch and Render: how Google’s indexer will eventually render the page

Because Google’s crawler doesn’t fully render webpages, the raw HTML source code will continue to be an important aspect of any holistic analysis of a webpage’s SEO. Failure to take the source code into account will leave you open to a whole range of rookie mistakes.

What tools do you use to compare raw HTML and rendered code? Share your own tips and tricks in the comments.


