It drives me nuts to see people publishing the results of “tests” they’ve conducted that are so poorly designed they could have gotten findings just as meaningful from a random drawing.
DOE, or design of experiments, is a strict protocol, focused on eliminating variables (ideally, all but one), so that a result can be identified as causative, rather than simply correlative. Designing such a test is a complex process and often takes as much time as the actual experiment, or more. It is also the most common point of failure for a test’s validity, even among scientists.
I would say that in the last nine years, I haven’t seen more than a half dozen properly constructed tests published dealing with SEO or online marketing, judging just by the tests’ construction. Then, of course, there’s the very significant matter of contaminating the results, often inadvertently.
Over the years, I’ve worked on several teams that were formed for the express purpose of conducting definitive testing of some theory. They were often composed of experts, scientists and technicians with an intimate knowledge of the topic area, many with master’s or doctorate degrees in the field, and all with extensive experience in constructing and conducting definitive testing.
The first cut at our DOE was invariably found to have holes in it. Every. Single. Time. The design is the first important step, and often is more difficult than either conducting the test or analyzing the results.
So, to be fair, it’s not a realistic expectation that every test we see of SEO or online marketing will be flawless. If one out of fifty provides results that’ll stand up to scrutiny, we’re probably doing well.
And given the obscure nature of the data we’re usually dealing with in this business, I would say it’s literally impossible to totally limit the variables so as to get verifiable and repeatable results. I’d even go so far as to say that anyone who claims to be able to do so is either woefully ignorant of scientific testing procedures or has a relaxed moral compass.
That said, I don’t believe the latter is the case in the vast majority of instances. Still, it’s important to remember that if you’re analyzing the results of a poorly designed test, well… a really well-done analysis of crap data will nearly always yield a crap analysis.
Testing and Troubleshooting Have a Lot in Common
Even though most of us don’t hold science-related doctorates, and we can’t isolate sufficient variables to define reliable findings, there are still things we can do to at least give the correlative data we find a higher probability of accuracy. And that’s really the key in our business… probability. (I smile happily when I see the tests reported by some SEO in terms of “probabilities”… they get it!)
When testing to see what effect “X” has on rankings, authority, QS, click-through, conversion, etc., there are so many variables, it’s daunting. It’s virtually impossible for all of them to be eliminated or isolated. What we can do, however, is stop increasing the number of variables that can affect that long list of possible catalysts.
So, why is testing similar to troubleshooting? Because if you don’t change only one thing at a time, you’ll have learned nothing about the situation when you’ve finished.
Test One Aspect at a Time
For example, if you wanted to test conversion differences between a red button and a green button, that would be a fairly simple test: you would simply run both independently and track your results.
But what if you want to test several different combinations of four different aspects, each with two options? For instance:
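To put a rough probability on a simple two-button test like that, one common approach (my choice here, not the only one) is a two-proportion z-test. The sketch below is a minimal version using only the standard library; the visitor and conversion counts are made up purely for illustration, and the function name is hypothetical.

```python
from math import sqrt, erfc

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / conv_b are conversion counts, n_a / n_b are visitor counts.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that both buttons convert equally well
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# Made-up numbers: red button vs. green button, 1,000 visitors each
z, p = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value only says the difference is unlikely to be pure chance. It says nothing about whether something else changed while the test was running.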
- Large button vs. small
- Green button vs. red
- Flashing vs. steady-state
- Left sidebar vs. right sidebar
Now it gets a bit more complex, with more possible permutations, but still very manageable. But if you determine that a small, green, non-flashing button in the left sidebar converts at 12% and then try a large, green, non-flashing button in the right sidebar and see a 22% conversion rate, what caused it? Was it the larger size, or the change from the left to the right of the page?
You could just be satisfied with conversions 10 points higher, but what if that same test in a flashing version would yield 25%? Would you still be happy with 22%? My guess is that the extra 3 points would be nice to have.
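As a rough sketch of the bookkeeping involved, the snippet below enumerates every combination of the four two-way choices listed above and checks how many things differ between two variants. The factor names and helper functions are my own labels for illustration, not part of any testing tool.

```python
from itertools import product

# The four two-way choices from the list above (names are illustrative)
FACTORS = {
    "size": ["small", "large"],
    "color": ["green", "red"],
    "flashing": [False, True],
    "sidebar": ["left", "right"],
}

def all_variants(factors):
    """Every possible combination of the factor options."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

def changed_factors(a, b):
    """Names of the factors that differ between two variants."""
    return [name for name in a if a[name] != b[name]]

variants = all_variants(FACTORS)
baseline = {"size": "small", "color": "green", "flashing": False, "sidebar": "left"}
second = {"size": "large", "color": "green", "flashing": False, "sidebar": "right"}

print(len(variants))                      # 16 combinations in total
print(changed_factors(baseline, second))  # ['size', 'sidebar']

# Variants that differ from the baseline in exactly one factor
single_step = [v for v in variants if len(changed_factors(baseline, v)) == 1]
print(len(single_step))                   # 4
```

Because the second variant differs from the baseline in two factors, the jump from 12% to 22% can't be pinned on either one. Comparing the baseline only against the single-step variants keeps each result attributable to one change.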
Maybe this example will be clearer:
If your car suddenly suffers a dramatic decrease in gas mileage, and you make your gas mixture leaner, retard your timing and change to a higher octane fuel, even if it returns to normal mileage, would you have any idea which change was the one that fixed the problem? For that matter, if it got even worse, would you know which one made it worse? Nope! You’d have to go back to where you started and try all over again.
Or, you could just be happy with it, and maybe post something on your personal blog or Facebook page about it. Just don’t expect it to work the next time!
I know, this is pretty basic stuff, right? Yet I see people ignoring this simple concept all the time. They want to improve their site rankings, so they rush out and tweak half a dozen on-page items, go on a social media promotion blitz, build some links and kill a chicken under a full moon. A few days later, when their site gains several positions for their favorite keyword, they publish a blog post to share their success.
If they’d had the patience to isolate those tasks and test them independently, they might actually be able to repeat the results and have some findings that are worth sharing.
For the folks out there who look to others to provide them with guidance on what to think, before you accept anything at its stated value, you might do well to try to determine whether they really have a clue. Sadly, there are a lot of self-proclaimed experts and gurus out there who are full of nothing but hot air (trying to keep it clean here). You’d probably do better looking for the folks who’ll help you remember how to think.
And for those of you who want to share your “findings”, please… try to be realistic regarding what are truly findings or conclusions, and what are no more than probabilities. Sharing isn’t helpful when it’s misleading.
You’re certainly free to base your actions on anything you like, but try not to mislead others into believing in fact where there is only conjecture.
This is one of my pet peeves as well. Structuring the test is actually the most important part of the process and very few seem to give it the attention it deserves.
Yet the thing that caught my eye the most was the passage about making a number of changes at once. Knowing what worked and didn’t can help you optimize the right way in the future.
Too often I see teams get happy feet and make change after change after change without letting those changes be fully digested and reflected by Google.
While it’s gotten vastly faster, the element of time is still pretty darn important.
Great article Doc. 🙂
It reminds me of the various A/B testing and multivariate testing which one should do to see what works best for their website.
Forums are the absolute worst places for the publishing of ‘test results’. The worrying thing is that lots of people believe and take action based on other people’s findings even though their findings are completely worthless.