|Query classification; understanding user intent|
|Written by David Harry|
|Tuesday, 31 May 2011 13:00|
What exactly is the function of a search engine? In simplest terms it acquires, stores and returns information (from the web). Ok, simple enough. But we're talking about people here, people seeking information. How they interact with the search engine is often a huge problem. What is the user intent? A good way of starting to pick apart that puzzle is by classification of query types. And that's what we're going to be looking at today.
To understand what a user truly wants when searching for something, you'd need to ask each user what they are after. While that works for the offline mom and pop store, it isn't at all feasible for a search engine. Thus an automated approach needs to be taken. What's more limiting is that search engines have to infer intent from very few words and little in the way of (explicit) interaction.
Why would that be important to SEOs? Well, that's even easier. If we understand how people search for things, we are in a far better position to target our programs to ensure the highest, most relevant levels of traffic for our sites and our clients. Understanding how search engines attack the problem can be VERY useful in our own targeting and programming.
The basics of query classification and beyond
One of the reasons a search engine looks at classifying queries is to better understand user intent. To do that they will look at the search task process as such;
Interestingly, searchers have evolved over the years beyond mere information needs into commercial and navigational (seeking a known entity) needs as well. We can also consider that, unlike, say, a library, web searchers are looking for a wide variety of media (text, images, multimedia) and are searching from various locales (work, home, mobile). Search engines perform social networking functions. They act as dictionaries, spell checkers and thesauruses.
This is why classification has become more and more important to search engineers over the years. Understanding the intent as closely as possible is paramount. The interesting part is the ever-changing landscape of exactly what the intent is.
Some studies make the case that users tend to use broader, higher-level keywords simply to get into the right vicinity, preferring to click through at that point and search the local site for the exact information they need. These have been referred to as 'teleporting queries'.
Among SEOs at least, query types have generally come in three flavours: informational, transactional and navigational.
But these are simply broad categorizations that we shouldn't treat as hard and fast rules. Many queries fall into more than one category, and that's actually quite important for SEOs to understand. The reason we care about these is that they play a strong role in keyword research and ultimately in targeting and content programs.
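To make the point about overlapping categories concrete, here is a minimal sketch of a rule-based classifier that lets one query carry more than one label. The cue words and the `classify_query` function are my own illustrative assumptions, not anything a search engine is known to use, and the fallback to 'informational' is just a convenient default.

```python
# Hypothetical cue words per intent type; a real system would derive
# these from query logs rather than a hand-written list.
CUES = {
    "transactional": {"buy", "price", "cheap", "order", "download", "coupon"},
    "navigational": {"login", "homepage", "website", "official"},
    "informational": {"how", "what", "why", "guide", "review", "vs"},
}


def classify_query(query):
    """Return the set of every intent label whose cue words appear in the query."""
    tokens = set(query.lower().split())
    labels = {label for label, cues in CUES.items() if tokens & cues}
    # Assumption: with no cue word at all, default to informational.
    return labels or {"informational"}


for q in ["buy running shoes review", "facebook login", "tomato blight"]:
    print(q, "->", sorted(classify_query(q)))
```

A query like "buy running shoes review" comes back with both the transactional and informational labels, which is exactly the kind of overlap worth noticing during keyword research.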
The following table gives you a sense of the various classification types in this area;
A different mind set
So let's go beyond the traditional understanding of query types. We have looked at the core types so far, but another paper I came across broke things down a little differently. The reason I decided to bring this up is because we need to understand things aren't always the same.
Here's a chart from the paper which helps understand this approach;
To a search engineer, on the larger level, there are two simple aspects to a query: intent and satisfaction. Each person using a search engine has a goal, and classification helps break down these goals into bite-sized pieces.
Not to be taken too seriously is some of the data this particular group found in their research;
I say not to take it to heart because, as we all know, a single data set never gives us the entire picture. This was actually taken from AltaVista data, soooooo... take it for what it is.
Associating Goals with Queries
If we consider goals as understanding intent, then we can break associations into two areas familiar to most of the search geeks reading my ramblings over the years; implicit and explicit.
Another common element we see in query classification is building machine learning approaches based on training sets. As you'd imagine, the larger the set of query data you have, the better the associations you can make between queries and intent/satisfaction. Just because behavioural data may not be overly valuable for ranking elements doesn't mean it's off the table altogether.
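As a rough illustration of learning from a training set, the sketch below trains a tiny naive Bayes classifier on a handful of labelled queries. Naive Bayes is my choice for the example; the papers discussed here don't necessarily use it. The `TRAINING` data, `train` and `predict` are all hypothetical names made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-labelled queries; real systems train on far larger logs.
TRAINING = [
    ("buy cheap laptop", "transactional"),
    ("order pizza online", "transactional"),
    ("laptop price deals", "transactional"),
    ("how to fix laptop screen", "informational"),
    ("what is query classification", "informational"),
    ("history of pizza", "informational"),
    ("facebook login", "navigational"),
    ("youtube homepage", "navigational"),
]


def train(examples):
    """Count words per label and queries per label, and collect the vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for query, label in examples:
        label_counts[label] += 1
        word_counts[label].update(query.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(query, model):
    """Return the label with the highest log-probability, using add-one smoothing."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # prior: share of training queries
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in query.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(predict("buy pizza", model))
print(predict("facebook login", model))
```

With only eight examples the model is a toy, but it shows why more query data helps: every extra labelled query sharpens the word counts each intent is scored against.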
In the paper I cited earlier, they talk about using behavioural data to seek out telling signals the user might give;
There are actually other actions that can be tracked such as;
You get the idea. Behavioural data, on a large scale, can reveal a great deal about the goals and potential intentions of users through implicit and explicit signals. We can also see this in recommendation engine elements (Google Suggest, refinements etc.).
What Can SEOs Learn From Query Classification
To begin with, let us look at the core goal of classification: assessing user intent. It should be obvious why we, as SEOs, would also want to understand this. There are no tools out there that really give us this, which means, to some extent, we have to look at potential query spaces and establish what user goals we're trying to service. Whether you use the classic informational/transactional/navigational approach or the above 'resource' model is inconsequential. What we need to do is align targeting and content programs to best serve these needs.
When do we look at it? For the most part, understanding query classification plays into one of the first elements of an SEO program: keyword research. Our SEO programs live and die by the efficacy of the keyword research. Keyword research is focused on matching user intent with our (ranking) targets.
It should be noted that most queries are informational in nature. That includes quasi-classifications such as seeking information on a product prior to purchase. In fact, much of the research seems to show that navigational queries are often seeking information about a product, and the query is often refined to reflect this. As such, we must consider having a content program that reflects this.
Below are some tables from a recent research paper that show the percentages of each (in the traditional model) for various topic areas.
The main goal here today was to give you a sense of how search engines are dealing with this so that you can start to adapt your own keyword research and content programs accordingly. If you're interested in more detailed planning, be sure to sign up for the SEO Training Dojo as we will be putting this (and much more) into a keyword research section being posted in the next week or so.
I hope you enjoyed the ride... I know I did.
|Last Updated on Tuesday, 31 May 2011 13:13|