A journey into Google's patent on generating suggestions
Search engines are always looking to make our lives easier, or at least to make accessing the world's information more timely. In the old days they had to wait for the user to take action before they could begin to deliver potential results for a query. Not these days; gathering search results, and even offering search assist, can start with each keystroke.
You know the one, the suggestions they make as you're typing in a given query? It looks something like this;
I know more than a few SEO peeps have talked about this as a potential problem for some long tail targets, and others have pondered whether it would make a good keyword research tool. The engines can also use the same systems for query analysis, as far as which documents to return and rank. But what if there is potential for personalization of this data? Because there just might be, and that would certainly limit its overall effectiveness from an SEO perspective, at least as a research tool.
Which brings me to a patent recently assigned to the mighty Google;
Method and system for auto completion using ranked results; filed November 11, 2004; assigned February 3, 2009 – Gibbs; Kevin A. (San Francisco, CA), Kamvar; Sepandar D. (Palo Alto, CA), Haveliwala; Taher H. (Mountain View, CA), Jeh; Glen M. (San Francisco, CA)
What? A personalized search connection?
Of course, I am nothing if not obsessed with that particular topic, so connect we must. You see, of note in this one is the presence of our old pal Sep Kamvar (and the Kaltix crew), from our recent look at Personalized PageRank, and lead tech in the PS dept. There is also another patent from Sep which is referenced in this offering;
Anticipated query generation and processing in a search engine; filed June 22, 2004 and assigned Dec 22, 2005
That patent, as best put by Bill Slawski, looks at returning search results quicker and enabling personalization to make those results more relevant for the person searching. The main crux of these systems is that they intercept queries in partial form and begin processing them, setting out probabilistic matches. Why wait for the user to actually type in a full query when you could already be processing before they hit enter?
Or as Bill put it;
If the search engine captures keyboard strokes as they happen, and starts sending partial queries to the search engine based upon a prediction of what the searcher is looking for, it may speed up the process.
He then, being the wise turtle that he is, mused;
What we don't see with Google Suggest is some of the technology used to create that query list it displays in a dropdown under the query window. We also don't see the ability to personalize those predictive searches.
Well, it would seem the patent filing on that system came mere months after the original; it simply took a few passes to make it out into the wild, some 3+ years later. And it does cite the other, has related authors, and could certainly be another link in the chain (or the thought pattern at least).
The filter factor
In the simplest terms, a predefined set of common query types can be stored and then Google can start thinking about the results, even as you type in your search term. What's interesting to us is how they go about that. In some ways it can give us some insight into other ranking processes. A great deal of search engineering these days is in probabilistic learning and predictive capabilities (there'll be a test on that junk later, he he).
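To make the "stored queries, matched as you type" idea concrete, here is a minimal sketch in Python. It is not how Google does it; the query log, its counts and the `suggest` function are all made up for illustration, and a real system would use a far smarter index than a linear scan.

```python
from collections import Counter

# Hypothetical stored queries; the counts stand in for real search frequency.
QUERY_LOG = Counter({
    "dog breeds": 120,
    "dog food": 90,
    "dog breeders near me": 45,
    "document ranking": 30,
})


def suggest(partial, log=QUERY_LOG, limit=3):
    """Return stored queries that start with the partial input, most frequent first."""
    prefix = partial.strip().lower()
    if not prefix:
        return []
    matches = [(query, count) for query, count in log.items()
               if query.startswith(prefix)]
    # Sort by descending frequency, then alphabetically to break ties.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [query for query, _ in matches[:limit]]


# Each keystroke narrows the candidate list.
for typed in ("do", "dog", "dog b"):
    print(typed, "->", suggest(typed))
```

The point is simply that the engine can begin matching on "do" long before the searcher finishes typing "dog breeds".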
Triggers for the system can include;
- When the system receives a partial query
- Upon completion of the query
- Not choosing a suggested query in a given time frame.
- Number of characters received
- Pause in query input
As well as filters which could be used;
- Time of day factors
- Geographic filters (language, IP)
- Temporal and historical (query type spikes)
- User types grouped by search activity
- Categorization ("dog" and "breed" indicate an Animal > Dog category)
- Personalization information
That last one of course is of particular interest. There are layers of personalization everywhere with Google. And so for those of you thinking of using search suggest as a keyword research tool, you may want to reconsider and do some testing first.
As you can see, there are more than a few factors that seem to play into how search suggest works, and these patents are 4+ years old, so things have likely evolved since. But let's at least look at some ranking mechanisms; always enlightening, oui?
How they might calculate suggestion ranking
For starters there are the basics of query analysis; submissions with a higher frequency would be ranked higher than terms searched less often. That's sensible. They can also use personalization in the form of search and browsing history. This could also be done on a more granular level, using the data from the current search session; although the official FAQ stated that they don't base suggestions on your personal search history, and was later updated to say that they do, in some cases, log data such as IP addresses in order to monitor and improve the service.
Take that for what U will
From there they look at using layers of ranking criteria, such as first ranking via a popularity score and then re-ranking based on secondary information. The potential query suggestions that score well on the given factors are displayed first, and so on down the line.
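A rough sketch of that "rank, then re-rank" layering might look like the following. Everything here is an assumption for illustration: the function name, the 0-to-1 scores, the shortlist size and the 50/50 blend are mine, not the patent's.

```python
def rank_suggestions(candidates, popularity, secondary, shortlist=5, weight=0.5):
    """Two-pass ranking: popularity first, then a re-rank on a secondary signal.

    popularity and secondary map query -> score in [0, 1]. The secondary
    signal could be any of the filters listed earlier (geo, time of day,
    personalization); the scores used below are invented.
    """
    # Pass 1: keep only the most popular candidates.
    first_pass = sorted(candidates,
                        key=lambda q: popularity.get(q, 0.0),
                        reverse=True)[:shortlist]

    # Pass 2: re-order the shortlist by a blend of both signals.
    def blended(query):
        return ((1 - weight) * popularity.get(query, 0.0)
                + weight * secondary.get(query, 0.0))

    return sorted(first_pass, key=blended, reverse=True)


candidates = ["dog breeds", "dog food", "dog parks"]
popularity = {"dog breeds": 0.9, "dog food": 0.7, "dog parks": 0.4}
local_interest = {"dog parks": 1.0, "dog food": 0.1}  # hypothetical geo signal

print(rank_suggestions(candidates, popularity, local_interest))
```

In this toy case the least popular suggestion, "dog parks", jumps to the top once the local signal is blended in, which is exactly why the raw suggestion list is shaky ground for keyword research.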
Also of interest is that they do talk about adaptive ranking schemes.
Let's say we were ranking our predictions with a heavy lean towards the personalized data. If the user doesn't choose from the query predictions, then we would potentially take user group data as dominant next time, or maybe geographic factors. Look at the filters above and use your imagination as to the variations available.
They may even take this into account for ranking busier query spaces, inasmuch as a potential result must satisfy one or more criteria to be considered in the seed set for ranking.
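Here is one way that adaptive idea could be sketched. The `AdaptiveRanker` class, the order of the signals and the `patience` setting are all hypothetical; the patent only describes the general notion of shifting criteria when predictions go unused.

```python
class AdaptiveRanker:
    """Switch the dominant ranking signal when suggestions keep being ignored."""

    def __init__(self, signals=("personalization", "user_group", "geographic"),
                 patience=1):
        self.signals = list(signals)
        self.patience = patience  # ignored rounds tolerated before switching
        self.current = 0
        self.misses = 0

    @property
    def dominant(self):
        return self.signals[self.current]

    def record(self, picked_suggestion):
        """Call after each query: True if the user chose a suggestion."""
        if picked_suggestion:
            self.misses = 0
            return
        self.misses += 1
        if self.misses >= self.patience:
            # Move on to the next signal, wrapping around at the end.
            self.current = (self.current + 1) % len(self.signals)
            self.misses = 0

    def rank(self, scores):
        """scores: {query: {signal_name: score}}; order by the dominant signal."""
        return sorted(scores,
                      key=lambda q: scores[q].get(self.dominant, 0.0),
                      reverse=True)


ranker = AdaptiveRanker()
scores = {
    "dog breeds": {"personalization": 0.9, "user_group": 0.2},
    "dog food": {"personalization": 0.3, "user_group": 0.8},
}
print(ranker.dominant, ranker.rank(scores))
ranker.record(picked_suggestion=False)  # the user ignored the predictions
print(ranker.dominant, ranker.rank(scores))
```

Same candidates, same scores, different order the second time around, simply because the searcher ignored the first set.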
Order from the chaos
I know, so friggen what? Well, it’s an interesting glimpse, once again, at the ways search engineers think. When your job is to rank documents in search engines, it’s good to have some insight. At the end of the day, it seems to me that we can take a few things away from it all;
- Keyword research – As we noted early on, I'd be wary of using search suggest for more than anecdotal data.
- Themes and concepts – If you have important concepts, geographic attributes or categorizations related to your query space, make them clear. Ranking and recommendation concepts relating to filters and personalization are best fed by being rock solid on your theme development and targeting.
- Trend setter – Being fast on current news and using generic page titles that satisfy the query space is potentially much like the QDF. A lack of links means temporal signals would be important, and having more generic targeting early on seems sensible.
- Rank and stick – Remember that the suggestions are part of a process already ranking known queries; you can't win if you don't play. Having the money terms would be important to get the extra bump. Being sticky, having strong user engagement, will help with the personalization.
And there we have it, a ride inside the world of Google search suggest. As always, we must take patents with a grain of salt. This was written in 2004 and was only assigned recently. You should be thinking more about the mindset; it seems we keep running into Sep a lot these days (people are starting to whisper).
The next time you see the drop down with search suggest, think of me, won't U?
Other facts; a few notable parts that I didn't get to
Connection devices – mobile searches tend to be different as far as the length of queries, and thus they would potentially establish which type of connection device you were using to determine the search suggestions.
Storage trade off – they discuss how there is a trade off between the amount of data that can be stored and the accuracy. Obviously the less data you retain, the weaker the predictions. This is interesting because all too often peeps forget the cost of processing and storage.
Anti-spoofing aka Spamming – another interesting tidbit, if not for the name alone, is the process for detecting artificially generated queries. Sadly, this was only defined as tagging multiple queries from the same user or client computer. It's sort of lightweight there IMHO. But I like the name.
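For the curious, one reading of that lightweight anti-spoofing step is to let each client count only once per query when tallying popularity. This is my own guess at how the tagging might be used, not something the patent spells out; the `count_queries` function and the log format are invented for the sketch.

```python
from collections import Counter


def count_queries(log):
    """Tally query popularity, counting each client at most once per query.

    log: iterable of (client_id, query) pairs. Repeat submissions of the
    same query from the same client are treated as one vote.
    """
    seen = set()
    counts = Counter()
    for client_id, query in log:
        normalized = query.strip().lower()
        key = (client_id, normalized)
        if key in seen:
            continue  # repeat from the same client, ignore it
        seen.add(key)
        counts[normalized] += 1
    return counts


log = [
    ("client-a", "buy widgets"),
    ("client-a", "buy widgets"),
    ("client-a", "Buy Widgets "),
    ("client-b", "buy widgets"),
    ("client-b", "dog breeds"),
]
print(count_queries(log))
```

One client hammering the same query three times gets no more say than a client who searched once, which would blunt the crudest attempts at gaming the suggestions.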