Information Retrieval, Or: How I Stopped Worrying And Learned To Love The Data
Written by Steve Gerencser
Monday, 05 March 2012 10:06

Lately, people have become more and more aware of just how much information is collected about them, and leading the pack is Google. Ad retargeting, personalized web search, instant pairing translations, and universal search are all ways that Information Retrieval (IR) is being used and studied to improve Google's products. But Google certainly isn't the first company to do this, simply the most honest about it.

IR is used every day to analyze, interpret and suggest products and actions. From grocery stores to search engines, it is all around us, often in ways you would never expect.

IR: It's Older Than You Think

Information Retrieval was first described by Vannevar Bush in a 1939 essay titled Mechanization and the Record. He theorized that a machine could achieve a higher level of knowledge organization by combining lower-level technologies. He described a piece of furniture he dubbed the memex, in which multiple screen viewers and microfilm reels could be searched quickly and easily to recover the information the user was looking for.


In July 1945, an expanded version of Mechanization and the Record, titled As We May Think, was printed in The Atlantic. The ideas presented were simple compared to current technology, but that one paper held the seed of hypertext markup by allowing users to link pages of information, creating a map of the user's thought process. As We May Think can legitimately be called the true birth of the internet, which arrived nearly 45 years later, regardless of Al Gore's comments.

Google, Learn All That Is Learnable

In 1998, two Stanford University graduate students published The Anatomy of a Large-Scale Hypertextual Web Search Engine. With this paper, and their creation known as Google, Sergey Brin and Larry Page stepped to the forefront of information retrieval. Google was originally created to provide a better way to gather, organize and make available the millions, eventually billions, of documents on the World Wide Web.


The core of Google's success was its use of human signals to help determine the relevance and accuracy of search results. Links became the key signal of whether a site was worthy, because a link required someone other than the site owner to say "this is a good page". As Google's index grew, it became more and more capable of learning what users wanted nearly as quickly as they realized they wanted it.
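
To make the idea of links as votes concrete, here is a minimal Python sketch of PageRank, the link-analysis method described in the 1998 Brin and Page paper. It is a simplified power iteration over a tiny, made-up link graph; the page names, the damping factor of 0.85 and the fixed iteration count are illustrative assumptions, and Google's production ranking uses far more signals than this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform guess

    for _ in range(iterations):
        # Every page gets a small baseline ("random jump") share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", nobody links to "d".
web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
scores = pagerank(web)
print(sorted(scores, key=scores.get, reverse=True))  # ['c', 'a', 'b', 'd']
```

Page c, which collects the most links, ends up with the highest score, while d, which nobody links to, keeps only the baseline share of (1 - 0.85) / 4.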

The second key to Google's success was its voracious appetite for data. The index infrastructure was built from the beginning to scale as quickly as possible, which let the search engine gather as much data as possible, as fast as possible, and left other search engines scrambling to keep up. Almost magically, as Google gathered more data, its results got better.

Artificial intelligence requires data. As the amount of data increases, it becomes easier to see patterns in actions and logic, and that allows a machine to make better decisions. To say that Google's algorithms are intelligent may be a stretch, but they are closer to true machine intelligence than nearly anything else available to the public. What can be said is that the sheer volume of data on Google's servers made it the smartest search engine on the internet. It knew nearly everything.

Would You Like One Of Our Frequent Shopper Cards Ma'am?

Long before Google was even a dream, grocery stores were at the forefront of retail IR. Frequent shopper cards, often called discount cards, are used to tie together all of the purchases made by a single shopper. Over and over, stores used the data they collected to refine their sales process and to target customers with ever more accurate offers.


The information gathered isn't tied only to a frequent shopper card. Don't use one? If you pay with a credit card, your purchases are tagged to your shopper ID as well. Use a credit card one time and pay cash with your frequent shopper card the next? Again, your data is tied together.
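
Tying purchases together like this is essentially record linkage. A minimal sketch, assuming each transaction carries an optional loyalty-card number and an optional credit-card number (the field names `loyalty_card` and `credit_card` are hypothetical), is to treat every identifier as a node and merge any transactions that share one, here with a small union-find structure:

```python
from collections import defaultdict


def link_shoppers(transactions):
    """Group transaction indexes into shopper profiles.

    Transactions that share any identifier (loyalty card or credit card)
    end up in the same profile, so a card-only purchase and a cash-only
    purchase are joined whenever some other purchase used both.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for i, tx in enumerate(transactions):
        node = ("tx", i)
        find(node)  # register the transaction even if it has no identifiers
        for key in ("loyalty_card", "credit_card"):
            value = tx.get(key)
            if value is not None:
                union(node, (key, value))

    profiles = defaultdict(list)
    for i in range(len(transactions)):
        profiles[find(("tx", i))].append(i)
    return list(profiles.values())


# Hypothetical purchase log.
purchases = [
    {"loyalty_card": "L-1", "credit_card": "C-9", "items": ["milk"]},
    {"loyalty_card": "L-1", "items": ["bread"]},     # paid cash, used the card
    {"credit_card": "C-9", "items": ["diapers"]},    # forgot the card
    {"credit_card": "C-5", "items": ["coffee"]},     # a different shopper
]
print(link_shoppers(purchases))  # [[0, 1, 2], [3]]
```

The first purchase used both identifiers, so it bridges the cash purchase and the card-less credit purchase into one profile.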

A recent New York Times article about Target clearly illustrates just how much data is being mined and used to predict shopping habits. Andrew Pole works as a statistician at retail giant Target. His assignment was to sift through the vast amount of data Target had acquired over the years and find a way to predict when shoppers were about to become new parents. Shoppers are creatures of habit, and it takes a life-altering event to make them change those habits; few things are more life-altering than having a child.
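
Target's actual model isn't described here, so the following is only a hypothetical sketch of the general idea: score each shopper by which signal products appear in their recent purchases, then squash the total into a 0 to 1 range with a logistic function. The product names, weights and bias are invented for illustration; a real model would learn them from historical data.

```python
import math

# Hypothetical signal products and weights, invented for illustration only.
WEIGHTS = {
    "unscented lotion": 1.2,
    "calcium supplement": 0.9,
    "cotton balls (large bag)": 0.6,
    "zinc supplement": 0.8,
}
BIAS = -3.0  # hypothetical baseline: most shoppers are not expecting


def life_event_score(basket):
    """Return a 0..1 score from a shopper's recent purchases (logistic model).

    Each distinct signal product adds its weight; unknown items add nothing.
    """
    z = BIAS + sum(WEIGHTS.get(item, 0.0) for item in set(basket))
    return 1.0 / (1.0 + math.exp(-z))


print(round(life_event_score(["bread", "milk"]), 3))  # 0.047
print(round(life_event_score(
    ["unscented lotion", "calcium supplement", "zinc supplement"]), 3))  # 0.475
```

A mailer campaign could then target shoppers whose score crosses some chosen threshold; where to set that threshold is a business decision, not something the model answers.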

An incident described in the article demonstrates just how accurate information retrieval can be when paired with other types of research, such as studies of habit formation. A 15-year-old girl received a mailer designed for potentially pregnant women, full of advertisements for maternity clothing and nursery furniture and pictures of smiling babies. Her irate father accused Target of trying to encourage his daughter to become pregnant, only to admit a few days later that he had just found out she was pregnant. Target knew before he did.

If You Don't Want Us to Know About It, Don't Do It

In 2009, Google's then-CEO Eric Schmidt admitted that Google's real goal was to know everything about everyone when he said, "If you have something that you don't want anyone to know, maybe you shouldn't be doing it in the first place." This was really the first time a major corporation openly admitted that it would gather any and all data it could on everyone it came into contact with.

In the past, companies such as Target treated the fact that they collected and stored all of their customer data as a closely guarded secret. They understood that if someone suddenly received a stack of pregnancy-related advertising before telling anyone they were pregnant, that person might justifiably become upset and never shop there again.

In many cases, Google doesn't seem to care about public perception. The lofty goal of gathering all of the world's data outweighs any negative reaction from people concerned about privacy. This attitude goes even further, challenging many laws around the world in an effort to do what the company's culture feels is right.

This came into sharp focus again on March 1st, when Google's unified privacy policy took effect across its entire collection of products. Previously, a user of one Google product could opt not to share their data across Google's other platforms. Want to use Google Analytics but not share your site's data with the AdWords group? You could opt out. This is no longer the case. If you choose to use one Google product, all Google products will have access to your habits and data.

Has Anything Really Changed?

We now have privacy rights groups, government agencies and private users up in arms about how their information is gathered and used. But has anything really changed?


Companies have been collecting information about us for decades, but they have always been very careful to keep that fact secret. When people know they are being watched, they behave differently, unless they happen to be reality TV stars living with cameras around them 24/7. The only difference now is that the curtain has been pulled back, and people have become far more aware of just how important their personal data and habits are to companies. We know that we are being watched and cataloged, and thanks to Google, companies are beginning to admit that they need this information to provide us with better services.

Learn To Love the Data

To think that we can somehow turn back the clock and put the IR genie back in the bottle is fanciful at best. Instead, we should be more aware of how much value our data has and leverage that knowledge to get better services from the companies we choose to shop and work with. Google Analytics and the many keyword research tools available are fantastic resources when you use them to their full extent.

Some companies already understand that the information playing field has tilted again, and they are quickly becoming the new game changers. Wolfram|Alpha is at the leading edge of this change. Wolfram|Alpha Pro lets you analyze your own data, in nearly any format, using their hardware and software. IR analysis is no longer the private domain of corporations with buildings full of computers; anyone can do their own research using any data they can gather.


The world has changed again, and rather than demanding that companies such as Google stop gathering our data, we should be demanding better access to the data they have harvested from us. We should demand that they level the playing field: instead of acting as closed repositories that dole out only the data they deem necessary, they should give us full, unrestricted access to that data. It is ours, after all.

Steve Gerencser

Steve is the founder of Steam Driven Media, an internet marketing and website development company. He built his first commercial website in 1995 and never looked back. Over the past 17 years he has been an internet marketer, SEO, and full-service web developer. Having taken time off every few years to explore other opportunities, Steve always returns to internet marketing and helping clients find success online through competitive online marketing that focuses not just on search engines but also on engaging potential customers wherever they may be found.

Last Updated on Monday, 05 March 2012 10:16


#1 Barry Adams 2012-03-06 05:46
Great post Steve, even though your selection of images betrays your age a wee bit. ;)
#2 Steve Gerencser 2012-03-06 09:34
Thanks Barry. And it may show my age, but I think Dr. Strangelove should be required viewing for anyone who wants to be an adult.

All content and images copyright Search News Central 2014
SNC is a Verve Developments production, the Forensic SEO Specialists- where Gypsies roam.