Recently I put together a panel of folks with some experience of, and perspective on, the Google Panda update, in hopes of learning more. It was a private session, and we learned a lot along the way, including the fact that Google certainly doesn’t seem to like obtrusive ads very much. In some cases people had seen some recovery after removing or downsizing the ads placed at the top of the content area of the page.
It seems that ads in side panel or footer locations weren’t nearly as problematic as the big-ass ones at the top of the actual content area of the page. Interestingly, this is one aspect of Panda I haven’t heard a lot about in the public sphere.
And so one has to ask: what’s the problem here? Well, according to a patent that was awarded this week, you could be quite annoying.
Detecting and rejecting annoying documents
Filed: May 20, 2011
Awarded: September 8, 2011
Ok, a few things right away:
- It is an update of an earlier patent from 2005.
- We have no idea if it is actually Panda related.
The abstract reads:
"A system and method for evaluating documents for approval or rejection and/or rating. The method comprises comparing the document to one or more criteria determining whether the document contains an element that is substantially identical to one or more of a visual element, an audio element or a textual element that is determined to be displeasing."
Breaking down the parameters
One of the interesting elements is that they most certainly talk about using a seed set of documents and then running new documents they discover through the algorithm. This was something we talked about when trying to figure out the Asian menace. While not unique, it did catch my eye upon first reading.
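To make the seed set idea a bit more concrete, here is a minimal sketch of how new documents might be checked against elements that have already been judged displeasing. To be clear, this is my own illustration and not the patent's method: the seed phrases, the 0.9 similarity threshold, and the function names are all assumptions, and it only covers textual elements.

```python
from difflib import SequenceMatcher
from typing import Iterable

# Hypothetical seed set: textual elements already judged displeasing.
SEED_DISPLEASING_TEXT = ["click here now!!!", "you have won a prize"]


def is_substantially_identical(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two strings as 'substantially identical' above an assumed threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def evaluate_document(text_elements: Iterable[str]) -> str:
    """Reject a document if any of its text elements matches the seed set."""
    for element in text_elements:
        for known in SEED_DISPLEASING_TEXT:
            if is_substantially_identical(element, known):
                return "reject"
    return "approve"
```

Calling `evaluate_document` with the text pulled from a newly discovered document returns either "reject" or "approve". A real system would presumably rate rather than simply reject, and would handle visual and audio elements as well.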
While they maintain that advertising is a positive thing on the web (as one might imagine), they also state that:
“Internet advertisements may contain characteristics that are often found annoying or otherwise displeasing to persons who view the ads. For instance, ads may contain offensive language or annoying actions such as flashing or strobing or be of poor image quality.”
Some of the ad elements they discuss include:
- Flash
- Animated GIFs
- Offensive language
- Text
And the analysis options include:
- text analysis
- OCR analysis
- Voice analysis
While much of this seems to be about accepting ads in their network, they do mention various other uses including web search.
They also discuss various metrics that they might look at, including:
- document information
- document performance information
- document characteristics rating information
- sensitivity rating information
- suitability standard information
- trust score information
- provider information
- link information
- and other information
Elements used in identifying document types include:
- subject matter
- characteristics rating
- aggregate characteristics rating
- sensitivity score
- characteristics type
- language
- geographic origin (e.g., country or city of origin)
- geographic area of target audience
- document source
- owner of content
- creator of content
- target demographic
- actions (such as image flashing)
- image movement
- hardware usable by the document (such as a mouse, game controllers, camera, or microphone)
- whether user interaction is provided by the document (which may indicate a game)
- whether the document’s programming involves random number generation
Does trust play into it?
Along the way they even discuss the nature of both the site itself and the one being linked to. In a sense we can consider this a form of TrustRank for advertising, which I also found interesting. Also notable is the use of OCR (largely developed for Google Book Search) to try to establish what the text in an image ad may be.
“The link information may comprise the link quality rating (e.g., whether the link works or has any problem) as well as the content (e.g., content ratings) of the link and any linked documents (e.g., linked websites). The information may be obtained in any manner of rating documents as described herein. Any link-associated information may be stored in the link database or the characteristics database.”
AND
“(…) link-related information may be passed to the document rating module, e.g., so that the characteristics of any linked documents (or the link itself) may be factored into a document’s rating. For instance, an ad may receive a rating of inappropriateness if it links to a site relating to sex, drugs or alcohol or if it links to a document that flashes, contains streaming audio or video, contains infinitely looping animation, involves game playing, etc.”
As you might imagine, flashing/strobing ads and auto-play audio/videos are certainly in the basket of being annoying. Go figure.
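The way linked documents feed into an ad's rating can be roughly sketched as below. Again, this is only an illustration under my own assumptions: the `Document` class, the trait and topic labels, and the simple yes/no result are all hypothetical, where the patent talks about ratings rather than a single flag.

```python
from dataclasses import dataclass, field

# Labels loosely based on the examples in the quoted passage (names are my own).
ANNOYING_TRAITS = {"flashing", "streaming_audio", "streaming_video",
                   "looping_animation", "game_playing"}
SENSITIVE_TOPICS = {"sex", "drugs", "alcohol"}


@dataclass
class Document:
    traits: set = field(default_factory=set)   # e.g. {"flashing"}
    topics: set = field(default_factory=set)   # e.g. {"alcohol"}
    links: list = field(default_factory=list)  # documents this one links to


def has_flagged_characteristics(doc: Document) -> bool:
    """True if the document itself carries an annoying trait or sensitive topic."""
    return bool(doc.traits & ANNOYING_TRAITS or doc.topics & SENSITIVE_TOPICS)


def is_inappropriate(doc: Document) -> bool:
    """Factor in the document's own characteristics and those of anything it links to."""
    return has_flagged_characteristics(doc) or any(
        has_flagged_characteristics(linked) for linked in doc.links
    )
```

So an otherwise clean ad that points at a landing page with flashing content would still be flagged, which is the behaviour the second quote describes.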
Part of the Panda puzzle?
At the end of the day one has to ask this. For starters, this patent is most certainly not directly targeting the elements we have seen to be issues with Panda. It seems to be more about ads people may be using as part of the Google advertising network. But what makes it interesting is that it actually seeks to establish the value of ads to the end user. It gives us some more insight.
We are fairly confident that ads have played a part in Panda. The fact that Google has an interest in (and a system for) establishing the usability of ads speaks volumes. As always with patents, we’re looking for insights, not a smoking gun.
So take it for what you will.



