Do you remember the whole ‘Borker Affair’? You know, that fella that went on the NYT bragging about how his crappy customer service and negative reviews only got him MORE traffic from Google? Now you’re with me, right? As a refresher, a visual review;
At the time, when Google defended themselves, there was a link (in that post) to a paper they’d done relating to sentiment analysis. Well, being the geek that I am it seemed an interesting idea to go back and take a look through it and some other patents/papers to get a feel for what Google might be doing and how this episode may have played out.
Along the way we can also start to understand reviews as a search metric and how they might be used. They’re on the rise in local and shopping, so it’s high time we did.
Before we get into it, there are a few points from the actual blog post worth noting;
- They talk about sentiment analysis beyond merely looking at review sites. They are also looking at the sentiment of a news story/post as well.
- They talk about how trusted domains can give a boost, nothing new, but more TrustRank inclinations
- There is a mention of the fact that negative sentiment can affect search quality (think politics)
- They suggest putting reviews alongside listings, so consumers can decide (instead of a dampening factor)
- And of course, they eventually throw up a white flag and state they “will continue trying”.
Ok? So as always we’re not taking this paper at face value, we’re simply looking for some insights.
Large Scale Sentiment Analysis for News and Blogs
The paper itself deals with sentiment (and entity associations) for News venues and Blogs. That means we’re not talking about merchant reviews here. Just in case you were wondering. They are also noting temporal effects and how time can change sentiments around a given entity.
Some of the examples they used were George Bush and Enron. Anyone familiar with the respective histories can understand why they would use those (lol).
Other examples they tested include;
Sports Teams; to test the system they looked at sentiment for popular sports teams (regionally) after a loss. They noted a +1 day lag to negative sentiment which correlates with news stories that would come out the day after a loss.
Stock Index V World Sentiment Index; Using text sentiment analysis they can measure the ‘happiness of the world’. They reasoned that world sentiment often follows the state of the economy and by extension, stock markets. Comparing their ‘World Sentiment Index’ to the Dow Jones gave a correlation coefficient of +0.41 (with the standard +1 day lag).
Seasons V World Sentiment Index; the last one was a fairly obvious one (to one living in rural Canada). They took the World Sentiment Index and compared it to seasons of the year. As you may have guessed, in the ‘industrial world’ there was far less sentiment in the summer months.
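For the geeks in the crowd, here is a minimal sketch of what a ‘correlation with a +1 day lag’ means in practice. It is not the paper’s code; the function names and the daily numbers are made up for illustration. The idea is simply to line up day t of one series with day t+1 of the other before computing a plain Pearson correlation.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def lagged_correlation(events, sentiment, lag_days=1):
    """Correlate events on day t with sentiment on day t + lag_days."""
    if lag_days == 0:
        return pearson(events, sentiment)
    return pearson(events[:-lag_days], sentiment[lag_days:])

# Hypothetical daily values: sentiment roughly echoes the index a day later.
index_change = [0.5, -1.0, 0.3, 1.2, -0.7, 0.1, 0.9]
sentiment = [0.0, 0.4, -0.9, 0.2, 1.0, -0.6, 0.2]

same_day = lagged_correlation(index_change, sentiment, lag_days=0)
next_day = lagged_correlation(index_change, sentiment, lag_days=1)
print(same_day, next_day)
```

With toy data like this the next-day correlation comes out far stronger than the same-day one, which is the pattern the sports and stock examples describe.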
Dealing With Reviews
So let’s go beyond sentiment to the other related issue, reviews. Over the last while we’ve seen reviews becoming more and more common with local and shopping universal results in the SERPs. They certainly have the power to draw people in or chase them away from your listing in the results, so it’s worth looking at.
But, the question remains; how does Google decide what to show? Let us first consider what we took away from the above document, that Google IS interested in sentiment, but how should they deal with it? Sometimes it might be beneficial to the end user to actually show negative reviews as well as positive ones.
And of course the obvious, how will they deal with those looking to manipulate the system? So first I decided to look at a paper that deals with both;
Monitoring algorithms for negative feedback systems
In this paper they discuss the goal of the effort as looking at,
“(…) how to monitor the quality of negative feedback, that is, detect negative feedback which is incorrect, or perhaps even malicious.”
One of the problems with any system such as this is the chance of a false positive. There will always be a trade-off between flagging true manipulations and legitimate (negative) reviews. Once upon a time they would actually look at this explicit data (reviews) to assist in separating the wheat from the chaff on the web. Oddly, it seems some people decided to use them for their own benefit (looking at YOU ya dirty rotten review spammer!! lol).
Where can this happen?
“Flags may be simply incorrect due to an error on the user’s part (mistaken about an answer to a query), or maliciousness (flag and remove a good review for a competitor), or even cluelessness; in some cases, there may be revenge flags.”
I really loved the ‘clueless’ one; it just seems funny when I see colloquialisms in papers. Beyond that, there are even processing considerations involved, as doing analysis system wide can be problematic as far as load is concerned.
They also talked about this particular system only looking at the users, not the content. This is akin to more familiar calculations such as PageRank, which looks at links, not the content itself. One obvious advantage of this is being able to analyze various content types (text, video, images etc.). Also, like PageRank, it’s more of a random approach: it tests random flags (bad reviews), not everything it comes across (via a randomized algorithm called Adaptive Probabilistic Testing).
Interestingly, this particular approach does treat suspect locales with a closer eye than ones that aren’t. In simpler terms, sites KNOWN to be more spammy as far as reviews are concerned, would get more attention than ones known to not be infested with malicious usage.
Switch and bait? Not so fast. As discussed before this is a user level system. What if the user starts out being a good citizen (squeaky clean sock puppet) but then, once the algo has assigned him a ‘good boy badge’ starts to put the malicious plan into place? It’s been considered. Your entity will be revisited. I do see some ways around this, but this article isn’t about me teaching spammers to be better. Duh.
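To make the ‘test random flags, watch suspect users more closely, and keep revisiting the good ones’ idea concrete, here is a toy sketch. To be clear, this is NOT the Adaptive Probabilistic Testing algorithm from the paper; the class name, the starting probability, the floor and the multipliers are all my own assumptions. It only illustrates the general shape: each user has a probability of having a flag checked, that probability drops as they prove reliable, jumps when they are caught out, and never reaches zero.

```python
import random

class FlagMonitor:
    """Toy per-user monitor: check a random subset of flags, adapt the rate."""

    def __init__(self, start=0.5, floor=0.05, ceiling=1.0, seed=None):
        self.start, self.floor, self.ceiling = start, floor, ceiling
        self.test_prob = {}              # user id -> probability of checking a flag
        self.rng = random.Random(seed)

    def should_test(self, user):
        """Randomly decide whether this user's new flag gets a costly check."""
        p = self.test_prob.setdefault(user, self.start)
        return self.rng.random() < p

    def record_result(self, user, flag_was_correct):
        """Relax after a correct flag, tighten sharply after a bad one."""
        p = self.test_prob.setdefault(user, self.start)
        if flag_was_correct:
            p = max(self.floor, p * 0.8)     # the floor means users are always revisited
        else:
            p = min(self.ceiling, p * 2.0)   # a bad flag brings scrutiny back quickly
        self.test_prob[user] = p

monitor = FlagMonitor(seed=1)
for _ in range(20):                          # the squeaky clean sock puppet phase
    monitor.record_result("sock_puppet", True)
monitor.record_result("sock_puppet", False)  # first malicious flag that gets checked
print(monitor.test_prob["sock_puppet"])
```

The floor is the interesting design choice here: without it, a user who earned the ‘good boy badge’ would never be looked at again.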
And that really is the core of that paper, dealing with those having a malicious intent as far as negative reviews are concerned. There really aren’t any mentions of specifics other than using some ‘Google properties’ as well as testing on YouTube and Amazon. Nothing on the authors gives us much more insight. So let’s move along…
Rating the Reviewers
The next stop on our journey is some analysis of those rating (reviewing) web items and of course, in particular, those that might seek to game such systems.
Traditional review metrics might include;
- length of the review,
- the lengths of sentences in the review,
- values associated with words in the review,
- grammatical quality of the review.
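As a rough illustration of how the simpler metrics in that list could be pulled out of a review, here is a small sketch. The function name and the ‘shouting’ measure are my own inventions, it only covers the length-based signals, and real grammar or word-value scoring would need a lot more than a couple of regular expressions.

```python
import re

def review_features(text):
    """Return a few crude quality signals for one review."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    shouted = [w for w in words if len(w) > 1 and w.isupper()]
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "avg_sentence_words": len(words) / len(sentences) if sentences else 0.0,
        "shouting_ratio": len(shouted) / len(words) if words else 0.0,
    }

print(review_features("GREAT camera. Works well!"))
```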
But what about the people that actually do the reviewing? You know…
“(…) a user who makes a particular product may seek to submit numerous falsely positive ratings for the product so as to drive up its composite rating, and to thus lead others to believe that the product is better than it actually is. Likewise, a user may attempt to decrease the score for a competitor’s product. “
So they’re looking at not only negative reviews but positive ones as well for anomalies. Much the same as the first one we looked at, there is an element of probabilistic judgement based on the larger data set of users. Doing this ‘should’ surface anomalies that can lead to a given user’s interactions being negated. This can be tested against the averages for a given item, user or user type.
In short, once again, we’re looking for red flags. If a user has access and reviewing patterns that do not (statistically) fall within a certain degree of error, they may have their ratings completely thrown out. Some points of interest include;
- Users can have negative reputations which can mean their reviews aren’t counted
- Negative reviews may not be counted
- Scores for reviews as well as reviewers (users)
- Like TrustRank, the validity of a review can be dependent on the reputations of the reviewers
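One simple way to picture those points working together is a reputation-weighted average. This is only a sketch under my own assumptions (the function name, the idea of using reputation directly as a weight, and unknown users counting for nothing are not from the patent): reviewers with a negative reputation contribute zero, everyone else counts in proportion to how trusted they are.

```python
def composite_rating(ratings, reputation):
    """Average an item's ratings, weighting each by its reviewer's reputation.

    ratings:    list of (user_id, score) pairs for one item
    reputation: dict of user_id -> reputation score; unknown users default to 0
    """
    weighted = total = 0.0
    for user, score in ratings:
        weight = max(0.0, reputation.get(user, 0.0))  # negative reputation counts for nothing
        weighted += weight * score
        total += weight
    return weighted / total if total else None

ratings = [("alice", 4), ("bob", 5), ("shill", 1)]
reputation = {"alice": 1.0, "bob": 0.5, "shill": -2.0}
print(composite_rating(ratings, reputation))
```

Here the one-star rating from the distrusted account simply drops out of the composite, which is the ‘their reviews aren’t counted’ behaviour described above.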
Interestingly they also mention the concepts of link relationships of connecting documents to documents and as an extension, users to documents (web pages). From the oddity category, they depict ‘bad users’ as Scrooge McDuck and ‘good users’ as ‘Mother Teresa’… lol.
The spam mechanisms in this one are a lot like the others, with a common approach being to watch for a flux in a user’s patterns. Meaning, users that come in, leave a bunch of reviews, then leave or have no history in the system. Items that have only a few reviews by a set of reviewers with limited reviews in the system themselves can also be discounted.
I am gonna spam the hell out of it!
And of course I can hear you thinking of ways to try and legitimize your bots to deal with it right? They also briefly touched on that;
“Such tactics can be at least partially defused by comparing a user’s ratings only to other ratings that were provided after the user provided his or her rating. While such later ratings may be similar to earlier ratings that the fraudster has copied, at least for items that have very large (and thus more averaged) ratings pools, such an approach can help lower reliance on bad ratings, particularly when the fraudster provided early ratings. Time stamps on the various ratings submissions may be used to provide a simple filter that analyzes only post hoc ratings from other users.”
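The ‘post hoc’ filter in that quote is easy to sketch. The function name and data layout below are hypothetical; the point is only that a rating is judged against what other users said afterwards, so copying earlier ratings doesn’t help a bot look normal.

```python
def deviation_from_later_ratings(user_rating, user_time, others):
    """Compare one rating only with ratings submitted after it.

    others: list of (timestamp, score) pairs from other users on the same item.
    Returns the absolute gap to the mean of the later ratings, or None if
    nobody rated the item afterwards.
    """
    later = [score for ts, score in others if ts > user_time]
    if not later:
        return None
    return abs(user_rating - sum(later) / len(later))

# Hypothetical item: early ratings were low, later ones are high.
others = [(1, 2), (2, 2), (5, 4), (6, 5), (7, 4)]
print(deviation_from_later_ratings(user_rating=2, user_time=3, others=others))
```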
Other elements that can be used include looking at time stamps. Not only the obvious, the speed at which a user has left reviews, but also relative temporal data. Meaning; what is the average time it takes an item to garner a review for similar products/services.
The inverse may also be looked at. Are the average reviews for a category throttled? If a user is adding reviews on items well after they were posted and the activity around them has died down, that may be another sign that they’re a less than scrupulous user. Think of a blog comment on your site 10 months after the article went up. This can also be used as a metric when analyzing a reviewer.
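Both timing ideas (how fast a user fires off reviews, and how late they show up compared with everyone else) can be boiled down to a couple of numbers. This is a sketch only; the function name, the use of hours as the unit, and medians as the summary are my assumptions, not anything stated in the patent.

```python
from statistics import median

def timing_signals(review_times, item_posted_at, typical_delays):
    """Two crude timing signals for one reviewer.

    review_times:   list of (item_id, timestamp) for the user's reviews, in hours
    item_posted_at: dict of item_id -> time the item was listed, in hours
    typical_delays: delays (hours, assumed positive) seen for similar items
    """
    stamps = sorted(ts for _, ts in review_times)
    gaps = [b - a for a, b in zip(stamps, stamps[1:])]
    delays = [ts - item_posted_at[item] for item, ts in review_times]
    return {
        # Small value = rapid-fire reviewing.
        "median_gap_hours": median(gaps) if gaps else None,
        # Large value = reviewing long after the crowd has moved on.
        "lateness_ratio": median(delays) / median(typical_delays),
    }

# Hypothetical user: three reviews within an hour, all on fairly old items.
review_times = [("a", 100), ("b", 100.5), ("c", 101)]
item_posted_at = {"a": 0, "b": 10, "c": 50}
signals = timing_signals(review_times, item_posted_at, typical_delays=[20, 30, 40])
print(signals)
```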
Another filter that might be employed is looking at the values given. If a user consistently posts a ‘3’ vote, or far too often has a similar voting pattern, their reviews may be discounted.
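A filter like that could be as simple as checking how much of a user’s voting history is taken up by a single value. The names and thresholds here are hypothetical, picked only to show the idea.

```python
from collections import Counter

def most_common_share(scores):
    """Fraction of a user's votes taken up by their single most frequent score."""
    if not scores:
        return 0.0
    count = Counter(scores).most_common(1)[0][1]
    return count / len(scores)

def looks_repetitive(scores, min_votes=10, max_share=0.9):
    """Flag users with enough votes whose scores are nearly all identical."""
    return len(scores) >= min_votes and most_common_share(scores) >= max_share

print(looks_repetitive([3] * 12))             # always votes '3'
print(looks_repetitive([1, 2, 3, 4, 5] * 3))  # varied voting
```

The minimum vote count is there so that a new user who happens to leave two identical ratings isn’t thrown out straight away.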
As with all things, it is often a filtering process. The crawl process is intensive, so establishing which users are more valued and which can be removed from the system not only helps with malicious users, but with systemic performance as well. Which seems like a good way to go about it.
Ranking the Items
Of course we’ve only been looking at the reviewer aspect, we need to consider how they will interact with the valuations for ranking purposes. That part isn’t so clear in my research, (and wasn’t what I was looking for) but there were a few notable points of interest;
“(…) For example, if a person submits a search request 322 of “$300 digital camera,” the search engine 318 may rank various results 320 based on how close they are to the request $300 price points, and also according to their ratings (as modified to reflect honest rankings) from various users.
Thus, for example, a $320 camera with a rating of 4.5 may be ranking first, while a $360 camera with a rating of 4.0 may be ranked lower (even if a certain number of “bad” people gave dozens and dozens of improper ratings of 5.0 for the slightly more expensive camera). Uniquely, for this example, a price point of $280 would be better, all other things being equal, than a price point of $300, so distance from the requested price is not the only or even proper measure of relevance.”
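To see how the camera example might shake out, here is a toy scoring function. The patent doesn’t give a formula, so the 50/50 blend, the function name and the extra $280 camera are purely my assumptions. The one thing it does take from the quote is that going over the requested price hurts, while coming in under it doesn’t.

```python
def camera_score(price, rating, requested_price, max_rating=5.0):
    """Blend price fit and (already cleaned) rating into one score in [0, 1]."""
    overshoot = max(0.0, price - requested_price)   # being under budget is not penalised
    price_fit = 1.0 / (1.0 + overshoot / requested_price)
    return 0.5 * price_fit + 0.5 * (rating / max_rating)

# (name, price, rating after the improper 5.0 ratings have been discounted)
cameras = [("A", 320, 4.5), ("B", 360, 4.0), ("C", 280, 4.5)]
ranked = sorted(cameras, key=lambda c: camera_score(c[1], c[2], 300), reverse=True)
print([name for name, _, _ in ranked])
```

A stricter reading of the quote would give the $280 camera a small bonus over one priced at exactly $300; this sketch just treats both as a perfect price fit.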
Another extension of this type of system may be for ranking results for articles or written material based on an author rating system. This was best explained with;
“As shown, the adjusted author ratings may also be provided as a signal to the search engine 318. Thus, for example, a user may enter a search request of “conservative commentary.” One input for generating ranked results may be the GOOGLE PAGERANK system, which looks to links between web pages to find a most popular page. In this example, perhaps a horror page for an on-line retailer like Amazon would be the most popular and thus be the top rated result.
However, ranking by authors of content may permit a system to draw upon the feedback provided by various users about other users. Thus, for a page that has been associated with the topic of “conservative commentary,” the various ratings by users for the author of the page may be used. For example, one particularly well-rated article or entry on the page may receive a high ranking in a search result set. Or a new posting by the same author may receive a high ranking, if the posting matches the search terms, even if the posting itself has not received many ratings or even many back links–based on the prior reputation generated by the particular author through high ratings received on his or her prior works. ”
While I am fairly uncertain how this may or may not have been used, it is interesting to get a glimpse of what we tend to refer to in the industry as ‘authority’. If you noticed that last part, a trusted author might have an article ranked even if it has few ratings or links. In short, they’re a trusted source. One does wonder how that thinking, combined with a social graph, might play out.
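One way to picture a new post ‘inheriting’ its author’s reputation is to treat the author’s track record as a handful of virtual ratings. Again, this is my own sketch and not the patent’s method; the function name and the `prior_weight` parameter are hypothetical.

```python
def post_score(post_ratings, author_avg, prior_weight=5):
    """Blend a post's own ratings with its author's historical average.

    prior_weight acts like that many 'virtual' ratings at the author's
    average, so a brand new post leans on the author and a heavily rated
    one leans on its own ratings.
    """
    n = len(post_ratings)
    return (sum(post_ratings) + prior_weight * author_avg) / (n + prior_weight)

print(post_score([], author_avg=4.6))         # new post, no ratings yet
print(post_score([3] * 20, author_avg=5.0))   # plenty of its own ratings
```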
And hey, it does sort of make one curious about the various ‘voting’ elements for blogs and even the comments therein. Would a system such as this be useful for looking for comment spam? Curious indeed.
It’s a matter of trust
One thing that is certainly prevalent in the world of reviews is the type of trust signals that we’ve been used to (in the SEO community) for many years now. What effects you will get from reviews is largely dependent on the quality of the reviews AND the reviewers on an item. Or that’s the working theory at least.
I haven’t had the time to really dig into some review sites to see how things are actually playing out. If you’re working in some query spaces where reviews are prevalent, I’d appreciate hearing from you so we can look into this even further.
For those of you looking to start spamming the review space, it may not be as easy as it may seem at first. As long time readers know I don’t really advocate such things and thus have actually left some elements out of this article that I felt may have been a bit too much information :0)
That’s a wrap!
It should be noted that there are a wide variety of approaches I came across and as always, we’re gaining insight, not a road map. Most of this article focused on the sentiment and reviewers themselves. There are also mechanisms for analyzing the reviews themselves, which should not be discounted. That being said, it seems more sensible to approach things from the methods we did look at today.
Tips for a quality review;
- Use good grammar (poor grammar = less readable)
- Lose caps lock – can be seen as rude
- Use sentences
- Not too long, not too short
- Build trusted profiles
There are more resources for reading at the end. Be sure to continue the journey.
So, how exactly did the Borker affair happen? Even after all of this research it is hard to say definitively. Yea, I know, not much of an ending huh? What does seem to have happened is that the sheer number and quality of the reviews themselves, even though they were negative, were enough to actually legitimize this guy’s offering. What wasn’t being caught was the actual sentiment, or at least it was being drowned out by the other signals.
Did Google rectify this with a quick algo fix? I find that highly unlikely to be honest. With the complexity of this type of undertaking, it just doesn’t seem that easy.
It will be an interesting space to watch though, that’s for sure.
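Papers from the post;
- Large Scale Sentiment Analysis for News and Blogs
- Monitoring algorithms for negative feedback systems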
Patents from the post;
- Rating the raters
- Systems and methods for reputation management
- Identifying clusters of similar reviews and displaying representative reviews from multiple clusters
Posts from Bill;
- How Google may Manage Reputations for Reviewers and Raters
- Google’s New Review Search Option and Sentiment Analysis
- Opinion Summaries in Google Maps Reviews
- Google Approach to Making Online Ratings Easier…
- Innovating Product Reviews at Google
- Google Reviews: Reputation + Quality + Snippets + Clustering.