It’s all about attribution
On Friday, it seems, Google's Matt Cutts was talking about one of the more prominent algorithm changes (remember, there are hundreds per year) that we've seen over the last while. For those that missed it, see; Algorithm change launched
Now, at first blush I didn't really pay much attention to it. But then I read it a few times, along with one of the posts it referenced. The blogosphere and social channels were buzzing about 'content farms' and 'thin content' sites being the focus. But that's not really geeky enough now, is it? Of course not.
Walk with me….
Google’s new emphasis on fighting spam
Over the last while there has been a spate of articles declaring Google's results to be… well… getting worse. Granted, I am talking about the mainstream media, as those in the search game have been whining for years.
A few weeks back there was a post from Matt Cutts (Googleblog this time) discussing some things they are working on over at Googly as far as dealing with spam is concerned. Here's the post, along with some related ones.
Now, as a search geek this stuff does interest me, as it's an integral part of the evolution of search. One would have to imagine that the job of dealing with web spam has gotten a lot harder. Not only can we look to social's rise into the SERPs (and the spam to be dealt with there), but also to last year's Caffeine update which, in theory, means more signals and deeper processing.
But the other shoe was yet to drop, so we wait.
Attribution, Syndication and Duplication Front and Centre
The next week came the post mentioned off the top: Matt published a post on his personal blog detailing some changes apparently related to the earlier announcements. OK, great, so let's look at the highlights.
In the original post he had mentioned;
“(…) we’re evaluating multiple changes that should help drive spam levels even lower, including one change that primarily affects sites that copy others’ content and sites with low levels of original content.”
And this time went on with;
“The net effect is that searchers are more likely to see the sites that wrote the original content rather than a site that scraped or copied the original site’s content.”
Matt also highlighted a post on CodingHorror which discussed a situation in which the owner of a given piece of content was being outranked by those syndicating it. Whether it was a scraper or a legitimate distribution, Google was having a problem attributing the original source.
This, of course by inference, should help not only with the honest cases of mistaken identity, but with the spammy, less-than-desirable ones as well. Does that go for cases such as Demand Media et al? Hard to say at this point. I would also have to wonder about link building via article marketing, as duplication of the content (beyond the article site) would theoretically be hurt by this as well.
I’ve been down this road before
What was most interesting to me is that we'd actually discussed this very topic as recently as the summer of 2010, when my pal Samir Balawani was having legitimate feed issues across various distribution channels, including Facebook and elsewhere. See; How Content Syndication Can Backfire
At that point we were advised that the best-case scenario would be to ensure there is a link back to the original, use a partial feed, or delay the feed, depending on crawl/indexation levels (a rough sketch of those safeguards follows below). Obviously most nefarious types aren't going to link back to the original, so that was of limited use to them (Google).
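To make those three safeguards concrete, here's a minimal sketch, purely my own illustration of the advice above; the `prepare_for_syndication` helper, the 24-hour delay and the 300-character excerpt are all assumptions on my part, not anything Google prescribed:

```python
from datetime import datetime, timedelta

# Hypothetical helper illustrating the three safeguards: delay the feed,
# truncate to a partial feed, and link back to the original article.

DELAY = timedelta(hours=24)   # assumed: wait until the original is indexed
EXCERPT_CHARS = 300           # assumed: partial-feed excerpt length

def prepare_for_syndication(entries, now=None):
    """entries: list of dicts with 'title', 'url', 'body', 'published'."""
    now = now or datetime.utcnow()
    syndicated = []
    for entry in entries:
        # Delay: hold items back so the originating site gets crawled first.
        if now - entry["published"] < DELAY:
            continue
        excerpt = entry["body"][:EXCERPT_CHARS]
        # Partial feed plus an explicit attribution link to the source.
        syndicated.append({
            "title": entry["title"],
            "body": f'{excerpt}... <a href="{entry["url"]}">Read the original</a>',
        })
    return syndicated

if __name__ == "__main__":
    sample = [{
        "title": "My post",
        "url": "http://example.com/my-post",
        "body": "Full article text. " * 40,
        "published": datetime.utcnow() - timedelta(days=2),
    }]
    print(prepare_for_syndication(sample))
```

The point of the delay is simply to give crawlers time to see the original first; how long that takes depends on your crawl/indexation levels, as noted above.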
I wanted to highlight this because there is every indication already that SEOs are running around screaming IT'S ABOUT CONTENT FARMS!! IT'S ABOUT CONTENT FARMS!! And, as always for a geek, there is more here than may at first meet the eye.
From what I can see at this point, (much of) the algo update is about attribution. This can be via syndication (often for legitimate reasons) or (unauthorized) duplication on sites other than the originator's; a toy illustration of the idea follows below. And considering it has been a known issue for quite some time, this is NOT some type of knee-jerk reaction. If anything, it might just be some well-timed PR (not the Google kind) meant to keep the dragons at bay.
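For the geeks, here's a toy sketch of what 'getting attribution right' could mean mechanically. To be loud and clear: this is pure speculation on my part, not Google's actual algorithm; the fingerprint-plus-earliest-crawl heuristic is just one naive way an engine could credit an original over its copies:

```python
import hashlib
from datetime import datetime

def fingerprint(text):
    # Normalize whitespace and case so trivial edits still collide.
    return hashlib.sha1(" ".join(text.lower().split()).encode()).hexdigest()

def attribute_originals(docs):
    """docs: list of (url, first_crawled, text) tuples. Returns a map of
    content fingerprint -> the earliest-crawled URL carrying that content."""
    originals = {}
    for url, crawled, text in sorted(docs, key=lambda d: d[1]):
        originals.setdefault(fingerprint(text), url)  # first seen wins
    return originals

docs = [
    ("http://scraper.example/copy", datetime(2011, 1, 25), "My great article."),
    ("http://author.example/post", datetime(2011, 1, 20), "My great article."),
]
print(attribute_originals(docs))  # credits author.example, not the scraper
```

The obvious weakness of anything this naive, and likely part of why the problem has lingered, is that a fast scraper can get crawled before the original, which is exactly the mis-attribution issue described above.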
Where’s Google headed with all this?
We can assume there are going to be, as always, winners and losers from all of this. Each and every change over the years has helped some while hurting others. Our job, as SEOs, is to be the former. I generally call this future-proofing.
But what are some of the signs that make this particular set of statements of value? What can we expect moving forward?
More resources architecturally/processing-wise: last year Google's major move (search-geek-wise) was the Caffeine infrastructure update. This means, in theory, they not only have the resources to start processing new signals, but we can also surmise more tools for dealing with (web) spam.
More resources in Matt's department: also of interest was a statement from Matt at Pubcon that they were getting more resources/staffers back from 'other projects'. That will no doubt help things as well, and it shows a concerted effort on their part to beef things up.
All in all, I do get the feeling that the search quality team (and the web spam team within it) has a small fire under its backside right now, and this latest (public) algorithm change is a response to that. Will there be collateral damage (false positives)? Sure, there usually is. Will there be winners and whiners in the SEO world? Deffo, you can count on that. But I personally have talked to a lot of people who believe things are improving (others aren't so sure).
Anyone remember the SERPs of yesteryear? Not very pretty. I for one am happy about this change, as it can help legitimate programs; it's not all about chasing spammers (hopefully it helps with proper attribution).
Note; search geek or not, if you’re interested in helping reduce spam, Matt suggests adding this Chrome extension to your arsenal.
Comments
The one thing that struck me about this was that it really had more to do with content scraping and duplication than with what I see as the real source of the problem: really crappy content that's being distributed all over the internet, likely in the interest of link building.
Some of this content seems to be ranking pretty high, especially in blog search.
I'd like to know if they have any plans to do anything about that, and if so, what it means for future link building.
Hard to say, Marjory, but most certainly, from what's been said publicly, this isn't really about 'thin content' such as Demand Media and other crap. It's looking like this was aimed at getting attribution right; scrapers just get caught in that same net. So I can't see this actually sorting out some of the crap content that ranks because it's from an authority domain or is an exact match to a query (aka Q&A sites).
So many theories, so many experiments, and so much speculation!
What I believe is that it's not yet time to come to a conclusion on the new algo shift. We still need a few weeks for the dust to settle, and only then can we comment on how this new algo affected the SERPs on a broader scale.
Well, hey, I am merely reporting on what's been said. And as far as that is concerned, this is about attribution. As I mentioned in the post, we had actually talked to Google about this very problem last year (and in '07 before that). So this is a problem that has existed for some time. And in many cases it wasn't even spam that was the problem; it was attribution of syndicated content and how Google handled it.
As for testing, I haven't had the time to see if it is actually working, so I shall spend some time later on to see if it is 'as advertised'. I have, though, already talked to some folks who have seen an increase in traffic over the last week; we shall see. As you say, time will tell.
This seems like a good move on the part of Google; I'm sure everyone would like to see their articles properly attributed.
As far as crap content is concerned, it's not just about rubbish articles; it's far more difficult than that.
For example, I did a search this morning looking for an example of an exponential graph (or something similar, I can't quite remember) and, on an initial scan of the first few results, it looked like spammy content. However, hidden between the adverts and the hundred or so links to other sites, there was a two-line answer. It was the most relevant result; it just looked like a crock.
How can Google fix that?
I personally hope they don’t. If life gets too easy I’ll just get fatter and forget how to read. I’ve already forgotten where my bookshelf is.
Google has always stated in their SEO bible that the best websites, in their eyes, are the ones that are original and have compelling content for the visitor. To me, this seems like an extension of that principle. I think the spammy sites that are clogging up the SERPs are the ones that simply have lists and lists of links and a few pieces of content created elsewhere. While these sites might rank due to excellent SEO tactics, the sites themselves are threadbare.
Like your analysis, Dave, and agree. With Google putting more infrastructure and processing power in place (Caffeine) and integrating social signals, the stage is set for better SERPs. I'm encouraged.
I ponder: if social is more of a factor, then these content farms will have fewer social signals. Will the lack of such signals be part of the algo?
I used the Chrome extension to report a #1 SERP content/link-farm result in a space where this long-standing crap result produces neither a good user experience (blinking ads) nor content actually related to the query.
Of note, Google does send the “spam reporter” an email acknowledging receipt of the grievance. Given that communication from them in other properties like Google Webmaster Tools has been lacking, this acknowledgment email offered another bit of hope.
I think it's about time. I'm getting tired of some SEO firms using black-hat techniques, pumping all this spam and duplicate content onto the internet to achieve rankings. If you play by the book, you should not be affected.
For me the web has always been about finding what you want as quickly and as accurately as possible. There is nothing more annoying than finding a "website" which is in fact just a directory of links to other sites advertising crap from face wash to tyres. Great sites should get good rankings, and the others, well…..
Two years later from the original post and I still see these same problems in search results. I think Google likes to claim it can do more with its algo than it actually can when it comes to reducing spam results. Do a search for any city plus "real estate" and check the results: 70+ of the top 100 results will be various pages of Trulia or Zillow rather than the local real estate offices and realtors who can actually assist people. The listing information found on Trulia and Zillow is often years old.