Are you ready for the next Penguin assault?
Written by David Harry
Monday, 20 August 2012 12:44
According to comments being reported from SES, Matt Cutts has said the next incarnation is "the father of all Penguins", "jarring and jolting", and that SEOs won't "want the next Penguin update." Strange, considering many of them I know didn't want the first one. Anyway....

Accordingly, many search types are scurrying around talking about it on various communities and social sites. Even my Skype fired up as folks started asking for my take. Not sure I really have one beyond the same mantra I've preached for years on this kinda crap (link spam etc).

Wait. Wasn't this the 'web spam' update?

Back when we first got to know this destructive little flightless fowl, it was called 'the web spam' update. That's actually kind of important. This is NOT part and parcel of Panda, other than sharing the same project name (Search Quality). Here's how SEO was defined in one Stanford paper;
They went on to note,
As such, if we truly want to get a sense of what has been happening or what may be on the horizon, we need to get a bit more of an understanding of various tactics search engines use to combat SEOs... erm... I mean web spam.

Boosting and hiding

In simplest terms we can break it down into two camps: boosting techniques and hiding techniques (from my guide to webspam). Boosting is just like it sounds, and some examples are;
Hiding is more about tricking the engines with various on-site tactics. Some of these include;
Getting the idea? I'd never advise taking one's eye off the ball as far as looking for simple answers. If anything, that's what got a lot of SEOs into the dung heap to start with. But I digress...

Potential On Site Penguin Issues

Given that these ones are seemingly less the focus for Google, we'll just look at a few that I haven't seen mentioned a whole lot, that might actually be part of the algo.

Language: they might treat different languages at different levels. Research has shown that French, German and English tend to have higher levels of spam. There could be trust elements to the updates.
Top Level Domain: domains such as .INFO and .BIZ traditionally have higher levels of spam. This could lower a trust score for these.
Words per page: apparently the sweet spot for spammers is 750-1500 words. Could there be a classifier to look at this?
Keywords in page TITLE: a classic boosting technique, and research has found spam pages contain far more keywords than non-spammy (classified) pages.
Amount of anchor text: they might look at the ratios of text to anchor text. Interestingly, that approach could be Panda related as well.
Compressibility: as a mechanism used to fight KW stuffing, search engines can also look at compression ratios. Or more specifically, at repetitious or spun content.
Host-level spam: looking at other domains on the server and/or registrar levels. Certainly easily found networks are on the table, one would have to imagine.
Phrase-based: with this approach, a probabilistic learning model using training documents looks for textual anomalies in the form of related phrases. This is like KW stuffing on steroids. Looking for statistical anomalies can often highlight spammy documents.
Outgoing links: a website might link out to well-known pages seeking to raise their 'hub score' (see TrustRank concepts earlier). Although for any use of this I'd imagine a low threshold, to deal with false positives (think of sites that scrape entire sections of Wikipedia).
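To make the compressibility idea a bit more concrete, here is a minimal Python sketch of how a compression ratio could be measured for a block of page text. This is purely illustrative: the function names and the 4.0 cut-off are my own assumptions, not anything Google has confirmed, and a real classifier would weigh this alongside many other signals.

```python
import zlib

# Hypothetical cut-off, for illustration only; not a known search engine value.
SUSPICIOUS_RATIO = 4.0


def compression_ratio(text: str) -> float:
    """Return uncompressed size / compressed size (higher means more repetitive)."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(raw) / len(zlib.compress(raw))


def looks_stuffed(text: str, threshold: float = SUSPICIOUS_RATIO) -> bool:
    """Flag text whose compression ratio exceeds the (assumed) threshold."""
    return compression_ratio(text) > threshold


# Repeating the same phrase over and over compresses extremely well.
stuffed = "cheap flights cheap flights book cheap flights now " * 40

# Ordinary prose has far less redundancy for the compressor to exploit.
normal = (
    "Back when the update first rolled out, most people in the industry "
    "assumed it was only about links. A closer reading of the research "
    "suggests that search engines weigh many different signals when they "
    "try to classify a page."
)

if __name__ == "__main__":
    print("stuffed:", round(compression_ratio(stuffed), 2), looks_stuffed(stuffed))
    print("normal: ", round(compression_ratio(normal), 2), looks_stuffed(normal))
```

The point is simply that repeated or lightly spun text shrinks far more than natural writing does, so the ratio stands out as a statistical anomaly.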
And most certainly one can look around their CMS to ensure they're not cloaking, sending odd redirects, hiding text and so on. Obviously we'd never do that knowingly, right? That's what I thought. Moving along...

Potential Linking Penguin Issues

Certainly this area is the one getting the most attention after the first two rounds. And I don't mind that; it is just more about needing to think beyond popular theories (anchor text comes to mind). So let's look at some things that might be in the mix;

TrustRank: better known as neighbourhoods. Good sites link to good sites; spam sites (generally) link to other spammy (feeder) sites. Trust in general does seem to be involved in the Penguin evolution (also see harmonic rank).
Link stuffing: while it can be used on-site, it is the concept of creating tons of pages that have a link pointing to a given target page. This may be site-wides, or multiple domains as well as on-site. In fact, to a degree the practice of low level directories for SEO could play here, as could forums, link spam, widgets and infographics.
Nepotistic links: the well known usual suspects such as paid links, link exchanges and their ilk. We certainly do know that Google isn't much of a fan of this type of approach. We can surely go out on a limb and infer many of these types of link spam are in consideration.
Topological spamming (link farms): search engines will often compare % of links in the graph against known entities ('good sites'). Do you have a disproportionate number of links compared to those in your query space(s)? It may be an issue.
Temporal anomalies: better known as 'link velocity' and 'link decay' to most. Again, when looking at relative pages in the index, spam pages will generally stand out. Those manufacturing links will have a different graph than those considered 'good pages' to Google.
Anchor text spam: it's no secret that those trying to manipulate rankings place a high value here. As with other thresholds mentioned, this can be compared to other sites in the query space(s) considered 'good sites' as part of a seed set. This one has certainly seen play since Penguin launched.
Expired domains: when the spammer buys expiring domains that have link equity, to point to the target site(s). Or simply replaces or changes the content on the domain to take advantage of the existing equity.
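The seed-set comparison mentioned under anchor text spam can be sketched roughly as follows. Treat this as a thought experiment rather than how any engine actually does it: the function names, the sample data and the `tolerance` multiplier are all hypothetical, and I'm assuming a simple "share of links using the exact phrase" measure.

```python
from collections import Counter
from typing import Sequence


def anchor_share(anchors: Sequence[str], phrase: str) -> float:
    """Fraction of inbound links whose anchor text matches `phrase` (case-insensitive)."""
    if not anchors:
        return 0.0
    counts = Counter(a.strip().lower() for a in anchors)
    return counts[phrase.strip().lower()] / len(anchors)


def exceeds_baseline(site_share: float,
                     seed_shares: Sequence[float],
                     tolerance: float = 2.0) -> bool:
    """Flag a site whose share is more than `tolerance` times the seed-set average.

    `tolerance` is an assumed, illustrative threshold.
    """
    if not seed_shares:
        return False
    baseline = sum(seed_shares) / len(seed_shares)
    return site_share > tolerance * baseline


# Hypothetical link profile: 8 of 10 links use the same money phrase.
site_anchors = ["cheap flights"] * 8 + ["Acme Travel", "acmetravel.com"]

# Hypothetical shares of the same phrase for 'good sites' in the query space.
seed_shares = [0.10, 0.20, 0.15]

if __name__ == "__main__":
    share = anchor_share(site_anchors, "cheap flights")
    print(share, exceeds_baseline(share, seed_shares))
```

Notice the comparison is relative to the query space, not a fixed ratio, which is why chasing a single 'safe' anchor text percentage misses the point.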
A Matter of Levels

Right then, enough of the geekery. The main goal here was to start to think beyond the everyday. Stop trying to nail down what Penguin is (or will become) with things like 'anchor text ratios' and 'networks'. If this truly is a web spam update, then there's a lot more on the table for Cutts and Co to chew on. I like to consider much of this in terms of thresholds. It's a common theme. We might consider;
Given that we have (for now) a definitive line between Penguin and manual actions (Webmaster Tools messages), I would also imagine that Penguin itself may still have relatively lower thresholds. Whatever does happen next, most SEOs that have been paying attention should be fine. Man, I kill myself.... weeeeeeeeeeeeeee
ADDED; I have also published a post on Search Metrics to help you diagnose a Google Penguin problem. Thought it worth sharing here as well.

More stuff

Seriously, I've been saying it for years... but REALLY, read a few of these. Realize just how much there is to this. It wasn't the 'anchor text' update, it's a web spam update. Aight?

Web Spam Research Papers
TrustRank Concepts
Link Spam
Social Spam
Language/Semantic related
Last Updated on Monday, 27 August 2012 13:36
Comments
I still can't see how this can be as aggressive as they make out, other than to devalue all the obvious link building tactics like directories, sitewide and column links etc. One could argue that disproportionate anchor text is also an easy one, but how do they differentiate between those that have become synonymous with a product or which have the product (anchor text) in their name, e.g. "cheap flights"?
One can't help wondering whether Matt Cutts' reference to G+ signals "not requiring too much attention from SEOs yet" could be a red herring?
Thanks David!
As for the anchors, that part is more about named entities. Google is pretty good with that stuff. It is natural to see the domain name, brand or product names in anchor texts. (more on entities here; http://searchnewscentral.com/20120110229/Technical/named-entities-associations-for-seo.html )
At the end of the day I simply wanted to ensure folks weren't myopic on what might be in play. It pays to think outside the box and had more SEOs done so before Penguin, there might be less pain hee hee
@Raf - lol... very good. Not sure about that, likely more a post about information retrieval. It's translating that into SEO and yer everyday activities that is the art
I kind of agree with you, on the social signals and the login, but you can still use public pages in those social sites; profiles, and use those to link to/from your site and get shares etc.
Even more so, those public pages can be linked with other profiles, and by linking ... associating ... them all together you have an easy, convenient way of getting links between all of your off-page content, especially natural shares and likes. Not so much to gain value, but to pass it on, to spread more, while at the same time bundling the value on the profile pages. That's where you then put your embedded anchor text link: on the profile pages, with unique text/bio/profiles for all of them of course.
There's not much to be gained from those social sites themselves, but you can use them to pass on value from other profiles to those pages you want to give an extra push, you know, like a bridge or ... a link