Source Attribution meta tags: solving news syndication duplicate content issues?

It looks like Google is working hard to solve one of the oldest problems in online news: source attribution.

News sites are increasingly turning to content syndication:
– to monetise their own content;
– to save costs by republishing content from press feeds.

As a result, many news sites publish growing amounts of syndicated content from press feeds like AP, Reuters, and the Press Association. This creates duplicate content problems for search engines like Google, which want to credit and link to the original publisher of a news story.

Previously, Google made deals with the likes of AP to host AP’s news stories and link to those rather than to the same stories on other news sites. But that is a patchy solution, as it depends too much on the deals Google is able to strike with content syndicators.

Last Tuesday, November 16th, Google announced what might be a proper and definitive solution to this issue: source attribution meta tags.

These meta tags should enable Google to distinguish original content from content duplicated via syndication feeds. There are two types of meta tags, original-source and syndication-source:

  • original-source allows a publisher to indicate the first publisher of a given story. For its own original content, a publisher can put its own article URL in the meta tag; for content taken from or inspired by other stories, it can use this meta tag to give appropriate credit.
  • syndication-source allows publishers to specify a preferred URL for a piece of syndicated content. For example, if news site B republishes a story from news site A, both articles should contain the syndication-source meta tag with the URL of the story on news site A’s website.
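To make the two tags concrete, here is a minimal Python sketch that writes and reads them. It assumes the tags take the usual `<meta name="original-source" content="URL">` form in the page head; the helper names `build_source_tags` and `extract_source_tags`, the parser class, and the example.com URLs are hypothetical and for illustration only, not part of any Google tool.

```python
from html import escape
from html.parser import HTMLParser

# Tag names as described in the article; assumed to appear as
# <meta name="TAG" content="URL"> in the page <head>.
SOURCE_TAGS = ("original-source", "syndication-source")


def build_source_tags(original_url=None, syndication_url=None):
    """Return source attribution meta tags for an article (hypothetical helper)."""
    tags = []
    if original_url:
        tags.append(f'<meta name="original-source" content="{escape(original_url, quote=True)}">')
    if syndication_url:
        tags.append(f'<meta name="syndication-source" content="{escape(syndication_url, quote=True)}">')
    return "\n".join(tags)


class SourceTagParser(HTMLParser):
    """Collects the content URL of every source attribution meta tag on a page."""

    def __init__(self):
        super().__init__()
        self.sources = {name: [] for name in SOURCE_TAGS}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing
        # <meta ... /> tags also reach this method via handle_startendtag.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        if name in self.sources and content:
            self.sources[name].append(content)


def extract_source_tags(page_html):
    """Return {'original-source': [...], 'syndication-source': [...]} for a page."""
    parser = SourceTagParser()
    parser.feed(page_html)
    parser.close()
    return parser.sources


# News site B republishes a story from news site A, so B's copy points
# syndication-source at A's URL (A's own page would carry the same tag).
story_a = "http://news-site-a.example/story.html"
page_b = "<html><head>" + build_source_tags(syndication_url=story_a) + "</head></html>"
print(extract_source_tags(page_b)["syndication-source"])
# prints ['http://news-site-a.example/story.html']
```

The sketch only collects the tag values; how Google weighs them (for instance, when a page's tags conflict with other signals) is not described here, so no such logic is attempted. The standard-library `html.parser` keeps it dependency-free.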

On paper this looks like a great solution, but it is obvious that these meta tags can easily be abused by unscrupulous publishers claiming original content when they’re actually copying content from others.

That’s why Google is first going to wait and see what happens ‘in the wild’. They’ve also issued a stern warning on the appropriate help pages: “If we find sites abusing these tags, we may, at our discretion, either ignore the site’s source tags or remove the site from Google News entirely.” It would take a particularly brave news publisher to risk exclusion from Google News.

While it’s too early to say whether these source attribution meta tags will indeed serve as the news industry’s version of the canonical tag, they are definitely an encouraging sign, and I expect many news publishers to implement them quickly, if only to protect their own original content.


11 Comments

  1. I’ve already “installed” them, so testing is in progress on the site I work for (Project Syndicate), and I will let you know how it goes.
    So far, no side effects!

  2. @Boris, yes please do share your findings! I’ve submitted a proposal to the folks here at the Belfast Telegraph, but as I start my new job next week I doubt I’ll be able to see through the implementation of these source attribution meta tags on the beltel site.

  3. If cheating with the meta tags were enough to rank 1st, Google results would be an infinitely long list of pages all sharing the 1st position.

    IMHO, META TAGS are old stuff for SEO, except for one: the META DESCRIPTION.

    Google ignores it for ranking, but shows it as a snippet in its results. When it’s well written, it can persuade more people to click on the result that leads to your site.

    To address your table specifically: Google ignores keywords too. And working to rank 1st on search engines other than Google is a waste of time, for two good reasons:

    1. If you look at the stats of any of your websites, I bet only 1 in 1,000 visits comes from a search engine other than Google.
    2. When you are 1st in Google, you usually rank well on other search engines too.

    You don’t need to set index/follow in a robots meta tag either. Search engines first check whether you have a robots.txt file, and unless you Disallow some pages there, they will keep crawling your site. Writing “follow” doesn’t make Google say: “Ohh, let’s follow it because he suggests I do!”

    The language tag can bring some benefit, but I think Google can detect the language of your page on its own. I use it to detect the page language easily from JS, so I use it anyway, without thinking about SEO.

  4. My cousin recommended this blog and she was totally right. Keep up the fantastic work! In addition, the information in your article really helped me.
    Thanks.

  5. The article is very useful, thanks for sharing.

  6. Watch more than 50 of the most popular Indian TV serials, with daily episodes in HD-quality video and good sound. To watch, please visit: http://www.SerialChaska.blogspot.com

  7. Great post, thanks for sharing!

  8. Google was granted a patent this morning that describes how Google might identify duplicate or near-duplicate web pages and decide which version to show in search results and which ones to filter out. It’s a process possibly close to what Google has been using for a while.

  9. To do that, Google introduced a new set of Source Attribution Metatags for Google News articles, which allow the publishers of breaking stories to tag those stories with an “original source” meta tag, and publishers who are syndicating those stories to use a “syndication-source” metatag. Google controls which sources show up in Google News results, and they note in their page about the source attribution metatags that

  10. Forum | Bye Video

    Hi Thomas,

    Unfortunately there are sites using scraped content that end up ranked ahead of the same content from the original publishers. I’ve experienced that a few times myself.

    These new meta tags are only for sites that are included and ranked in Google News, and there’s a good chance we won’t see their use extended to other kinds of websites or to Google’s organic search results.

    I don’t think instant crawling would be sufficient: if the pages are dynamic (created when clicked upon), the timestamps associated with them show the current time of the crawler’s visit. The page visited first would seem to be the oldest, even if it were a copy. 😛

  11. Like any meta tag, I’m sure it will be open to abuse, although I’m not sure what a better way to attribute the source would be.
