Named Entities: Associations for SEO

If there’s one area of search that has held my interest over the last while, it’s entities. Interestingly, the SEO world has also been interested, but it tends to think of them in simpler terms: as brands. Year after year we hear about Google being biased towards brands. But that’s not really the case from where I sit.

It really doesn’t make sense to create algorithms that favour brands. That’s asking the system to make the kind of subjective decisions that are prone to change over time, and that doesn’t scale well. What makes more sense is to understand things. Entities.

You like certain brands. You like types of music, movies, sports. When a bunch of other people seem interested in the same ones, it makes sense for a search engine to make them more prominent in its results. Most (old school) SEOs are just pissed because we can’t outrank brands in their own spaces like we used to.

What are named entities?

Part I – What is an entity?

Ok, before we go on, let’s look at entities. A named entity (in information retrieval terms) is essentially a person, place, thing, event and so forth. These entities can now be associated with temporal data, actions or even other entities.

While the definition isn’t limited to this, one patent on the subject stated:

“An ‘entity,’ as used herein, may refer to anything that can be tagged as being associated with certain documents. Examples of entities may include news sources, stores, such as online stores, product categories, brands or manufacturers, specific product models, condition (e.g., new, used, refurbished, etc.), authors, artists, people, places, and organizations.” (from Query rewriting with entity detection)

Some examples:

[Holiday Inn, downtown, Toronto]: here we have two entities, a business (with an inferred type) and a location (a city), plus a refinement.

Holiday Inn

[Lenny Kravitz, concert, Chicago, 2012]: again we have two, a musician and a city, with date and event-type modifiers.

Lenny Kravitz

[David Harry, SEO, Search News Central]: a really cool dude and a website, with a topical refinement.

Dave Harry SERP

Simplistic, but it highlights the concept. From there one can start making entity associations to better understand how entities relate to concepts and categories. The search engine would look for words, phrases and other data points commonly associated with a given entity. It might also (internally) rewrite the query to produce results it might not otherwise find.
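To make the association idea a little more concrete, here is a minimal sketch of entity-driven query rewriting. It assumes a small, hand-built map from entities to terms that commonly appear alongside them; the map, its contents and the `rewrite_query` name are hypothetical stand-ins for associations an engine would mine from its own data, and none of the patents mentioned here spell out this exact approach.

```python
# Hypothetical association data: entity -> terms often seen alongside it.
ENTITY_ASSOCIATIONS: dict[str, list[str]] = {
    "holiday inn": ["hotel", "booking"],
    "lenny kravitz": ["tour", "tickets"],
}


def rewrite_query(query: str, associations: dict[str, list[str]]) -> list[str]:
    """Return the original query plus variants expanded with associated terms."""
    tokens = query.lower().split()
    padded = f" {' '.join(tokens)} "  # pad so entities only match on word boundaries
    variants = [query]
    for entity, terms in associations.items():
        if f" {entity} " in padded:
            for term in terms:
                if term not in tokens:  # don't re-add words the user already typed
                    variants.append(f"{query} {term}")
    return variants


print(rewrite_query("Holiday Inn downtown Toronto", ENTITY_ASSOCIATIONS))
# ['Holiday Inn downtown Toronto', 'Holiday Inn downtown Toronto hotel', 'Holiday Inn downtown Toronto booking']
```

A query with no recognized entity, such as [time travel] with this map, comes back unchanged, which is also where the false-positive risk lives: the quality of the rewrite depends entirely on the quality of the entity data.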

On the other hand, it’s one heck of a trick at times, because there is always room for confusion and false positives. Consider:

[time travel]: here the engine might treat TIME (the magazine) as an entity and return the travel section of its website, instead of the funky machine that Stewy built.

Another element they look at is when new words themselves become entities:

“For example, the term ‘Blu-ray’ is a new word that describes a blue laser-based, high-density optical disc format for the storage of digital media. Once a new word is generally accepted, it can become part of the lexicon and be included in dictionaries.” (via Detecting name entities and new words)

As with all things, this is one of many processes that would be used to further refine search quality. It plays nicely with temporal, semantic, geo-localized and other signals. But its importance shouldn’t be understated.

New words and named entities

Uses and challenges

This is quite useful because, traditionally, a search engine might look at a query like [New York travelling] and break it into [New] / [York] instead of realizing that [New York] is an entity. As you might imagine, this has implications not only for semantic analysis, but also for localization, personalization and more.
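The [New York] problem can be illustrated with a greedy longest-match segmenter, one simple technique among many; the post doesn’t say what the engines actually use. The sketch assumes a tiny, hypothetical dictionary of known entities stored as token tuples; with an empty dictionary it falls back to the naive word-by-word split.

```python
# Hypothetical entity dictionary, stored as token tuples.
KNOWN_ENTITIES: set[tuple[str, ...]] = {
    ("new", "york"),
    ("holiday", "inn"),
    ("lenny", "kravitz"),
}


def segment(query: str, entities: set[tuple[str, ...]]) -> list[str]:
    """Greedy longest-match segmentation: prefer the longest known entity at each position."""
    tokens = query.lower().split()
    max_len = max((len(e) for e in entities), default=1)
    segments: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest possible span first; a single token always matches.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i : i + n])
            if n == 1 or span in entities:
                segments.append(" ".join(span))
                i += n
                break
    return segments


print(segment("New York travelling", set()))           # ['new', 'york', 'travelling']
print(segment("New York travelling", KNOWN_ENTITIES))  # ['new york', 'travelling']
```

Greedy matching is cheap but can be fooled by overlapping entities; a fuller approach would score alternative segmentations rather than commit to the first long match.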

There are even implications in the mobile world, because mobile queries are often not only more refined but also tied to entities (products, song downloads, locations, movies). Some granted patents have looked at this particular segment, which shows us (as SEOs) that not all searches, nor all applications, are treated the same. More on that another time…

Named entities can, of course, also be categorized, which is particularly handy for advertising platforms (such as Google AdSense).

“(…) associating an entity with a category includes determining a probability value for each of at least a subset of a plurality of categories, the probability value representing a likelihood that an identified entity belongs to the respective category and determined using information about the entity. The method includes recording one of the plurality of categories for the entity, the category identified using the probability value and a rule set for the plurality of categories.” (from Associating an Entity with a Category)
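As a rough illustration of that excerpt, the sketch below estimates a probability per category and then records the top category only if it passes a rule. Both pieces are assumptions: the "probability" is just each category’s share of keyword hits in text around the entity, and the "rule set" is a hypothetical per-category minimum threshold. The patent’s actual scoring and rules aren’t described in the quote, and all category names, keywords and thresholds here are made up.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical evidence: context words that hint at each category.
CATEGORY_KEYWORDS: dict[str, set[str]] = {
    "music": {"album", "tour", "concert", "guitar"},
    "travel": {"hotel", "booking", "rooms", "downtown"},
    "electronics": {"disc", "player", "laser", "format"},
}

# Hypothetical rule set: the minimum probability needed to record each category.
MIN_PROBABILITY: dict[str, float] = {"music": 0.5, "travel": 0.5, "electronics": 0.6}


def category_probabilities(context_words: list[str]) -> dict[str, float]:
    """Estimate P(category | entity) as each category's share of keyword hits."""
    hits: Counter[str] = Counter()
    for word in context_words:
        for category, keywords in CATEGORY_KEYWORDS.items():
            if word.lower() in keywords:
                hits[category] += 1
    total = sum(hits.values())
    return {c: (hits[c] / total if total else 0.0) for c in CATEGORY_KEYWORDS}


def assign_category(context_words: list[str]) -> str | None:
    """Record the most probable category, but only if it passes its rule-set threshold."""
    probs = category_probabilities(context_words)
    best = max(probs, key=probs.get)
    return best if probs[best] >= MIN_PROBABILITY[best] else None
```

With these example keywords, `assign_category(["Lenny", "Kravitz", "tour", "concert"])` returns `"music"`, while context with no matching words returns `None` rather than forcing a guess, which is the kind of restraint an ad platform would want.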

While we have talked in the past about some issues with sentiment analysis, let’s take that leap anyway. Entity associations can move into more implicit areas such as reviews and sentiment: how are these entities thought of? There’s a big difference between negative political commentary and dangerous brake lines, but at some point these kinds of deeper entity associations are likely to become part of the search landscape.

In fact, in that scenario, the users leaving the reviews or sentiments themselves become entities which can be weighted/scored. This can be a handy quality control/spam tool.

In short, named entities are a multifaceted tool that works well with the others in the toolbox.

Authors and authority

This, of course, is where it gets interesting. The search engine can now look at users, authors and people for deeper connections and valuations, whether on social channels, review sites, blogs or anywhere else an entity can be associated.

Consider:

  • Temporal: the velocity at which an entity is mentioned
  • Trust: think TrustRank-type concepts
  • Authorship: more on AgentRank in a moment
  • Social graph: connections and activity
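The temporal signal in that list can be sketched in a few lines. This assumes "velocity" simply means the change in mentions per day between the most recent window and the window before it; the seven-day window, the function name and the mention dates are all hypothetical.

```python
from datetime import date, timedelta


def mention_velocity(mentions: list[date], today: date, window_days: int = 7) -> float:
    """Change in mentions per day: the latest window versus the window before it."""
    recent_start = today - timedelta(days=window_days)
    prior_start = recent_start - timedelta(days=window_days)
    recent = sum(1 for d in mentions if recent_start < d <= today)
    prior = sum(1 for d in mentions if prior_start < d <= recent_start)
    return (recent - prior) / window_days


# Hypothetical mention dates for some entity.
mentions = [date(2012, 1, 1), date(2012, 1, 5), date(2012, 1, 8), date(2012, 1, 9)]
print(mention_velocity(mentions, today=date(2012, 1, 10)))  # (3 recent - 1 prior) / 7 days
```

A positive value means the entity is being talked about more than it was a week earlier; a real system would presumably also weigh who is doing the mentioning, which is where trust and authorship come in.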

Of those, the recent goodies from Bill on AgentRank are a pretty good example. Here, let’s have him take over for a second; I need to stretch my digits.

“The patent describes how authors can mark the content that they publish on the Web with a digital signature, whether that content might be a web page, or article, or blog post, or even comment, and how their authorship might influence the rankings of that content by associating a reputation score with it.”

and…

“In addition to using authorship markup to identify who the author and possibly originator of content might be, the Agent Rank process also involves adding a quality score to a document based upon the reputation score of its author.” (from Agent Rank, or Google Plus as an Identity Service)

Thanks brother, ready to take over again…

We’ve, of course, seen glimpses of this with Google’s author mark-up and some of the annotations that have been showing up in the SERPs over the last year.

Author annotations in SERP
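A toy version of the Agent Rank idea quoted earlier: a document’s score gets a bonus derived from the reputation of its signed authors. The excerpt only says a quality score is added based on the author’s reputation, so the averaging across co-authors, the `weight` value, the `agent_rank_score` name and the example URLs and reputations are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    url: str
    base_score: float                                  # relevance score before any author signal
    authors: list[str] = field(default_factory=list)  # verified (signed) author identities


def agent_rank_score(doc: Document, reputation: dict[str, float], weight: float = 0.3) -> float:
    """Add a quality bonus based on the mean reputation of the document's authors."""
    if not doc.authors:
        return doc.base_score  # unsigned content gets no author-based adjustment
    mean_rep = sum(reputation.get(a, 0.0) for a in doc.authors) / len(doc.authors)
    return doc.base_score + weight * mean_rep


# Hypothetical reputations and documents.
reputation = {"dave": 0.9, "new_account": 0.1}
signed = Document("https://example.com/entities", 1.0, ["dave"])
unsigned = Document("https://example.com/copy", 1.0)
print(agent_rank_score(signed, reputation), agent_rank_score(unsigned, reputation))
```

The design point is that identical content can rank differently depending on who verifiably wrote it, which is exactly why authorship turns people into entities worth scoring.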

You see, my friend? Entities and their associations are an important study for those engaged in SEO. Search engines are increasingly looking at them to better make sense of the web, and it’s best we know what they’re thinking. It is a lack of knowledge that leaves one susceptible to maulings from cuddly pandas. Ya feel me?

The SEO Connection

Well, if you were trying to be sneaky, jump to the end and get the goods, then you have miscalculated, grasshopper. You see, this is going to be the first in a two-part series. Ha! This post was getting long, and it seemed best to get into the head space first, then sort out what it means to the average search optimization geek.

If you did skip the rest, go back and read it. It will be important for the next part. Sometimes there are no shortcuts.

I hope this part of the series was enough to get you into the groove without putting you to sleep. It only hurts for a little bit. I promise next week we’ll make some sense of this newfound knowledge and turn it into some actionable goodiness for you. Ok?

Until then, here’s some more reading for the adventurous:

Google Patents:

Microsoft Patents:

Yahoo Patents:

Til next time… keep the gospel of the geek strong!

7 Comments

  1. Bill Slawski January 10, 2012

    Hi Dave,

    Excellent post and great list of related patents.

    The brand bias that I’ve seen a lot of people write about has always been myopic in that it doesn’t look at the bigger picture and attempt to understand all the different ways that entity association can impact what we see in search results.

    That Google and the other search engines have been heavily engaged in entity association is very clear when it comes to local search, where the value of a mention of a business along with some location information is seen as helpful to get a business to rank well in Google Maps.

    A 2010 paper from Microsoft told us that up to 20 percent of queries at Bing are solely entities, and more than half of all queries include entities within them.

    And yet this is an area where there seems to be very little discussion on the topic in search circles.

    Looking forward to the second part.

    Thanks.

    Bill

  2. SNCadmin January 10, 2012

    Lo Bill…good to cya as always.

    As you noted, it seemed like a topic that really wasn’t covered a lot. Also, given the tin-foil set and their ‘brand bias’, it seemed like an area that could use some discussion.

    Once you get into this stuff, it’s amazing how much entities play into queries in most cases (as U noted with the Bing data). I also found that it played well with other elements (link valuations, semantic analysis, spam etc.). That always seems important when it comes to which elements get used and which don’t.

    Then, of course, as we move into the social graph, we see more and more where this plays in. The MetaWeb purchase was telling, and by and large equally under-covered in the SEO world.

    Will publish the rest next week. It started getting too long so I had to chop it in half. You know how that goes I’m sure lol.

    (Noticed the named entities in the TOP list… can’t agree more there, which is why I was already writing about it.)

    Great to cya as always Bill. Talk soon. I want to get ye on the podcast once the series is up. Talk about them all.

  3. Dan Cruz January 10, 2012

    Great write up as usual and thanks for sharing. Been digging into some of the patents you referenced and the alleged “brand bias” by Google is starting to make more and more sense…

    Looking forward to part 2

  4. Alan Bleiweiss January 11, 2012

    Love the write-up Dave! Having been off the grid for a full month, I’m just diving back into my reading and this is a top-shelf article on the concepts. People get so blinded by the false belief that a “brand” automatically equates to “big corporation that garners favor from other big corporations”, and instantly, half the search community jumps into rebellious, ignorant school child mode.

  5. SNCadmin January 11, 2012

    Thanks Dan, it is an interesting area of study and it should clear up how ‘brands’ are dealt with in some ways. Enjoy.

    Yes Alan, it always kinda irked me because in all my years of geeking I’d never seen any papers/patents that really said ‘give this company a boost’ or the inverse. It’s not scalable (users’ tastes change). What I did see lots of was, of course, entities and various treatments thereof. That’s why I thought, some time ago actually (it took a while to get around to writing about it), it would be a good topic to cover for those less prone to panic and tin foil.

    Will it matter? Of course not. But if I can make a few more geeks out there, then it’s done its job.

    Oh and welcome back… ;0)

  6. Jun Baranggan January 11, 2012

    Very informative post Dave! Can’t wait for the next installment now for the actionable goodies. 🙂

Copyright© 2010-2022 Search News Central (SNC) | No material on this site may be used or repurposed in any fashion without prior written permission.