Named Entities; associations for SEO
If there’s one area of search that has had my interest over the last while, it’s entities. And interestingly, the SEO world has also been interested, but they tend to think of it in simpler terms: as brands. Year after year we hear about Google being biased towards brands. But that’s not really the case from where I sit.
It really doesn’t make sense to create algorithms that favour brands. That’s asking the system to make the kind of subjective decisions that are prone to change over time. It really doesn’t scale well. What makes more sense is to understand things. Entities.
You like certain brands. You like types of music, movies, sports. When a bunch of other people seem interested in the same ones, it makes sense as a search engine to make them more prominent results. Most (old school) SEOs are just pissed because we can’t outrank brands in their own spaces like we used to.
Part I – What is an entity?
Ok, before we go on let’s look at entities. A named entity (in information retrieval terms) is essentially a person, place, thing, event and so forth. These entities can now be associated with temporal data, actions or even other entities.
While not limited to these, one patent on the subject stated:
“An "entity," as used herein, may refer to anything that can be tagged as being associated with certain documents. Examples of entities may include news sources, stores, such as online stores, product categories, brands or manufacturers, specific product models, condition (e.g., new, used, refurbished, etc.), authors, artists, people, places, and organizations. ” from Query rewriting with entity detection
[Holiday Inn, downtown, Toronto] – here we have two entities: a business (with an inferred type) and a location (city), plus a refinement.
[Lenny Kravitz, concert, Chicago, 2012] – we have two: a group and a city, with date and event type modifiers.
[David Harry, SEO, Search News Central] – a really cool dude, a website with a topical refinement.
Simplistic, but it highlights the concept. One can start making entity associations to better understand how they (entities) relate to concepts and categorizations. The search engine would look for common words, phrases and other data points commonly associated with said entity. It might also then (internally) re-write the query to produce potential results that it might not otherwise find.
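To make that a bit more concrete, here’s a rough Python sketch of spotting entities in a query and re-writing it with associated terms. Everything in it is an assumption for illustration: the `KNOWN_ENTITIES` table, the associated terms and the function names are made up, and a real engine would learn these associations from its index rather than from a hand-written dictionary.

```python
# Hypothetical lookup table: entity phrase -> (type, commonly associated terms).
# These entries are illustrative assumptions, not data from any search engine.
KNOWN_ENTITIES = {
    "holiday inn": ("business", ["hotel", "rooms"]),
    "toronto": ("city", ["ontario"]),
    "lenny kravitz": ("artist", ["tour", "tickets"]),
    "chicago": ("city", ["illinois"]),
}


def detect_entities(query_parts):
    """Return (phrase, type) for every query part found in the table."""
    found = []
    for part in query_parts:
        key = part.strip().lower()
        if key in KNOWN_ENTITIES:
            found.append((part.strip(), KNOWN_ENTITIES[key][0]))
    return found


def rewrite_query(query_parts):
    """Append associated terms for each detected entity, skipping repeats."""
    terms = [p.strip() for p in query_parts]
    seen = {t.lower() for t in terms}
    for part in list(terms):
        entry = KNOWN_ENTITIES.get(part.lower())
        if entry is None:
            continue  # not a known entity, leave it as a plain refinement
        for extra in entry[1]:
            if extra not in seen:
                terms.append(extra)
                seen.add(extra)
    return terms


query = ["Holiday Inn", "downtown", "Toronto"]
print(detect_entities(query))
print(rewrite_query(query))
```

Note that "downtown" passes through untouched: it isn’t in the table, so it stays a refinement rather than an entity.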
On the other hand, it’s one heck of a trick at times because there is always room for confusion and false-positives. Consider:
[time travel] – which the engine might try to assign TIME as an entity and return the travel section of that website, instead of a funky machine that Stewie built.
Other elements they look at include cases where common words themselves create an entity:
“For example, the term "Blu-ray" is a new word that describes a blue laser-based, high-density optical disc format for the storage of digital media. Once a new word is generally accepted, it can become part of the lexicon and be included in dictionaries. ” – via Detecting name entities and new words
As with all things, it is one of many processes that would be used to further refine search quality. It plays nicely with temporal, semantic, geo-localized and other signals. But its importance shouldn’t be understated.
Uses and challenges
This becomes quite useful in that traditionally the search engine might look at a term like [New York travelling] and break it into [New] / [York] instead of realizing that [New York] is an entity. As you might imagine, this has implications not only for semantic analysis, but localizations, personalization and more.
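Here’s a minimal sketch of that difference in Python, using a greedy longest-match pass over the query. The `ENTITY_PHRASES` set is a hypothetical stand-in for whatever entity data an engine actually holds, and the approach itself is just one simple way to do it, not a description of how any search engine segments queries.

```python
# Hypothetical set of known multi-word entities (lowercased word tuples).
ENTITY_PHRASES = {
    ("new", "york"),
    ("new", "york", "city"),
    ("holiday", "inn"),
}
MAX_PHRASE_LEN = max(len(p) for p in ENTITY_PHRASES)


def segment(query):
    """Greedy longest-match: prefer the longest known entity at each position."""
    words = query.lower().split()
    segments = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, then shrink down to two words.
        for size in range(min(MAX_PHRASE_LEN, len(words) - i), 1, -1):
            candidate = tuple(words[i:i + size])
            if candidate in ENTITY_PHRASES:
                segments.append(" ".join(candidate))
                i += size
                break
        else:
            # No multi-word entity starts here; keep the single word.
            segments.append(words[i])
            i += 1
    return segments


print(segment("New York travelling"))
```

With the phrase set above, [New York] stays together as one unit and only [travelling] is left as a plain term.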
There are even implications in the mobile world because many times not only are the queries more refined, but they relate to entities (products, song downloads, locations, movies). There are some patent awards that looked at this particular segment, which show us (as SEOs) that not all searches, nor applications, are treated the same. More on that another time…
Named entities, of course, can also be categorized, which is particularly handy for advertising platforms (such as Google AdSense).
“(…) associating an entity with a category includes determining a probability value for each of at least a subset of a plurality of categories, the probability value representing a likelihood that an identified entity belongs to the respective category and determined using information about the entity. The method includes recording one of the plurality of categories for the entity, the category identified using the probability value and a rule set for the plurality of categories. “ from Associating an Entity with a Category
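In plain terms, the quote describes two steps: score each category with a probability, then apply a rule set to decide which one gets recorded. Below is a toy Python sketch of that shape. The keyword lists, the thresholds and the way the probability is computed (simple keyword overlap) are all my own assumptions; the patent doesn’t tell us the actual model.

```python
# Hypothetical evidence words per category; a real system would use far
# richer information about the entity than a handful of keywords.
CATEGORY_KEYWORDS = {
    "travel": {"hotel", "flights", "rooms", "booking"},
    "music": {"concert", "album", "tour", "tickets"},
    "electronics": {"disc", "player", "laser", "hd"},
}

# Hypothetical rule set: minimum probability a category needs to be recorded.
MIN_PROBABILITY = {"travel": 0.5, "music": 0.5, "electronics": 0.6}


def category_probabilities(entity_terms):
    """Turn keyword overlap counts into a probability per category."""
    terms = {t.lower() for t in entity_terms}
    counts = {cat: len(terms & words) for cat, words in CATEGORY_KEYWORDS.items()}
    total = sum(counts.values())
    if total == 0:
        return {cat: 0.0 for cat in counts}
    return {cat: n / total for cat, n in counts.items()}


def assign_category(entity_terms):
    """Pick the most probable category, but only if it passes its rule."""
    probs = category_probabilities(entity_terms)
    best = max(probs, key=probs.get)
    if probs[best] >= MIN_PROBABILITY[best]:
        return best
    return None  # nothing cleared its threshold, so record no category


blu_ray_terms = ["disc", "laser", "player", "album"]
print(category_probabilities(blu_ray_terms))
print(assign_category(blu_ray_terms))
```

Returning `None` when no category clears its threshold is a design choice in this sketch: better to leave an entity uncategorized than to file it somewhere on weak evidence.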
While we have talked in the past about some issues with sentiment analysis, let’s take that leap anyway. Entity associations can move into the more implicit areas such as reviews and sentiment. How are these entities thought of? There’s a big difference between negative political commentary and dangerous brake lines, but at some point these kinds of deeper entity associations are likely to become part of the search landscape.
In fact, in that scenario, the users leaving the reviews or sentiments themselves become entities which can be weighted/scored. This can be a handy quality control/spam tool.
In short, named entities are a multifaceted tool which works well with the others in the tool box.
Authors and authority
This of course is where it gets interesting. The search engine now can look at users, authors and people for deeper connections and valuations. It might be social channels, review sites, blogs or anywhere else an entity can be associated.
- Temporal – velocity at which an entity is mentioned
- Trust – think TrustRank type concepts.
- Authorship – more on AgentRank in a moment
- Social graph – connections and activity
Of those, of course, the recent goodies from Bill on AgentRank are a pretty good example. Here, let’s have him take over for a second, I need to stretch my digits.
“The patent describes how authors can mark the content that they publish on the Web with a digital signature, whether that content might be a web page, or article, or blog post, or even comment, and how their authorship might influence the rankings of that content by associating a reputation score with it. “
“In addition to using authorship markup to identify who the author and possibly originator of content might be, the Agent Rank process also involves adding a quality score to a document based upon the reputation score of its author. ” from – Agent Rank, or Google Plus as an Identity Service
Thanks brother, ready to take over again…
This of course we’ve seen glimpses of with Google’s author mark-up and some of the annotations that have been showing up in the SERPs over the last year.
You see my friend? Entities and their associations are an important study for those engaged in SEO. Search engines are more and more looking at them to better make sense of the web and it’s best we know what they are thinking. It is the lack of knowledge that leaves one susceptible to mailings from cuddly pandas. Ya feel me?
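If you want to picture the Agent Rank idea from Bill’s quotes, here’s a small Python sketch of a document score being nudged by its author’s reputation. To be clear, this is my own toy model: the reputation values, the `AUTHOR_WEIGHT` blend and the `quality_score` function are assumptions for illustration, not the formula from the patent or anything Google has confirmed.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical reputation scores in [0, 1]; how these would be computed is
# not specified here and is purely an assumption for the sketch.
AUTHOR_REPUTATION = {"alice": 0.9, "bob": 0.2}

AUTHOR_WEIGHT = 0.3  # assumed share of the final score given to reputation


@dataclass
class Document:
    url: str
    base_score: float             # relevance from all other signals, 0..1
    author: Optional[str] = None  # None when no signed authorship is present


def quality_score(doc):
    """Blend the base score with the author's reputation, if known."""
    reputation = AUTHOR_REPUTATION.get(doc.author)
    if reputation is None:
        return doc.base_score  # unsigned content keeps its base score
    return (1 - AUTHOR_WEIGHT) * doc.base_score + AUTHOR_WEIGHT * reputation


docs = [
    Document("example.com/a", 0.60, "alice"),
    Document("example.com/b", 0.65, "bob"),
    Document("example.com/c", 0.62),
]
for doc in sorted(docs, key=quality_score, reverse=True):
    print(doc.url, round(quality_score(doc), 3))
```

In this made-up data the page with the lowest base score ends up on top because its author carries a strong reputation, while the page from the low-reputation author drops below the unsigned one.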
The SEO Connection
Well, if you were trying to be sneaky and jump to the end and get the goods, then you have miscalculated, grasshopper. You see, this is going to be the first in a two-part series. Ha! This post was getting long and it seems best to get in the head space first, then sort out what it means to the average search optimization geek.
If you did skip the rest, go back and read it. It will be important for the next part. Sometimes there are no shortcuts.
I hope this part of the series was enough to get you into the groove without putting you to sleep. It only hurts for a little bit. I promise next week we’ll make some sense of this newfound knowledge and turn it into some actionable goodiness for you. Ok?
Until then, here’s some more reading for the adventurous;
- Content Author Badges
- Query rewriting with entity detection
- Agent Rank
- Agent Rank (II)
- Delegating authority to evaluate content
- Associating an Entity with a Category
- Detecting Entities and New Words
- Query-independent entity importance in books
- Webpage entity extraction through joint understanding of page structures and sentences
- Providing comparison experiences in response to search queries
- Named Entity Recognition in Query
- Applying model of a persona to search results
- Web-scale entity relationship extraction
- Scalable Incremental Semantic Entity and Relatedness Extraction from Unstructured Text
- Identifying location names within document text
- Comparisons of entities of a particular type
- Entity category determination
- Named Entity transliteration using corporate corpra
- Entity-specific tuned searching
- Entity-specific search model
- Identifying synonyms of entities using web search
- Identifying synonyms of entities using a document collection
- Mining knowledge sources for improved entity extraction
- Mining knowledge sources with auto learning for improved entity extraction
- Implicit Name Searching
- Method and system for socializing events
- Large scale entity-specific resource classification
- System and method for adding identity to web rank
- Scalable semi-structured named entity detection
- Extracting entities from a web
Til next time… keep the gospel of the geek strong!