A lot of people seem to want to know the best way to write in order to get the most benefit from semantic search. So rather than sit on the secret, I decided to share it here. But first, let’s talk a little about what exactly semantic search is.
There’s a lot of confusion about what semantic search is, strangely enough. Some people seem to want to paint it as some sort of a dark art.
They give it new names, assign mystical powers to it, liken it to artificial intelligence… while others over-simplify it, claiming that it’s nothing more than an index of synonyms – that semantic analysis is still out of man’s reach.
Both are wrong.
Certainly, the earliest steps toward semantic search involved the recognition of synonyms. Car, auto, automobile, coach, vehicle… the search algorithms were first designed to treat all of these as possible equivalents. But that soon presented other problems.
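To make the synonym idea concrete, here is a minimal sketch of that kind of lookup. The synonym group is hand-built from the words above, and the function names (expand_query, matches) are made up for illustration; no real search engine is being quoted here.

```python
# Hypothetical, hand-built synonym groups; a real engine would derive these from data.
SYNONYM_GROUPS = [
    {"car", "auto", "automobile", "coach", "vehicle"},
]


def expand_query(term):
    """Return the term plus every word that shares a synonym group with it."""
    term = term.lower()
    expanded = {term}
    for group in SYNONYM_GROUPS:
        if term in group:
            expanded |= group
    return expanded


def matches(term, page_text):
    """True if the page contains the term or any of its listed synonyms."""
    words = set(page_text.lower().split())
    return bool(expand_query(term) & words)
```

With this, a search for car would match a page that only ever says automobile. Notice, though, that a lookup like this knows nothing about which sense of a word is meant.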
If a user entered the search term transmission, was he looking for information on a gearbox or a radio signal? Or perhaps he was interested in how communicable diseases are passed from host to host.
The most logical way to determine which pages were the most relevant was to look at the general context of the page. That means looking at the other terms and phrases around the search term.
If the term transmission is prefaced with the word automatic, one might assume that the page is talking about automatic transmissions in vehicles, whereas airborne transmission is much more likely to deal with communicable diseases, and high frequency transmission with radio or television signals.
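As a rough illustration of that idea, the sketch below guesses which sense of transmission a passage means by counting cue words near it. The cue lists, the five-word window and the function name guess_sense are all assumptions invented for this example; real engines use far richer signals.

```python
import re

# Hypothetical context cues for each sense of "transmission".
SENSE_CUES = {
    "automotive": {"automatic", "manual", "gearbox", "clutch", "vehicle"},
    "medical": {"airborne", "disease", "host", "infection", "communicable"},
    "broadcast": {"frequency", "radio", "television", "signal", "antenna"},
}


def guess_sense(text, term="transmission", window=5):
    """Pick the sense whose cue words appear most often near the term."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {sense: 0 for sense in SENSE_CUES}
    for i, word in enumerate(words):
        if word != term:
            continue
        # Look at up to `window` words on either side of the term.
        nearby = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        for sense, cues in SENSE_CUES.items():
            scores[sense] += sum(1 for w in nearby if w in cues)
    best = max(scores, key=scores.get)
    # No cue words nearby means no basis for a guess.
    return best if scores[best] > 0 else None
```

Calling guess_sense on a sentence about an automatic transmission in a vehicle would lean toward the automotive sense, while a sentence about airborne transmission of a disease would lean toward the medical one. With no cue words nearby, it declines to guess.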
Is it Really that Simple?
While this may seem like a simple enough relationship, it’s more complex than it may look at first glance.
Technologies like LSI (latent semantic indexing), LDA (latent Dirichlet allocation) and pLSA (probabilistic latent semantic analysis) are just a few that are hotly argued now and then amongst those who make their living working with information retrieval.
Certainly all have aspects that apply to semantic interpretation of the content of a document. But with 6.89 billion pages in the indexed web, scalability is obviously a major factor in the feasibility of using any such technology.
And unfortunately, none of these offer the necessary ability to scale for the needs of indexing and sorting the entire Internet.
So What’s the Solution?
Okay, that brings us back to the original “secret” I promised to share:
How do we write in such a way as to cater to the search engines’ ability, however limited it may be, to comprehend the content we generate – in short, how do we write semantically?
It’s really quite simple. So simple, in fact, that I’m sure some of you are already doing it, even if you never imagined it was the secret sauce for allowing the search engines to semantically interpret your pages.
Profound, huh? Don’t you just love it? So simple, it’s been staring you in the face for years. Didn’t you ever recognize its wisdom?
What’s that? You don’t recognize it? Oh, sorry… try this:
Write As Though You Are Writing To Human Beings.
Think about it… it’s really the only approach that makes sense.
First of all, the search and retrieval algorithms are written by human beings, so their authors have a natural tendency to try to make machines understand content in a manner that’s similar to the way a user might be searching for it. So similar “thought” processes make sense.
Second, they have to develop those algorithms in such a way as to make them effective in the broadest possible fashion. That means following those similar processes.
So anything you do that departs from those natural processes will inevitably take you further and further from where you want to be. The search engines are striving to listen in the same language as the users, so you’d better be speaking that language.
As my buddy Dave Harry said to me the other day, “NOT caring about semantic analysis IS caring about it.” I think that’s exactly right – it’s the ideal approach.
In a way, it’s like the old story about the gal who looks and looks for a likely candidate for a husband, with no luck. Then, once she stops looking, she finds one.
The bottom line is, if you’re supposed to be talking to people in a language that they understand, and you hope that some eavesdroppers will pick up on what you’re saying, too, it makes sense to speak in the same language that everyone is listening in.
And the best way to do that is to concentrate upon who you’re talking to.