In the field of search, we all know and appreciate that images have an impact on search results. Mostly this comes down to making content more attractive, which in turn gradually improves inbound engagement metrics. But search engines like Google have also demonstrated that they can look at images more directly, interpreting the very pixels with which they are presented.
If you upload an image of a Coca-Cola can to Google reverse image search, Google can scour the web for deployments of that same image. Once exact results run out, Google will begin showing you ‘visually similar’ images:
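For context, one classic technique for spotting re-uploads of the same image is perceptual hashing. Below is a minimal difference-hash (dHash) sketch in pure Python – purely illustrative, with no claim that this is what Google actually runs. Real pipelines first resize the image to a tiny grayscale grid; here the grid is supplied directly.

```python
# Toy difference hash (dHash): one bit per left/right brightness comparison.
# Two images with the same brightness *gradients* hash identically, even if
# one copy is uniformly brighter or re-encoded.

def dhash(pixels):
    """pixels: 2D list of grayscale values, each row one pixel wider than
    the number of comparisons we want. Returns an integer hash."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a, b):
    """Number of differing bits; a small distance means 'visually similar'."""
    return bin(a ^ b).count("1")

# A tiny 2x3 'image' and a uniformly brightened copy hash identically,
# because dHash only keeps relative brightness, not absolute values.
img = [[10, 50, 40], [90, 20, 20]]
brighter = [[p + 5 for p in row] for row in img]
assert hamming(dhash(img), dhash(brighter)) == 0
```

In a real implementation you would produce the grayscale grid with an imaging library, then index hashes so near-duplicates can be found by small Hamming distance.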
Obviously AI and machine learning are being leveraged here. Maybe it’s just that clicks from certain users on certain images denote their innate similarity – but maybe machines can see more than they could before?
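That click-based theory can be sketched very simply: if two images are repeatedly clicked within the same searches, treat them as related. Everything below is invented for illustration – the click log, the image names, the scoring – and is in no way Google’s real pipeline.

```python
from collections import defaultdict
from itertools import combinations

def co_click_scores(click_log):
    """click_log: list of sets, each set being the images clicked within
    one search session. Returns {frozenset({a, b}): co-click count}."""
    scores = defaultdict(int)
    for session in click_log:
        for a, b in combinations(sorted(session), 2):
            scores[frozenset((a, b))] += 1
    return dict(scores)

# Hypothetical sessions from users searching for a red soda can:
log = [
    {"coke_can.jpg", "red_soda.jpg"},
    {"coke_can.jpg", "red_soda.jpg", "pepsi.jpg"},
    {"pepsi.jpg", "coke_can.jpg"},
]
scores = co_click_scores(log)
# coke_can and red_soda were co-clicked in two sessions:
assert scores[frozenset({"coke_can.jpg", "red_soda.jpg"})] == 2
```

The higher the co-click count for a pair, the more confidently a ranking system could surface one as ‘visually similar’ to the other – no pixel analysis required.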
Machine Learning Image Experiments
There are two experiments which I’d like to cover here, both of which I found very interesting. One is a simple web-app which you can access here:
This is a very interesting experiment. Initially, stock images were uploaded into many categories. The user can combine these, and also combine their own images with the creations of others. Each image has certain properties which the AI picks out:
The concepts which the AI ‘understands’ are the genes. By combining more and more images together, each successive image is composed of more genes. Users can even ‘crossbreed’ images, which yields some particularly interesting results, especially when messing around with creature / animal based imagery.
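The ‘genes’ mechanic can be sketched as a simple blend of concept weights. The gene names and numbers below are invented for illustration – the app’s actual internals aren’t public – but the idea of a child image carrying diluted genes from both parents looks something like this:

```python
# Sketch: each image is a bag of concept weights ('genes'). Crossbreeding
# averages the parents' genes, so the child carries every gene of both
# parents at half the combined weight (a missing gene counts as zero).

def crossbreed(parent_a, parent_b):
    genes = set(parent_a) | set(parent_b)
    return {g: (parent_a.get(g, 0) + parent_b.get(g, 0)) / 2 for g in genes}

lion = {"fur": 0.9, "mane": 0.8, "four_legs": 1.0}
eagle = {"feathers": 0.9, "wings": 1.0, "beak": 0.7}
griffin = crossbreed(lion, eagle)

assert griffin["mane"] == 0.4    # diluted lion gene
assert griffin["wings"] == 0.5   # diluted eagle gene
assert len(griffin) == 6         # the child carries all six genes
```

Each successive crossbreed dilutes older genes further, which matches the observation that heavily combined images become stranger, more washed-together compositions.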
These are some of the images I was able to create through image splicing:
… so not only can machines categorize images into their base components, they can also take those base components and sensibly recombine them into semi-photorealistic compositions. Very impressive!
Pure Information to Images
So we have seen that images can be deconstructed and reconstructed, but what about creating images purely from text-based input? Surely that might show us how any AI might begin to ‘visualize’ concepts, entities (as per Google Hummingbird) and contextual information.
This experiment is pretty cool:
It’s in beta right now and you can download it for free. First you install Runway ML, then you can join in with several image-based experiments which are currently running. You need to choose a ‘model’ to begin; this is the one which most interests me:
This little experiment allows you to type text and see a generated image:
This experiment is still in its early phases, so sometimes you will think “hmm, I can see what it’s trying to get at but the result isn’t quite there yet…”
Some of the images it produces, though, are quite interesting.
“A Potato on a Table Surrounded by Horses”
“A Lady Crying”
Some of these are mediocre; others may shock you slightly more in terms of how a machine is piecing together common concepts and compiling images to represent them. The images will likely only become more accurate over time, as more queries are processed.
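The prompt-in, image-out interface itself is worth a quick sketch. The toy below stands in for a real generative model: it deterministically turns a prompt into a tiny grid of grayscale ‘pixels’ by hashing the text. A real model (like the one in Runway ML) learns this text-to-pixels mapping from data; this sketch only demonstrates the contract – same prompt in, same image out, different prompt, different image.

```python
import hashlib

def text_to_image(prompt, size=4):
    """Toy stand-in for a text-to-image model: hash the prompt and unroll
    the digest bytes into a size x size grid of grayscale values (0-255)."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    pixels = [digest[i % len(digest)] for i in range(size * size)]
    return [pixels[r * size:(r + 1) * size] for r in range(size)]

img = text_to_image("a lady crying")
assert len(img) == 4 and all(len(row) == 4 for row in img)
# Deterministic: the same prompt always yields the same 'image'.
assert img == text_to_image("a lady crying")
assert img != text_to_image("a potato on a table")
```

The interesting part of the real system is, of course, everything this toy omits: the learned association between words like “lady”, “baby” and “mother” and the pixels that tend to accompany them.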
But What Does this Tell Us?
In my opinion it’s quite shocking to see how a mechanical mind blends visuals and attempts to ‘see’ the human world. In the previously posted images, you can already begin to draw some parallels between “A Lady Crying” and “Baby Jesus”.
In “A Lady Crying”, it almost looks as if the generated character may be holding a baby (or more than one). In “Baby Jesus”, the generated baby is mysteriously wearing a blue robe, similar to the robe which Mary (the purported mother of Jesus Christ) is depicted to wear in many human illustrations.
Religion aside (let’s not get into that right now), the machine seems to understand that there is a connection between “baby”, “lady” and “mother”. I find this quite cryptic and fascinating, maybe you will share that outlook.
Largely though, what this tells us is how an AI (including those utilized by search engines like Google) might ‘interpret’ an image. This is actually much more interesting than assessing what a search engine might ‘see’. If a search engine can process an image, it sees exactly the same JPG / SVG / PNG file which we see. But how does it interpret what it is seeing? In what murky, mechanical mind does it deconstruct and reconstruct visual elements, in order to get to the ‘meaning’ of the media with which it has been presented?
We can certainly see that machines can understand linked contextual entities. Machines can break down and reconstruct images into and from those base components. Machines can be given direct, text-based context and attempt to create something entirely new. Machines can draw parallels between their own created media.
In summary: my personal take-away from this is that machines are already quite good at understanding very specific images (“A Lady Crying”) and very broadly contextual images (“Red”). But between those two poles, there are ambiguous images which are neither specific nor overtly broad in meaning. These are the images which mechanical minds (maybe including Google’s) still struggle somewhat to comprehend.
If I were working on an eCommerce store selling rolls of fabric, I’d say that an image of a rolled-up bit of fabric would be good for a mechanical mind to interpret. A zoomed-in image of just the fabric’s texture would also be pretty good! A lady standing by a fireplace with a wine-glass in one hand and a fabric-roll in the other? That would be very difficult for a mechanical mind to interpret.
At the end of the day, the same can be said of the human mind. If there are too many competing ‘features’ within a media piece, we lose our sense of ‘what is this image about?’
In a way, machines and people (maybe even search engines) aren’t so different after all.
Image Search Results
Though we may speculate as to how Google deconstructs, visualizes and reconstructs images to / from raw contextual information, one fact remains – Google’s image results are heavily influenced by trend data.
For example this image query:
… used to present an array of frozen landscapes and food-stuffs. Now it revolves around a Disney movie! In Google’s image results, you’ll now see a filter feature running along the top. Whilst this has demonstrable UX utility (it helps users to narrow down the images they want to see), it’s probably the case that Google takes these clicks and utilizes them to determine which images are heavily related.
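One speculative way to model that trend influence is time-decayed click scoring: recent clicks count almost fully, old ones decay toward zero, so a burst of fresh interest (a new movie, a boxer in the news) outranks an older steady favourite. The half-life and the click data below are invented for illustration; this is not a description of Google’s actual ranking.

```python
# Trend-weighted scoring sketch: each click is discounted exponentially
# by its age, with a (made-up) 30-day half-life.

HALF_LIFE_DAYS = 30

def trend_score(click_ages):
    """click_ages: list of click ages in days. A click today contributes
    1.0; a click one half-life ago contributes 0.5; and so on."""
    return sum(0.5 ** (age / HALF_LIFE_DAYS) for age in click_ages)

# 'frozen': an older landscape photo vs. a freshly trending movie still.
landscape = [300, 310, 320, 330]   # more clicks, but all ~a year old
movie_still = [1, 2, 3]            # fewer clicks, but from this week
assert trend_score(movie_still) > trend_score(landscape)
```

Under a scheme like this, the movie still wins despite having fewer total clicks – which is exactly the behaviour we observe when a query-space flips overnight.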
Other notable examples of trends influencing search:
https://www.google.com/search?q=fury&tbm=isch – until recently, this query space was more heavily dominated by images of Fury, the WW2 tank movie. That is still mostly the case; however, since Tyson Fury (the famous boxer) has been in the news a lot lately, you can see some images of him too.
Old but gold: https://www.google.com/search?q=matrix&tbm=isch – do you see images of server farms and computer components, or math equations here? Me neither; it’s all dominated by the Matrix franchise’s movie posters and movie shots. This query-space has remained this way for decades.
As Google becomes better at handling the interplay between raw-data and contextual images, will this change? Will trends remain the top influencer of Google’s image results?
Time will tell.