As humans, we use all our senses to build a rich picture of our environment. We tell stories, steeped in detail, to make experiences real for our listeners. When we look at a photograph, we don’t just see objects; we imagine. If you want a good image of a car on a street to illustrate an article, you don’t just pick the first image of “a car” on “a street” that you find. You want the image to tell the right story. While you can limit your search to relevant imagery using standard tagging, there’s still the need for a manual search. If AI is going to fulfil the promise of making life easier for marketers, it’s important to move beyond the basics.
One of the areas of research we’ve been looking at is how multiple information paths can help with understanding the story of an image, without removing all the easy search terms that tagging can provide. How much information do you need to tell the story of an image?
The first step in this has to be detailed recognition and tagging of the objects and scene of the image. While it’s important to get as much detail as possible, showing all of it, or even searching on it directly, may not be helpful. You’ll be able to filter out some of the noise, but it can be trial and error to get the terms you need. We call this phenomenon of too much tagging “tag fatigue”. A list of objects does not tell a story, and we believe it’s important to focus on what technology can do to help rather than adding to the workload.
What we’re looking at is how to define story elements in a fashion that will make it easy for the marketer to focus on the best images quickly. Understanding that there is more to automotive imagery than just the vehicles, we’ve made sure that you can understand the scene and the broader context of the image and any associated text.
In addition to search terms, we’re making it easy for our users to automatically categorise and filter incoming images according to their own rules – you can define the story you want to tell.
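To make the idea of user-defined rules concrete, here is a minimal sketch of rule-based categorisation. Aura’s real rule engine and tag format are not public, so the `Rule` class, tag names, and matching logic below are invented for illustration only.

```python
# Illustrative sketch only: Aura's actual rule engine and data model are not
# public; the Rule class and tag names here are hypothetical.
from dataclasses import dataclass

@dataclass
class Rule:
    name: str       # category this rule assigns
    required: set   # tags that must all be present
    excluded: set   # tags that must not appear

def categorise(image_tags, rules):
    """Return the names of every rule an image's tags satisfy."""
    tags = set(image_tags)
    return [r.name for r in rules
            if r.required <= tags and not (r.excluded & tags)]

rules = [
    Rule("city-driving", required={"car", "street"}, excluded={"alcohol"}),
    Rule("off-road", required={"car", "dirt track"}, excluded=set()),
]

print(categorise(["car", "street", "pedestrian"], rules))  # ['city-driving']
```

An incoming image matching a rule could then be routed straight into the right collection, so the story you want to tell is defined once and applied automatically.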
When using an image to tell a story, the context is critical. Are there other things in the image that are undesirable: people drinking alcohol, a competitor’s logo, or something that breaks your specific brand guidelines?
So how do we do this? Within Aura we have specialist skills that look at different aspects of the image and text. What makes Aura different is that these skills can communicate with each other, so the tags are not created in isolation. Each result has the potential to reinforce, override, or combine with the results of other skills to build the full context of the image.
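One way to picture skills reinforcing each other is to merge their per-tag confidence scores and boost any tag that more than one skill agrees on. This is a hypothetical sketch, not Aura’s actual mechanism; the skill outputs and the `boost` parameter are made up for illustration.

```python
# Hypothetical sketch of skill results reinforcing each other; Aura's real
# combination logic is not public. Each "skill" returns tag -> confidence.
from collections import defaultdict

def combine(skill_outputs, boost=0.1):
    """Merge per-skill scores: keep the best score per tag, then boost tags
    that several skills agreed on (reinforcement), capped at 1.0."""
    merged = defaultdict(list)
    for output in skill_outputs:
        for tag, score in output.items():
            merged[tag].append(score)
    return {tag: min(1.0, max(scores) + boost * (len(scores) - 1))
            for tag, scores in merged.items()}

object_skill = {"car": 0.9, "road": 0.6}   # object recognition result
scene_skill = {"road": 0.7, "city": 0.8}   # scene understanding result
print(combine([object_skill, scene_skill]))
```

Here “road” is seen by both skills, so its combined confidence ends up higher than either skill reported alone, much like the second glance described above.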
This is non-linear, multi-dimensional classification, but we just call it Aura. It’s pretty similar to how our brains understand the world: where we lack information, we fill in what we can’t see with the other information we have. It’s like taking a second glance at a scene: not only do you reinforce what you saw the first time, but you also pick up extra details.
How these skills interact can be changed based on the needs of each story within the platform. Each tag has a score that can be used to decide its impact on other skills. This can also help you balance your own needs between precision (where you only get clear, confident results, but fewer of them) and recall (where you get more results, but they will be less precise). Different projects have different needs, but with Aura you can tune this for each skill to create the experience you need and power up your images.
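The precision/recall trade-off above boils down to a confidence threshold: raise it and you keep only the surest tags; lower it and you keep more, at the cost of noise. This is an illustrative sketch only; the tag scores and threshold values are invented, not taken from Aura.

```python
# Illustrative only: a single confidence threshold trades precision for
# recall. The scored tags and thresholds below are made-up examples.
def filter_tags(scored_tags, threshold):
    """Keep only tags whose confidence score meets the threshold."""
    return [tag for tag, score in scored_tags if score >= threshold]

scored = [("car", 0.95), ("street", 0.80), ("bicycle", 0.40)]

print(filter_tags(scored, 0.9))  # high precision: ['car']
print(filter_tags(scored, 0.3))  # high recall: ['car', 'street', 'bicycle']
```

A brand-safety project might run with the strict threshold, while a discovery search would favour the permissive one.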
But what about creating images? The first half of 2019 has been pretty exciting for AI with some fantastic examples of photo-realistic image generation that was just not possible even two years ago. It’s already possible to create realistic people from scratch and landscapes from some simple lines.
Above is a completely autogenerated image from thispersondoesnotexist.com, where you have no control over what sort of face is generated.
The video below shows realistic landscapes being generated from rough doodles (source: Nvidia).
With these examples, you can see that creating a bespoke image from some ideas in your head is not that far away. This won’t remove the need for understanding the context of images. While image generation may remove the need for stock photography, we’re seeing from our data that customers respond best to authentic images taken by other like-minded customers.
And we’re doing everything we can to help you find them!