Third Light Chorus and Artificial Intelligence features
Recent years have seen an explosion in the number of artificial intelligence features available to help busy marketing teams manage their digital media libraries. In this update, we'll look at how Chorus uses these advances to save time and improve the effectiveness of metadata, with automatic tagging features based on computer vision and machine learning.
Chorus uses AI and computer vision features in various ways. The objective is to reduce manual labor, add insights or textual tags without requiring human intervention, and ultimately to make search and re-use of content simpler. As many commentators have already said, there are practical limits on how these features work, so these need to be balanced against the benefits. We'll go through each in turn!
Dominant Color detection
Every file uploaded to Chorus is scanned to determine its dominant color palette. We use a smart approach to this, with emphasis on the subject matter, so that distractions like pure white backgrounds don't stop the correct color being detected.
This is a simple and incredibly useful feature. For example, if you are working on a magazine cover and need to find a file that meets certain criteria and matches your design brief, with Chorus you can easily perform a search: in this case, suppose we are looking for a dominant blue scene, in landscape orientation and with more than 6 megapixels. Here's how Chorus does that with its search engine.
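The background-exclusion idea behind dominant color detection can be sketched in a few lines. This is a toy illustration, not Chorus's actual algorithm (which emphasises the subject matter more intelligently); here we simply discard near-white pixels before counting:

```python
from collections import Counter

def dominant_color(pixels, white_threshold=240):
    """Return the most common (r, g, b) color in a list of pixels,
    ignoring near-white pixels so a plain background can't win."""
    subject = [p for p in pixels
               if not all(channel >= white_threshold for channel in p)]
    if not subject:                      # the image really is all white
        return (255, 255, 255)
    return Counter(subject).most_common(1)[0][0]

# A 10x10 "image": mostly white background, with a 4x4 blue subject.
pixels = [(255, 255, 255)] * 84 + [(20, 60, 200)] * 16
print(dominant_color(pixels))  # (20, 60, 200) - blue wins despite 84 white pixels
```

Even though white pixels outnumber blue ones more than five to one, the blue subject is still reported as the dominant color - which is what a designer searching for "a dominant blue scene" would expect.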
Let's also consider another common problem that computer vision has solved for us - how to crop an image. Trouble often arises when a central square crop is used but the subject of the image is not central (this happens all the time with portraits of people, for example). In Chorus, we use computer vision to detect the key subject of the image, such as faces or other objects, and we crop to include them. The effect is often unobtrusive, but the difference can be very significant.
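The geometry involved is simple once a subject has been detected. In this sketch, the bounding box is assumed to come from some subject detector (Chorus detects faces and other objects; the detector itself is out of scope here) - the function just centers a square crop on the subject and clamps it to the image edges:

```python
def smart_square_crop(img_w, img_h, box):
    """Compute a square crop that keeps a detected subject in frame.

    `box` is (left, top, right, bottom) of the detected subject,
    assumed to come from a face/object detector.
    """
    side = min(img_w, img_h)             # largest square that fits
    cx = (box[0] + box[2]) / 2           # subject centre
    cy = (box[1] + box[3]) / 2
    left = int(cx - side / 2)
    top = int(cy - side / 2)
    # Clamp so the crop never leaves the image.
    left = max(0, min(left, img_w - side))
    top = max(0, min(top, img_h - side))
    return left, top, left + side, top + side

# A 300x200 landscape photo with a face near the right edge.
# A naive central crop (50, 0, 250, 200) would slice through the face;
# the subject-aware crop shifts right to keep it whole.
print(smart_square_crop(300, 200, (220, 40, 280, 120)))  # (100, 0, 300, 200)
```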
Above is an example of how this feature quickly improves your experience working with your digital media. The left thumbnail uses conventional cropping, and the right thumbnail uses Chorus Smart Cropping.
Adding metadata (object detection and landmark detection)
So far, we've looked at how Chorus provides assistance with computer vision. Now, let's look at a feature that uses Artificial Intelligence (AI) to solve a very old problem, namely creating metadata keywords to describe an image. This is one of the most important challenges to have been solved in recent years, and it saves a considerable amount of time for a wide range of common subjects.
In this example, we can see an image with no metadata has been scanned using the Google Cloud Vision API, and tagged with good quality keywords in Chorus. We can also ask Google to identify landmarks, which can provide very precise descriptions of buildings or well-known locations such as coastal features, mountains or other notable places around the globe. In addition to integrating with Google, Chorus also integrates with the on-premises software provided by Imagga, an independent specialist in AI object recognition. This is ideal for on-premises deployments where a connection to Google is not feasible.
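For the technically curious, the Google Cloud Vision API is called over REST with a JSON body that names the detection features you want. Below is a sketch of how such a request body could be assembled for label and landmark detection; actually sending it (to `https://vision.googleapis.com/v1/images:annotate`, with credentials) is omitted, and the exact fields Chorus sends are not shown here:

```python
import base64
import json

def vision_request(image_bytes, max_results=10):
    """Build the JSON body for a Google Cloud Vision `images:annotate`
    request asking for labels and landmarks."""
    return {
        "requests": [{
            # Image content is sent inline, base64-encoded.
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [
                {"type": "LABEL_DETECTION", "maxResults": max_results},
                {"type": "LANDMARK_DETECTION", "maxResults": max_results},
            ],
        }]
    }

body = vision_request(b"\x89PNG...")   # raw image bytes would go here
print(json.dumps(body["requests"][0]["features"], indent=2))
```

The response contains `labelAnnotations` and `landmarkAnnotations` lists, each entry carrying a description and a confidence score, which map naturally onto keyword fields in a media library.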
This feature works incredibly well on images with strong, commonplace items or locations, but it's not effective when the images are highly specific to a particular user. For example, it can tag different kinds of vehicle, but it will not describe individual parts inside an air conditioning component. Because of this, we always store computer-generated keywords in a separate metadata field, keeping them isolated from manually-entered metadata.
Chorus can also perform keyword "expansion" - that is, increasing the number of words used to describe a file - using a thesaurus approach. To take an example, entering a single keyword such as "presentation" can automatically expand to cover other keywords such as demonstration, ceremony, proposition and so on. This is different from synthesising new keywords based on an image that has no pre-existing metadata, of course.
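The expansion step can be pictured as a simple lookup-and-merge. This toy version uses a tiny hard-coded thesaurus (a real system would draw on a full thesaurus database); the point is that original keywords are kept first and duplicates are not repeated:

```python
# Toy thesaurus - a production system would use a full thesaurus database.
THESAURUS = {
    "presentation": ["demonstration", "ceremony", "proposition"],
    "car": ["vehicle", "automobile"],
}

def expand_keywords(keywords):
    """Expand each keyword with its thesaurus entries, deduplicated,
    preserving order so the original terms stay first."""
    seen, expanded = set(), []
    for kw in keywords:
        for term in [kw, *THESAURUS.get(kw.lower(), [])]:
            if term not in seen:
                seen.add(term)
                expanded.append(term)
    return expanded

print(expand_keywords(["presentation"]))
# ['presentation', 'demonstration', 'ceremony', 'proposition']
```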
Today, AI can translate text between languages quickly and more accurately than ever. In Chorus, we can provide the ability to translate existing metadata into any of 100 languages, and store it either alongside the original metadata or in a derivative. Using derivatives quickly connects the translated file to its master original, while keeping both versions completely independent.
In this example, a monument in Budapest is described in Hungarian, and faithfully translated to English while preserving the local name of the monument.
Have you ever wanted to translate everything in your Chorus site into Mandarin or Hindi, to help with international collaboration? Now you can, and the results are very high quality thanks to the machine learning approach used by Google Translate.
Optical Character Recognition
When a photograph contains text, the process of extracting the text back into searchable text - the kind you can select with your mouse and cut-and-paste, or enter into a search box - is called Optical Character Recognition. It's usually made more complicated by the text being skewed, distorted, partly obscured or unclear due to focus or lighting issues. While past OCR software provided very high levels of accuracy for clear scans of documents or printed content, the latest AI-powered OCR services can go far deeper into an image, extracting smaller fragments of text and returning far more readable, accurate results.
In Chorus, we can support OCR scanning of content using Google Cloud Vision. This service allows us to populate new metadata fields in Chorus quickly and efficiently, by scanning and adding the text returned so that it's searchable. We extract text content from PDFs and Word documents, so that they're easily searched, too.
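To see why storing OCR output in its own field pays off at search time, here is a minimal sketch (the record structure and field name `ocr_text` are illustrative, not Chorus's actual schema): the search matches against both hand-entered keywords and the machine-extracted text, while the two stay cleanly separated in storage:

```python
def add_ocr_text(record, ocr_text):
    """Store OCR output in its own metadata field, keeping it
    separate from hand-entered metadata."""
    record["ocr_text"] = ocr_text
    return record

def search(records, query):
    """Match the query against both manual keywords and OCR text."""
    q = query.lower()
    return [r for r in records
            if q in " ".join(r.get("keywords", [])).lower()
            or q in r.get("ocr_text", "").lower()]

library = [add_ocr_text({"name": "poster.jpg", "keywords": ["event"]},
                        "Grand Opening - Saturday 10am")]
print([r["name"] for r in search(library, "grand opening")])  # ['poster.jpg']
```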
Locations and GPS
Our final example is about interpreting location data. Where was an image taken? Can we search for locations by name, or see other images nearby?
When a photograph is captured on an advanced camera or smartphone, the location is recorded in the Exif (camera data) alongside the image. This gives us latitude and longitude data (and altitude). We use this data to power search options that can find files tagged within a radius of 10 miles of a specific location, for example.
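A radius search like this boils down to a great-circle distance calculation. Here is one way it could be sketched using the standard haversine formula (the record layout is illustrative; Exif longitudes west of Greenwich are negative in decimal form):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/long points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * asin(sqrt(a))   # Earth's radius is ~3958.8 miles

def within_radius(files, lat, lon, miles=10):
    """Filter files (each carrying decimal 'lat'/'lon' from Exif)
    to those inside the search radius."""
    return [f for f in files
            if haversine_miles(lat, lon, f["lat"], f["lon"]) <= miles]

photos = [
    {"name": "liberty.jpg", "lat": 40.6892, "lon": -74.0445},
    {"name": "boston.jpg",  "lat": 42.3601, "lon": -71.0589},
]
# Search within 10 miles of downtown Manhattan (40.7128 N, 74.0060 W):
print([f["name"] for f in within_radius(photos, 40.7128, -74.0060)])
# ['liberty.jpg']
```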
Latitude and longitude numbers are relatively difficult to work with, though. For example, the latitude and longitude of 40.6892° N, 74.0445° W identify a significant landmark - but can you guess which one? (Answer further down the page!)
Here's another situation. Looking at the photo of the church, it's not immediately obvious where this is, unless you happen to recognise it.
This church is in the highland village of Ballater, Scotland, and the photo was taken on an Apple iPhone with GPS enabled. We can see that Chorus has displayed it on a map, and also tagged it as "Ballater, Scotland". This image is now searchable using the word "Ballater", or "Scotland", too.
This helps us find the image, and of course provides valuable searchable metadata that we haven't had to enter by hand. Metadata suggestions (facets) in Chorus are automatically populated with this new information.
Bringing it all together
Chorus is using all of these cutting-edge tools, either developed by Third Light or sourced from Google, to harness billions of dollars of research and development work. They provide extremely powerful, time-saving tools that can categorise, crop and tag content in ways that we have traditionally had to do manually. These tools help us manage and organize content in our Chorus media libraries, and in many cases they make it easier to search without investing manual labor.
We plan to continue to expand our features in this area, including GDPR (privacy) compliant features that can detect faces or allow tagging of people.
Third Light's Customer Success team will also be hosting a webinar on 27th February to explain some of the AI features available in Chorus media library. To sign up, click the register link below.
Chorus Webinar: AI & Google Vision
Thursday, February 27, 2020
3:00PM - 3:30PM GMT
(If you're still wondering about 40.6892° N, 74.0445° W, it's the Statue Of Liberty!)