AI Guided Image Selection

Authors
  • John Moscarillo

Introduction

AI image generation has made huge strides, but generated images often have subtle flaws that make them unsuitable for professional use. For example, an AI might create a beautiful beef roast, but on closer inspection, the outside is ham and the inside is beef. These "tells" make AI-generated images unreliable for representing real-world concepts.

I have started a food-related blog and am using AI to write the articles from titles I generated earlier, also with AI. The posts are text only, apart from a single header image. I began by creating those header images with AI image generation, but the results were consistently unsatisfactory. I believe this will improve, but for now the models that are affordable to use are not good enough for the job.

The Problem with AI-Generated Images

Despite advances, AI-generated images frequently contain artifacts, oddities, or unrealistic details. These issues are especially problematic for food. Although it is hard to describe, there is something about AI images that gives them away. Sometimes the tell is obvious, like a hand with six fingers, but sometimes it is just the smooth, vector-like style that is common with DALL-E.

A Better Solution: Multimodal AI for Guided Image Selection

Instead of generating images, we can use AI in a multi-step, multimodal workflow to select the best real image from the web. Here’s how it works:

  1. Keyword Generation: Use AI to analyze the blog title and content, generating a set of three relevant keywords.
  2. Image Search: Use the generated keywords to perform a Google, Flickr or Getty image search, retrieving a set of candidate images.
  3. AI-Powered Selection: Use AI to evaluate the search results and select the image that best represents the blog title and content.
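
The three steps above can be sketched as a small pipeline. This is a minimal sketch, not the implementation used for the blog: the Candidate type, the select_header_image function, and the three callables passed into it are all hypothetical names, and the stub functions stand in for a real language model, a real image search API, and a real multimodal model.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One image returned by the search step (hypothetical shape)."""
    url: str
    description: str  # caption or alt text supplied by the search API


# Hypothetical interfaces: plug in whichever model and search API you use.
KeywordModel = Callable[[str, str], Sequence[str]]            # (title, content) -> keywords
ImageSearch = Callable[[Sequence[str]], Sequence[Candidate]]  # keywords -> candidates
ImageScorer = Callable[[Candidate, str, str], float]          # higher score = better match


def select_header_image(
    title: str,
    content: str,
    generate_keywords: KeywordModel,
    search_images: ImageSearch,
    score_image: ImageScorer,
    max_keywords: int = 3,
) -> Optional[Candidate]:
    # Step 1: ask the model for keywords and keep at most three.
    keywords = list(generate_keywords(title, content))[:max_keywords]
    # Step 2: run the image search with those keywords.
    candidates = list(search_images(keywords))
    if not candidates:
        return None
    # Step 3: score each candidate against the post and keep the best one.
    return max(candidates, key=lambda c: score_image(c, title, content))


# Stubs so the sketch runs without network access or API keys.
def stub_keywords(title: str, content: str) -> Sequence[str]:
    return ["beef", "roast", "dinner"]


def stub_search(keywords: Sequence[str]) -> Sequence[Candidate]:
    return [
        Candidate("https://example.com/ham.jpg", "glazed ham on a platter"),
        Candidate("https://example.com/beef.jpg", "sliced beef roast on a board"),
    ]


def stub_score(candidate: Candidate, title: str, content: str) -> float:
    # A real scorer would show the image itself to a multimodal model.
    # This stand-in only counts title words found in the description.
    words = title.lower().split()
    return float(sum(word in candidate.description.lower() for word in words))


best = select_header_image(
    "Slow-Roasted Beef", "Article text...", stub_keywords, stub_search, stub_score
)
if best is not None:
    print(best.url)  # https://example.com/beef.jpg
```

Passing the model and search calls in as functions keeps the pipeline independent of any one provider, so the same code works whether the candidates come from Google, Flickr, or Getty.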

Results are Surprising

With this multimodal approach, the selected images are often more relevant than expected. Combining keyword extraction with the selection of real images produces visuals that are authentic and that fit the theme of the blog post.

Conclusion

Using AI for guided image selection rather than generation offers a practical solution to the challenges of AI-generated images. This approach leverages the strengths of AI in understanding context within a digital image.

What I've enjoyed most about AI is discovering how best to use its capabilities. I am finding more and more that AI is better at helping with existing tasks than at recreating existing practices. Here, we used it to help select an image from APIs we could already use. We no longer need to rely on those APIs to provide custom filters, because we can do the filtering ourselves. That gives the power to the user instead of forcing the API to adapt to our needs.