
How AI allows us to paint pictures with words

DALL-E 2, please show us an astronaut riding a horse in a photorealistic style. Image: OpenAI

Analysis: new AI applications can generate images based on natural language prompts provided by human users

Digital image generation can be a difficult and time-consuming process. Nevertheless, artificially intelligent applications have recently pushed the limits of creativity when it comes to generating images from mere descriptions. One of the basic ideas behind AI is to build models that can learn from examples provided by humans and make intuitive decisions for new instances. This idea has been applied across text, image and video processing, and has given rise to AI models that can be more creative.

When we think about creativity, we usually associate it with art and writing. There have been many attempts to teach machines the same kind of creativity that we humans have. One way is to show the machine a large number of examples in text and let it figure out the rules on its own. This is handled by Natural Language Processing (NLP), a subfield of AI that seeks to give computers the ability to process human language as it is written or spoken.

Another way of teaching machines to be creative is to show them a large number of images and tell them what to look for in those images. This falls under Computer Vision, a subfield of AI famously applied in self-driving vehicles, drone monitoring and facial recognition applications.


From RTÉ Radio 1's Ray D'Arcy Show, Dr Susan Leavy on the capabilities of AI

More recently, we have seen advances in the creativity of machines using such AI models as DALL-E, DALL-E 2, Imagen and Craiyon (formerly known as DALL-E mini). These are AI applications that can generate images based on natural language prompts provided by human users.

So can we generate an image from a description? Yes: this is where these text-to-image AI models come into play.

How do text-to-image models work?

All of these models generate a picture from a sentence. They are trained on millions of images from the internet, together with their text captions. From these, they learn to associate the elements, individual objects and features in a picture with the words or phrases that describe them.

When asked to make a picture from a new description, the model takes that description as input and first breaks it down into words or phrases. Having learned from a large number of images, the AI is able to decide which of them are relevant to the task at hand. It can then take these similar images, merge them into one, and transform them into a different image. In the process, it can also create a new concept from existing images.
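The matching step described above can be illustrated with a deliberately simplified sketch. Real models like DALL-E learn dense neural embeddings from millions of image-caption pairs; the toy version below just scores a handful of hypothetical captioned images against a prompt using word-count vectors and cosine similarity, to show the idea of associating descriptions with relevant images. All image names and captions here are made up for illustration.

```python
# Toy illustration (not a real text-to-image model): rank stored images
# by how similar their captions are to a new prompt, using simple
# bag-of-words vectors and cosine similarity.
from collections import Counter
from math import sqrt

# Pretend "training set": hypothetical image identifiers with captions.
captioned_images = {
    "img_001": "an astronaut riding a horse",
    "img_002": "a green avocado on a table",
    "img_003": "an armchair in a living room",
}

def vectorise(text):
    """Turn text into a sparse word-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def relevant_images(prompt, k=2):
    """Return the k stored images whose captions best match the prompt."""
    p = vectorise(prompt)
    ranked = sorted(captioned_images,
                    key=lambda i: cosine(p, vectorise(captioned_images[i])),
                    reverse=True)
    return ranked[:k]

print(relevant_images("an armchair in the shape of an avocado"))
```

A real system would then blend and transform the visual features of the top matches, rather than merely retrieving them, but the association between words and images is learned in a broadly analogous (if far more sophisticated) way.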

From DALL-E to Craiyon

DALL-E

It all started with OpenAI's DALL-E. Here's an illustration of the images generated by DALL-E for the prompt: "an armchair in the shape of an avocado".

Here, it identifies keywords such as 'armchair', 'shape' and 'avocado' from the text and comes up with multiple versions of what it thinks might be "an armchair in the shape of an avocado".
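The keyword-identification step can be sketched in a few lines. This is only a loose analogy: a real text encoder learns which words carry meaning rather than using a hand-written list, but filtering out common "stop words" shows why 'armchair', 'shape' and 'avocado' are the words that matter in the prompt. The stop-word list below is an assumption chosen for this example.

```python
# Illustrative sketch: pick out the content-bearing words in a prompt by
# removing common function words, loosely analogous to how a model's
# text encoder focuses on the meaningful parts of a description.
STOP_WORDS = {"a", "an", "the", "in", "of", "on", "and", "to"}

def keywords(prompt):
    """Return the words in the prompt that are not stop words."""
    return [w for w in prompt.lower().split() if w not in STOP_WORDS]

print(keywords("an armchair in the shape of an avocado"))
# -> ['armchair', 'shape', 'avocado']
```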

DALL-E 2

Following DALL-E, OpenAI released DALL-E 2 in April 2022, which creates more photorealistic images from text prompts. More recently, DALL-E 2 was used to generate Cosmopolitan's June magazine cover ("meet the world's first artificially intelligent magazine cover") in a mere 20 seconds.

Imagen

Rivaling DALL-E 2, Google rolled out its own version, called Imagen, which creates high-resolution images by learning semantic information from text.

Craiyon

Inspired by DALL-E, machine learning engineer Boris Dayma built DALL-E mini as an open-source project accessible to the general public. DALL-E mini is not associated with OpenAI's model and has therefore been renamed "Craiyon".

Ethical issues surrounding AI-powered image generation

They're not real

It is true that the ultimate goal of these models is to produce realistic-looking images, but since the images are synthetic, this raises concerns about how people will use them.

Biased images

These automated systems are trained using data from the world and can sometimes pick up societal biases that exist in that data.

Who is the creator?

With an AI-generated image, who is the creator? Is it the human who chooses the words, or the AI that creates the image by analysing how humans describe things? This will remain a concern that raises copyright issues and competing creative claims on the image.

Despite such ethical issues around how these images are going to be used and the proper attribution of creativity and copyright, the ability to create high-resolution, photorealistic images within a matter of seconds delivers massive advantages and is a leap towards encoding creativity into AI models.


The views expressed here are those of the author and do not represent or reflect the views of RTÉ