Basics of image generators

Overview

Artificial intelligence models can be trained not only to output textual information, but also to work with visual information. By learning information related to shape and colour, such models can recognise objects and generate visualisations of them based on a textual input.

How do image generators work?

Most of these image generators currently use diffusion technology. These generators start with an ‘empty’ image, containing nothing but noise. This noise is comparable to the sparkle-like static you see on the screen of an old television when it receives no signal. With each step the model takes, a level of noise is removed and the image is gradually ‘drawn out’ of the noise. These models are trained to connect visual information (shape, location, colour, etc.) to a textual description, and to link that information to patterns at pixel level, allowing the image to take shape over time. The more steps the model goes through, the sharper the image becomes, but this also requires more processing time and power.
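As an illustration of the step count trade-off, the open-source diffusers library can run Stable Diffusion locally and lets you set the number of denoising steps explicitly. This is a minimal sketch, not a definitive recipe: the checkpoint id, prompt, and step values are illustrative, and it assumes the library is installed and a suitable GPU is available.

```python
# Minimal sketch using the Hugging Face diffusers library (illustrative values;
# assumes diffusers and torch are installed and a CUDA-capable GPU is available).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # illustrative checkpoint id
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a lighthouse on a rocky coast at sunset"

# Fewer denoising steps: faster, but the image stays coarser.
quick = pipe(prompt, num_inference_steps=10).images[0]

# More denoising steps: sharper image, but more processing time and power.
detailed = pipe(prompt, num_inference_steps=50).images[0]

quick.save("lighthouse_10_steps.png")
detailed.save("lighthouse_50_steps.png")
```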

With most online image generators the initial noise pattern cannot be pre-defined, but with some models (such as a locally installed Stable Diffusion model) you can specify the seed number, which determines the starting noise pattern before the model begins its work. This way you can even guarantee that the output is always the same if the same seed number and prompt are used with that model. This allows for greater transparency and documentation.
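As a sketch of how this works with a locally installed Stable Diffusion model (again via the diffusers library; the checkpoint id, prompt, and seed value are illustrative assumptions), a fixed seed initialises the noise pattern so that the same prompt reproduces the same image:

```python
# Reproducible generation sketch: a fixed seed determines the starting noise,
# so the same seed + prompt + settings yield the same image (illustrative values).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # illustrative checkpoint id
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a watercolour painting of a red fox in a snowy forest"
generator = torch.Generator(device="cuda").manual_seed(42)  # seed fixes the initial noise

image = pipe(prompt, generator=generator, num_inference_steps=30).images[0]
image.save("fox_seed_42.png")  # re-running with seed 42 and the same prompt reproduces this image
```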

Figure. Example of the iterative removal of noise from an image as achieved by the Stable Diffusion model. Source: By Benlisquare - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=124800742

Prompting for Image Generation

In prompt engineering for image generation, there are two main aspects to consider: the content and the formatting of the image.

Description of image content

Pay attention to the level of detail you're providing in your prompt. A helpful rule of thumb is to describe the image you want at the level of detail you would use to describe something in front of you to someone who cannot see it. When prompting for 'a university professor' or 'a nurse', most image generators will provide you with an old white man or a young white woman, respectively. This can largely be explained by bias in the training data (i.e., there are more pictures of old, white, male professors than of young, Black, female ones). This can be prevented by specifying more detail, e.g. the gender, age, time period, skin colour, etc. of the person you want to visualise.
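For example, compare a vague prompt with a more detailed one; the wording below is purely illustrative:

```python
# Illustrative only: a vague prompt versus a detailed prompt for the same subject.
vague_prompt = "a university professor"

detailed_prompt = (
    "a Black female university professor in her thirties, "
    "lecturing in a modern classroom, present day, natural daylight"
)
```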

Image formatting and style

In image generation, the formatting and style possibilities are endless. Below are some examples of the options available.

Art style

You could specify the art style of the output. For example, do you want a pencil drawing, an oil painting, or a photorealistic image?

Amplification

You could add (subjective) adjectives or adverbs to change the output: beautiful, sweet, etc.

Angle and lighting

Especially in photorealistic images, it might be helpful to specify the angle and type of lighting you want. For example, a bird's-eye view or a portrait, with soft side lighting or a spotlight.
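Combining art style, amplification, and angle and lighting cues, a prompt could look like the following; the wording is illustrative only:

```python
# Illustrative prompt combining art style, amplifying adjectives, and angle/lighting cues.
prompt = (
    "photorealistic portrait of an elderly fisherman, "   # art style
    "weathered and serene, "                              # amplification
    "shot from a low angle with soft side lighting"       # angle and lighting
)
```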

Positive and negative prompting

Some image generators, like Stable Diffusion, support 'negative prompting' in addition to positive prompts (i.e. telling the model what you want to see in its output). In a negative prompt, you specify what you do not want to see in the image.
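With the diffusers library, for instance, a negative prompt can be passed alongside the positive prompt. This is a minimal sketch; the checkpoint id and prompt wording are illustrative assumptions.

```python
# Negative prompting sketch: the positive prompt says what should appear,
# the negative prompt says what should be avoided (illustrative values).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # illustrative checkpoint id
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a bright, tidy study room with a wooden desk and bookshelves",
    negative_prompt="people, text, watermark, blurry, low quality",
    num_inference_steps=30,
).images[0]
image.save("study_room.png")
```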