The prowess of DALL-E 2
In January 2021, OpenAI, an artificial intelligence research & development company, introduced DALL-E, a tool that automatically creates and edits images by following natural language instructions from the user. DALL-E 2 is the next evolution of that system. It builds on the original by offering more realistic, accurate image generation at 4x the resolution. According to the team behind the project, it can combine concepts, attributes and styles to create original, realistic images and art from a text description. It can also make realistic edits to existing images from natural language captions, adding or removing elements while taking shadows, reflections & textures into account. DALL-E 2 uses a deep learning approach called a “generative model” (powered by neural networks) to not only create images from natural language, but also understand the relationships between the objects in the image.
It uses a process called “diffusion,” which starts with a pattern of random dots and gradually alters that pattern towards an image when it recognizes specific aspects of that image.
This small technical detail is actually a huge milestone in the AI industry, as historically, neural networks like the one used in DALL-E 2 have not been good at understanding the relationships between objects.
DALL-E 2 receives a text prompt from the user in natural language, let's say "An astronaut riding a horse", and then identifies the objects present in the prompt. It then works out the relationships between those objects and tries to create an aesthetically pleasing output.
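To make the diffusion idea mentioned above a little more concrete, here is a toy sketch in Python. It is not how DALL-E 2 is implemented: a real diffusion model uses a trained neural network, guided by the text prompt, to decide what noise to remove at each step. In this sketch the hypothetical `toy_denoise` function is simply handed the final image (`target`) and closes the gap a little at a time, which only illustrates the "random dots gradually become an image" behaviour.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Toy stand-in for diffusion: start from random dots and nudge toward `target`.

    Assumption for illustration only: the final image is known in advance.
    A real model never sees the target; a trained network predicts what
    noise to remove at every step.
    """
    rng = random.Random(seed)
    # Start with a pattern of random "dots" (pixel values in [0, 1)).
    image = [rng.random() for _ in target]
    history = [list(image)]
    for step in range(steps):
        # Close an equal share of the original gap each step;
        # the final step (blend == 1.0) lands on the target.
        blend = 1.0 / (steps - step)
        image = [pixel + blend * (goal - pixel) for pixel, goal in zip(image, target)]
        history.append(list(image))
    return history


def mean_abs_error(image, target):
    """Average per-pixel distance between the current pattern and the target."""
    return sum(abs(p - g) for p, g in zip(image, target)) / len(target)


# A tiny one-dimensional "image" used as the made-up target.
target = [0.0, 0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25]
history = toy_denoise(target)
errors = [mean_abs_error(img, target) for img in history]
print(f"error at start: {errors[0]:.3f}, halfway: {errors[25]:.3f}, end: {errors[-1]:.3f}")
```

Running it prints an error that shrinks step by step, which mirrors the way a noisy pattern is refined into a recognizable picture.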
Here's an illustration:
Image credits: MKBHD YouTube Channel
With great power comes great responsibility
OpenAI seems to recognize this fact. The company is releasing DALL-E 2 responsibly to work out safety issues, making it available only to a select group of researchers at the moment. So far, OpenAI has shared a couple of ways it is thinking about safety issues related to its tech:
The company says that it has limited the ability of DALL-E 2 to generate violent, hateful or adult images by removing such content from the training data, to minimise the network's exposure to such concepts.
The company is making sure that DALL-E 2 cannot create photorealistic images of real people, including public figures.
Disrupting the market
When DALL-E 2 becomes commercially available, we can expect some serious opportunities and disruptions. Designers, artists, visual content creators & photographers might be the ones hardest hit when services like this are opened for commercial use. Why hire a designer when DALL-E 2 can edit & create content based on instructions from anyone in the company? Why pay for a stock photo license when DALL-E 2 can create any image you want?
For a fun experiment on this topic, check out this video by Marques Brownlee's team:
On the other hand, there is an argument that DALL-E 2 will make many business professionals more creative, as any visual idea can become a reality. If used effectively, it can prove to be a big time & money saver for marketers, as their general tasks would become faster and cheaper with machine-assisted image generation.
"A decade ago, the conventional wisdom was that AI would first impact physical labor, and then cognitive labor, and then maybe someday it could do creative work. It now looks like it’s going to go in the opposite order."
We can only speculate about which path OpenAI will take with this project. One clue is the GPT-3 language model, OpenAI’s previous groundbreaking research project. Upon being released for commercial use, it caused an explosion of AI-powered content generation tools. It was also exclusively licensed to Microsoft. OpenAI has stated that it will license its technologies when the time is right to fund its research.
Fun fact: The name DALL-E is a nod to the artist Salvador Dalí and the Pixar movie robot WALL-E. Here's an image generated by DALL-E 2 based on the prompt: "Vibrant portrait painting of Salvador Dalí with a robotic half face":
This article is a part of the May'22 edition of our Startup Newsletter. Here's the complete publication: