“The Little Story About Generative AI: The Drawing Challenge” is a story that aims to give an intuitive understanding of how Generative AI works in a format that is simple and easy to digest. It will not necessarily be apparent throughout the story how it relates to Generative AI, but the last section, “Closing Thoughts,” will explain the connection. Have fun reading and feel free to comment!
Imagine that you and one of your good friends just registered for a challenge you read about online. You are yet to learn what it is about as it only says “Secret Challenge,” but you are participating together, and sure, it will be fun!
It is the day of the challenge, and you and your friend have just met up with the administrator outside the building where the challenge is taking place. She tells you to follow her so she can show you where the challenge will happen. You are both brought into an empty room with a large orange floor and four colored walls. There are no other challengers, tables, chairs, or anything else, just a door at each end of the room and the anticipation of what is about to happen.
The administrator starts telling you the rules: “The rules are quite simple: There are three rooms in total, the main room and two smaller rooms. The challenge is split into six rounds. Only one of you can be in the main room at a time while a round is going on, but you can switch who is in the room multiple times during each round. This rule means that you cannot see or hear each other. One person is placed in one of the smaller rooms with four canvases and drawing materials, and the other is placed in the other small room with four pieces of paper. There will be something on each of the four pieces of paper; the goal is to come as close to drawing what is on the papers as possible. You will know whether someone is in the main room by a lamp above your door; it will turn green when no one is in the room and red otherwise. You may talk to each other between the rounds.”
Before you go to your individual rooms, the administrator places a rectangular cube on the floor of the main room. She explains that while a round is going on, only the cube and one of you can be in the main room at the same time. This means you can leave the cube in the main room while the two of you take turns in there. You pick up the cube and notice that it is a bit sticky but think nothing more of it, as the challenge is about to start and you are hyped and very confused!
Round 1: A Clean Slate
As your friend is the better painter, you have decided that he will be in the room with the canvases and you will be in the room with the papers. You have also agreed that you will take turns going into the main room and try to communicate one piece of paper at a time.
You walk into the room and see a green light shining above the door and four papers lying on the ground, just as the administrator said. You pick them up to see an image on each piece of paper:
- One image of a cat
- One image of a kitchen
- One image of a burger
- One image of a tree
You pick the first one, an image of a cat, and go into the main room.
In the main room, you see the wide floor and large walls again, a lonely door on the opposite side, and the cube on the floor. You are pretty confused about how to communicate with your friend as you didn’t agree on anything beforehand. With nothing else to do, you leave for your room to ponder what to do.
Soon after getting into your room, you see the light turn red, indicating that your friend has just gone into the main room. You laugh a bit at the confusion he must be feeling right now, just as you did yourself. Soon after, the lamp goes green again, indicating that your friend has just left the room to try his luck guessing what was on the first of your papers. You pick up the second paper, an image of a kitchen, and go into the room again, not quite confident that this time will be any different.
And lo and behold, you were right; nothing had changed! A bit irritated, you hit the cube with your shoe and see it roll a bit. You can at least play a bit of awkward cube football, even if there is nothing else to do, so you hit it a couple more times before leaving for your room again.
This continues until you both have visited the room four times, and if you do say so yourself, you have gotten quite good at cube football! But no closer to winning the challenge…
Retrospective on Round 1
After round one, you and your friend meet up again. You feel a bit down as the prospect of winning is slim, but you are surprised to see that your friend, for some reason, is in higher spirits than you. You both bring the things from your rooms to see how close his paintings are to your papers. And to no one’s surprise, nothing matches! Everything drawn is of random things that have nothing to do with your papers. You look at your friend in disbelief, not because the images don’t look like each other, but because of your friend’s positive energy, which starkly contrasts with your own.
You ask your friend why he is in such good spirits. He tells you that he has figured out how to draw what is on your paper without the two of you talking! You look at him, puzzled, and ask him to explain in more detail. He tells you that he was quite confused the first time he entered the room as there was nothing to indicate what should be drawn; he went back to his room quite fast to get the rotations going. He expected the same the second time he went in, but to his surprise, the room did not look the same! “Not the same? You must be crazy,” you tell your friend. “It is a big room with nothing in it; how can it look different? There are not even windows.”
As it turned out, the difference in the room was not significant; however, it was essential. Each time he came into the room, the cube on the floor was lying in a different place. Knowing you, he guessed you had probably used it for football, but that was not important; what mattered was that this was the key to communicating!
“Yes, that is it!” you yell excitedly. You can use the floor to indicate what your friend should paint. In high spirits, both of you look at your papers again to see what he was supposed to paint. You agree to split the floor into four equal squares, one each for cat, kitchen, burger, and tree. Easy!
You tell the administrator that you are ready for the next round.
Round 2: Simple Groups
The second round begins and you are so ready for it! You go straight for the papers as you now know what to do. You pick up the first paper, and it shows, as expected, a cat, one of the forms from the last round. You check if the light is green. It is. You run into the room to place the cube in the area you decided would be for cats. You walk into your room again, waiting to see the light going red. You smile to yourself, knowing you are on the right path.
After a few seconds of looking at the red light, you turn around and go for the next paper. A shiver runs through you as you look at the paper in your hands. You take the next one, still panicking at what you see. You go for the last paper hoping it is different, but no. What you see on the papers are:
- An image of a mouse
- An image of a dog
- An image of a horse
This was not like anything that you and your friend agreed upon, and you don’t know what to do… Or actually, there is only one thing to do: End the round straight away. At least you got one right this time!
Retrospective on Round 2
You meet up with your friend again. He looks as happy as you were the first time you entered the room with the image of a cat, again a stark contrast to how you feel now. As expected, your friend shows four paintings portraying a cat. His face gets stiffer as you show him the papers of different animals. You agree that you got closer this time but certainly far away from getting everything right.
After a bit of pondering, you get the idea of splitting the floor into eight areas: one for each of the seven types you have seen up until now and one reserved for when the image is of something new. The chance of your friend guessing correctly in the empty area will be low, but at least there will be one.
You are quite confident as you go into your rooms again; even if something new should come up, you now know what to do.
Round 3: More Groups
As expected, there is more familiarity this time. You look at all the papers from the beginning this time to see what is on them:
- An image of a mouse
- An image of a burger
- An image of a wolf
- An image of a raccoon
You remember that the mouse should be in the lower left corner and therefore start with that one. As you get back, you take the next image, the burger. It has been a long time since you last had something that wasn’t an animal, but you remember that it is in the upper right corner of the floor! You go in again and accept the losses on the last two papers as you place the cube in the blank area for each of them.
Retrospective on Round 3
You are not as disheartened this time as you got two correct and you knew there was a good chance that not everything would be something seen before. You agree that the new distribution should look like the following, hopeful that you will get even more correct this time:
You got the process down this time and can quickly go into each of your rooms again.
Round 4: Too Much to Remember
You go into your room again… It begins feeling a bit familiar here. How long has it even been? Days? Weeks? You look at your watch… 45 minutes… Okay, maybe not that long… You take a moment to admire how fast your friend is at making all those paintings.
But life must go on, so you take the first paper. You see a tree, you know this one, it was in the middle to the left. You go into the room and place the cube as you agreed upon. You leave the room, no longer taking time to look at the light as you go straight for the next paper. An image of a horse, right, that was the one in the middle.
Again you go into the room to place the cube on the ground. You feel proud as you stand in the middle of the room with your hands on your hips, enjoying the sensation of progress and excitement. Will you get more than two correct this time? You leave the main room again to see what will appear this time. An image of a zebra and an image of a tiger. Tough luck; guess you must change things again.
Retrospective on Round 4
You meet up with your friend again, sure that you have gotten two out of four correct. You look at the paintings he has made and nod to yourself. As expected, there is an image of a tree and a horse… Wait? A cat? Not a horse? Confused, you ask your friend why he has drawn a cat rather than a horse. He looks as confused as you and answers that you put the cube in the cat area! You talk about where the horse and cat areas are placed and find that your friend is indeed correct. You forgot the right place for the horse.
You could not even remember the places of nine different categories, and now you have 11…? You express your worries to your friend and agree this is not a viable strategy as more types are introduced. You look at the examples you have gotten so far and see that most are animals. Your friend gets an idea: what if you place things that look like each other closer together? You agree that it is a good idea as it would make it easier to remember where things are!
You make the lower part of the floor an area for animals. But it is not enough, so you place animals that look like each other closer together in sub-groups, like a zebra and a horse or a tiger and a cat. This will make it easier to remember where things are. You also realize that burgers are made in the kitchen and place them beside each other.
You are confident that you now have a much better chance of remembering all the different categories! The next round begins.
Round 5: Simplicity
You are greeted with the familiar scene of a small pile of papers in the middle of the room. The light from above illuminates the papers, along with a faint green glow from the lamp above the door. You pick up the first paper, excited to see what challenges this round may bring. The first image is of a Bengal, a cat resembling a mini version of a tiger. It is a cat, but still… You know where the two areas are on the floor but are unsure if your friend would draw the right thing. You decide it is best to place the cube in the middle between the tiger and cat areas in the hope that your friend will understand that it is not just a cat but one that looks like a tiger.
One down, three to go! You make a mental fist bump before inspecting the remaining papers as you wait for your friend to finish inside the main room. You are surprised and a bit confused seeing what is on the remaining papers, not because of despair this time, but because you are relieved that this round is easier than the last four rounds. The remaining papers are of three dogs: one of an American Hairless Terrier, another of a Bearded Collie, and the last of a Beauceron. The light changes to green, and you enter the room to place the cube in the area reserved for dogs.
Retrospective on Round 5
After the fifth round, you meet up with your friend again to see how many paintings you got correct. Your friend shows the first image, a drawing of a lynx. Dammit! So close, but to be fair, a Bengal looks more like a jaguar than a tiger, and a lynx is more in the middle of the two… But at least your friend understood what you meant when you placed the cube between two areas! You are rushing your friend to show the remaining three paintings, excited to see if the remaining ones are right. And luckily, all three of them are images of dogs!
Quite satisfied, you call in the administrator to show off that you got 3/4 correct this time. Quite impressive, right? The administrator does nothing more than shake her head before pointing out that the paintings indeed look like dogs but not at all like the ones in the images. Dammit, she is right! The paintings are all of Labradors, one of the most common dog breeds, not the three breeds shown on the papers. She leaves again to give you more time before the last round starts.
Should you add all dog breeds to the floor? You already had a problem remembering where everything was before, so this seems a bit extensive… Both of you look at the papers and notice that the dogs are not just dogs. Each dog is different in size and hair length. Could you divide the dog area into smaller areas that define the dog’s hair length and height instead of making new areas for each dog breed?
It is a good idea as it keeps the number of areas to a minimum, but you realize there is a problem. You just learned that getting your friend to draw new things is possible by placing the cube between two areas; this happened with the lynx. The problem with the new idea is that it gets harder to guess whether the cube is between two areas because it combines the two or just because one area has really long hair. You choose to drop the idea for now…
After a while, you have yet to find a good solution or new ideas… ARHHHH!… You pick up the rectangular cube to fiddle with something as you think about how to solve the problem. It is still sticky, not the nicest feeling but better than having nothing in your hands. As you look closer at the cube, you realize that four lines go all the way around it, like it is put together by five small cubes to get the rectangular form. And now that you think about it, isn’t the cube more deformed than in the beginning? You call over your friend to inspect the cube more closely together. It turns out that what you thought was a single rectangular cube actually is five cubes held together by a screw that has come loose because of all the football you played with it! But each cube is still sticky…
Your friend is always curious, so it comes as no surprise that he begins to play around with the cubes. Frankly, it would be best to have a break from all the thinking, so you sit back to observe your friend while he plays with the cubes. He tries to press two of them together to see if they are sticky enough to hold together without the screw. He slowly removes one hand, excited to see if he succeeds and ready to catch one cube if they break apart. The construction holds. He nods, satisfied, and goes for the next part of his plan: to see if the two cubes can stick to the wall.
As your friend slowly removes his hand from the wall with the two cubes sticking to it, a jolt runs through you. “I got it!” you shout to your friend, who jumps a bit from the shock and bumps into the cubes, which break apart and tumble to the ground. Your friend looks a bit annoyed at you but curious about what idea you got. “If the floor isn’t enough, why not use the walls too!?” Your friend asks you to explain in more detail. “Earlier, we talked about splitting the dog area into smaller parts for hair length and size but agreed that this wouldn’t be a good idea as it would not make it possible to make new things like the lynx. But what if we place one cube on the floor to tell what animal it is and another cube on one of the walls to indicate the animal’s hair length and size?” You both agree that this is a good approach and agree to use not only one wall but all the walls! You also decide to no longer have animals and other things on the floor, but rather make the following reorganization:
- The Orange Floor is split into continents to make it easy to indicate what geographic area a thing is from. The cube will represent Norway if placed at the top of Europe or South Africa if placed at the bottom of Africa. You decide that the middle is reserved for when the thing belongs to no specific country.
- The Blue Wall decides how large the thing is and how long its hair is. You decide that the largest size is a planet, the medium size is an elephant, and the smallest is when it has no size. Likewise, the longest hair length is 2 meters, the medium length is half a meter, and the shortest is no hair at all.
- The Green Wall follows the same concept: one direction determines how dominant circles are, and the other how dominant stripes are. A cube in the middle may indicate an ellipse, a stretched circle that can be seen as a combination of circles and stripes.
- The Purple Wall decides how dangerous the thing is and how much it looks like an animal.
- The Yellow Wall represents food and trees. The food direction defines how much we see a thing as something that should be eaten. A burger would be at the top of this scale as it can be eaten immediately, while a can of beans would be in the middle, as we need to get the beans out first. The lowest part would be something like a stone that (hopefully) no one would eat. The tree direction defines how much like a tree the thing is, with a flower close to the left, a bush in the middle, and a tree to the right.
Just as you finish deciding how to split the floor and walls, the administrator tells you that the sixth and last round is about to begin. You are (once again) ready and quite excited about your new tactic!
Final round — Round 6: Masters of Space
It is the final round: one more time, and then you are done (and hopefully winning)! You pick up the first paper, ready to nail this challenge. You see a cow on the first paper, easy. You go into the room to conquer the floor like you are 18 and back on the dance floor again. You place one cube in the middle of the floor to tell your friend it is found everywhere in the world and another cube around the middle to the left on the blue wall as it is a large animal with short hair. You take a look at the green wall: circles or stripes? Heck yeah, a bit of both, but primarily circles and not too many, so you place that bad boy in the upper half a bit to the left. Dangerous? It is not totally harmless, but definitely not considered dangerous, and surely an animal: You place it in the lower right corner of the purple wall. Food? Many people eat cows, so you set the cube around the middle height as it is an animal but not a piece of meat. A cow looks nothing like a plant, so the cube is placed all the way to the left. Genius.
Just as fast as you finished the first paper, you rush through the following two, with one displaying a giraffe and the other displaying a sun. As you complete the third paper, you feel like you are beginning to get the hang of it. You take the last piece, ready to see the final challenge. As much confidence as filled you before, just as much disbelief fills you now. There is no image on the paper… There is not per se “nothing” on the paper… just… no image… What is on the paper, you may ask? Text… It says “Shiitake Mushroom.” You take some time to let the view sink in… You remember that no one told you that the drawing should look like the paper, just like what was on the paper. So… can you place the cubes just as before and get your friend to draw a shiitake mushroom? You think to yourself: “So even if I have text on the paper, my friend can still draw a mushroom? Let’s give it a try.” You try your luck and place the cubes as if the paper displayed a shiitake mushroom. You put the cubes such that the one on the floor is near Japan. It is a small plant that can be eaten, so you place it at the top of the yellow wall around the middle. It is neither an animal nor dangerous but is round and has a stem, so you place it in the middle of the green wall and in the lower left corner of the purple wall. It is small and has no hair, so you place it to the left a bit up on the blue wall.
You leave the room for the last time. Exciting.
Closing Thoughts
The goal of Generative AI is to generate things, just like your friend’s goal was “generating/making” paintings. But just like your friend can also write text, a Generative AI can generate anything we ask it to as long as it knows how (your friend may not be able to make music because he has yet to learn it). We often want AIs to be really good at small tasks rather than average at many things, so we usually confine them to generating a single type of content, like images. But just like generalists and specialists have different roles in the workplace, specialized and general AIs can be used for different tasks and each have their strengths and weaknesses.
Generative AI can still generate a painting even when we do not tell it what to generate, as the friend did in the first round when you didn’t know how to communicate with each other. But it is often not practical to just generate random things, so you want a way to influence what is generated/painted. The problem is that you cannot directly tell the Generative AI what to generate, just like you could not talk directly with your friend. Hence, you need to agree on another way to do it. The way you did it and the way that Generative AI does it are the same: you place a cube inside a room where different areas are reserved for different things. For a Generative AI, this room is called a “Latent Space,” which is just a fancy term for a special room where you and your friend cannot be at the same time.
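To make the room idea a bit more concrete, here is a minimal Python sketch of the four-square floor from Round 2. It is only a toy under stated assumptions: the story never says which square belongs to which category, so the layout below is made up, and the names `QUADRANTS`, `decode`, and `random_position` are hypothetical helpers rather than part of any real Generative AI library.

```python
import random

# Assumed layout: the floor is a 1-by-1 square split into four equal squares.
# Which category sits in which square is an illustrative guess.
QUADRANTS = {
    (0, 0): "cat",      # lower left
    (1, 0): "kitchen",  # lower right
    (0, 1): "burger",   # upper left
    (1, 1): "tree",     # upper right
}

def decode(x: float, y: float) -> str:
    """The friend's job: turn a cube position into something to paint."""
    column = 1 if x >= 0.5 else 0
    row = 1 if y >= 0.5 else 0
    return QUADRANTS[(column, row)]

def random_position(rng: random.Random) -> tuple[float, float]:
    """Round 1: the cube ends up wherever the football game left it."""
    return rng.random(), rng.random()

rng = random.Random(0)
print(decode(0.2, 0.1))               # cube placed on purpose in the cat square
print(decode(*random_position(rng)))  # cube kicked around: some random category
```

The first call corresponds to steering the painter on purpose; the second corresponds to Round 1, where a painting still gets made but nobody controls what it shows.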
If you want to nail the challenge, you need your friend/Generative AI to be good at two things:
- Generate as many different things as possible
- Generate new things not seen before
This is where the problems begin to surface. It becomes harder and harder to remember where things are as more and more things are introduced. There are two ways to solve this:
- Place things that look like each other close together
- If the floor does not have enough room, use the walls as well
The first thing to do is to place things that look like each other closer together. This will improve the ability to generate both a diverse set of things and new things.
- It will be easier to generate many things because you don’t need to remember where everything is placed, just what things in different areas look like, and even if you do not draw the right thing, you will not be too far away as it will look like what is in that area.
- It will also be easier to generate new things as the focus is not on where things are but what they look like. This means that your friend will know that he should paint something with a bit of hair when you place the cube between something with no hair and long hair.
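The cube placed between two areas, like the Bengal between the cat and the tiger, can be sketched as a simple blend. This is a toy illustration, not how any particular model works: the feature names and numbers are invented for the example, and `blend` is a hypothetical helper.

```python
# Made-up descriptions of two areas on the floor, each on a 0-to-1 scale.
CAT = {"size": 0.2, "stripes": 0.1}
TIGER = {"size": 0.8, "stripes": 0.9}

def blend(a: dict, b: dict, t: float) -> dict:
    """Describe what lies a fraction t of the way from area a to area b."""
    return {key: (1 - t) * a[key] + t * b[key] for key in a}

# Cube exactly halfway between the cat area and the tiger area.
halfway = blend(CAT, TIGER, 0.5)
print(halfway)
```

Because neighboring areas hold similar things, a position in between still describes something sensible: a medium-sized, somewhat striped cat, which was never given an area of its own.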
The second thing to do is to use not just the floor but the walls as well. In the story, you and your friend talked about how it was possible to place everything on the floor but that it would not be a good solution as it would ruin your option of painting things that have yet to be seen. You cannot paint things that are yet to be seen because you now need, let’s say, a place for dogs with long hair and a place for dogs with no hair. If you add these to your dog area, the effect will be that when the cube is placed between a dog and a wolf, you do not know whether it is a combination of the two or just a dog with long hair.
This is why it is crucial to use not only the floor but the walls as well. It allows you to generate more things because you can express different concepts on each wall, like what animal it looks like on the floor and the hair length and size on the wall. The more walls you have, the more things can be generated/painted, but it will also be harder to “just paint a dog” as you have many more options now. So the number of walls you use will depend on how much control you want.
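One way to picture the floor and walls together is as a single list of numbers, one number per direction on each surface. The sketch below assumes two directions per surface, following the reorganization in the story; the axis names, the 0-to-1 scale, and every value for the cow are illustrative assumptions, and `to_vector` is a hypothetical helper.

```python
# One entry per direction on each surface (assumed names, 0-to-1 scale).
AXES = [
    "floor_east_west", "floor_north_south",  # orange floor: where it is from
    "size", "hair_length",                   # blue wall
    "circles", "stripes",                    # green wall
    "danger", "animal_likeness",             # purple wall
    "edibility", "tree_likeness",            # yellow wall
]

def to_vector(placement: dict[str, float]) -> list[float]:
    """Read off all cube positions in a fixed order."""
    return [placement[axis] for axis in AXES]

cow = {
    "floor_east_west": 0.5, "floor_north_south": 0.5,
    "size": 0.5, "hair_length": 0.1,
    "circles": 0.3, "stripes": 0.1,
    "danger": 0.2, "animal_likeness": 1.0,
    "edibility": 0.5, "tree_likeness": 0.0,
}

# Moving one cube along one direction changes exactly one number.
shaggy_cow = {**cow, "hair_length": 0.6}

changed = [axis for axis in AXES if cow[axis] != shaggy_cow[axis]]
print(len(to_vector(cow)), changed)
```

With separate directions for separate concepts, a long-haired cow no longer gets confused with a cow that is half something else. The price is that ten numbers have to be chosen even for a plain cow, which is the loss of "just paint a dog" simplicity described above.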
The last piece of paper had text rather than an image on it. Generative AI does not care about what is on the paper, only where the cubes are placed in the room. Generative AIs like OpenAI’s DALL-E 2 create a painting from the text you give them. The image at the beginning of the blog post was created by giving it the text “Two people standing in a bright wide white room with a door in the middle.”
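The point that only the cube positions matter can be sketched with a toy example. Everything here is an assumption made for illustration: the two directions, the positions, and the keyword lookup in `encode_text` are stand-ins, since real systems learn how to turn text into positions rather than looking up words.

```python
import math

# Made-up positions (edibility, animal-likeness) for things the friend knows.
KNOWN = {
    "burger": (0.9, 0.0),
    "cow": (0.5, 1.0),
    "mushroom": (0.8, 0.1),
    "stone": (0.0, 0.0),
}

def encode_text(prompt: str) -> tuple[float, float]:
    """Stand-in for reading the paper: find a known word in the text."""
    for word, position in KNOWN.items():
        if word in prompt.lower():
            return position
    raise ValueError("no known concept in prompt")

def decode(position: tuple[float, float]) -> str:
    """The friend: paint whatever known thing lies nearest to the cubes."""
    return min(KNOWN, key=lambda name: math.dist(KNOWN[name], position))

print(decode(encode_text("Shiitake Mushroom")))
```

The `decode` function never sees the paper, only a position, so it makes no difference to the painter whether that position came from an image or from a line of text.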
This concludes “The Little Story about Generative AI: The Drawing Challenge,” a story about two friends and their path to communicating without talking to each other, using only a room and a couple of sticky cubes.
Thanks for reading; I hope you enjoyed the story and now better understand what Generative AI is and how it works. Check my profile for more blog posts and comment if you have questions, thoughts, or ideas for future blog posts.