Upgrading Imagination: The Promise Of DALL-E 2 As A Tool For Thought

[Cross-posting this here, from my blog, Echoes and Chimes]

When I was a kid, I’d spend the hours after school on the circular trampoline we had assembled in our backyard. I’d go into a kind of trance. Though my body would be engaged in jumping up and down — or, when that grew dull, and when the blue plastic cover that obscured the springs wore thin and blew away, in walking around the trampoline’s perimeter in a game of balance — my mind flew far away. I constructed an elaborate fictional world in which to pass my afternoons.

Hopelessly derivative, it was inspired by whatever media I was consuming at the time—I recall the Ratchet and Clank series, and Avatar: The Last Airbender both being heavily influential. I envisioned a galaxy of planets, each with distinct people, cultures, and economic activities. The system was quasi-federal; the political relations complex. For example, I recall creating the planet “Laboris”, the lab planet, where new technology would be developed and tested, before being shipped to worlds the star-system over. There was a planet dedicated to retail and advertising—animated by the spirit of Moloch, I now realise. And so on. Populating these planets, in starring roles, were my friends, family members, and any other people in my life at the time, all reimagined in the art style of my mind’s eye.

As my real life changed, so did my fictional one. Alliances were forged and broken. Civilisations rose and fell. It was a serial drama, with some weeks covering mere days, and some days spanning decades. As serious challenges came up at school — a block of tests, for example — I would envision each obstacle taking a physical form, like an elaborate boss fight. Although the details escape me, I recall my Grade 7 exams mapping onto the mythology of 2007’s Heavenly Sword.

I found great joy in specificity — I’d try to imagine the content of the advertisements that ran on the intergalactic cable channels, the outfits worn by my fellow adventurers, that kind of thing. In pursuing these specifics, I’d frequently run into the limits of my own imaginative capacity. While I was strong on narrative (a consequence of all the books I’d swallowed), and while each character I created evoked strong vibes in my brain, I was quite unable to picture exactly what they looked like. It was like trying to pin fog to paper.

This confrontation with my own limits frequently led me to fantasise about a technology that could scan my brain and produce images of what I was thinking, so I could show my peers, and so I could circumvent my utter inability to draw. I thought it might work like a polygraph machine, with a mess of wires that, through wizardry, would output my thoughts on a page.

I grew up, and I relegated this dream to the realm of science-fiction. I accepted the limits of my own creativity, and was happy to work within them. So I couldn’t produce beautiful illustrations, or really any illustrations at all. Fine; at least I was good with words. And if the medium of language, written and spoken, was good enough for the endless procession of great writers/thinkers/creators that inspired me, no doubt it was sufficient for me too.

That was pretty much my position until I came across DALL-E 2. DALL-E 2 is an “AI system that can create realistic images and art from a description in natural language”.

You’ve probably seen systems that do similar things; I had. DALL-E 2 stands out, though, for the quality of the images it seems capable of producing, and for their worth as expressions of a machine’s imagination. If you haven’t seen it in action, it’s worth checking out (in addition to the link above) this video, this Twitter thread that turned bios into images, or this one that explores the system’s various limitations.

I’m not writing this to dissect the abilities of DALL-E 2 per se — not least because that’s already been done (see e.g. “what DALL-E 2 can and cannot do”), and because I don’t actually have access to it. Instead, let’s get back to my childhood fantasies. Is this the technology I dreamed of as a kid?

As far as I can tell…not quite, but it’s remarkably close. If I had access to something like this when I was younger, I think it would’ve been transformative. It would have given me a new tool with which to explore my inner life. And the fact that this exists now sparks hope in me for what we might see in the future. A model that can turn short stories into short films, for example, doesn’t seem especially far off.

Now, you may ask: “But who would want this? What use would this be? What about poverty/inequality/glaring social problems?” To this I’d respond, in turn, “I’d want this! I think it might deepen my capacity for thought. And yes, other social issues are obviously important and deserve attention, but thankfully it’s not a zero-sum game”.

Let’s focus on my second response, about deepening capacity for thought, which I think is the most interesting part here. To explain what I mean, I want to briefly talk about “conceptual metaphor”. This, in my rudimentary understanding, is the idea that metaphors are not just literary devices, but are fundamental to how we make sense of the world, as they allow us to compare a domain we understand to one we do not; and in doing so, to understand abstract ideas. These kinds of metaphors are embedded in our everyday language — consider how phrases like “the price of oil rose” or “the market fell” are predicated on the idea that MORE IS UP, LESS IS DOWN; or how a phrase like “today dragged on” is predicated on the idea that TIME PASSING IS MOTION.

What’s interesting about this is that it suggests that metaphors occur at a level deeper than language: they inform how we generate thoughts to begin with. I’ve been thinking about this a lot lately, as I’ve been trying to write about how complex social problems are often discussed in inescapably metaphorical terms. For example, one frequently encounters talk of the “fight against corruption”, the implication being something like CORRUPTION IS AN ENEMY IN WAR. I think this can be useful to some extent, as it helps us understand the nature of the problem, but unhelpful in other ways: for example, it leaves ambiguous what the conditions for victory in this fight are.

Anyway. Where I’m going with this is that, if large swathes of our thoughts are metaphorical, relying on intuitive comparisons between domains, having a tool that allows one to visualise these comparisons would be incredibly interesting. It would let me see what the fight against corruption might look like, which might help me pinpoint why the comparison is or is not a good one.

Another example: often, when I’m trying to do multiple tasks at once, I think of an image of a man spinning multiple plates. You can find dozens of visualisations of this online (here’s a clip of a guy doing it in 1958), but it’s not quite what I’m picturing. I don’t want to claim that, if I could only use a tool to generate the right image of a plate-spinning man, an image that is sufficiently similar to my internal experience, I would then understand something new about the nature of multitasking. That doesn’t seem true.

But at the same time, if I were able to perform this kind of check regularly, generating images based on the implicit metaphors that inform my thoughts, I’d guess that it might lead me to notice which metaphors are particularly good or bad at helping me understand the world, and that my worldview would update accordingly. So I can see how one could draw a line between using this kind of AI system regularly and enriching one’s internal experience. Indeed, that seems to be what Sam Altman, CEO of OpenAI (the organisation that developed DALL-E 2), is getting at when he says that using the system has “slightly rewired [his] brain”.

I don’t think this use-case of clarifying conceptual metaphors is the only way a system like DALL-E 2 could enrich thought. It also just seems like a fun way to explore the nonsense word fragments that sometimes pop into one’s head. The other day, for example, the phrase “crocodiles in Lacoste!” showed up while I was driving. With DALL-E 2, I could really see that. This would be low-grade amusing to current, mid-20s me, but I think it would’ve absolutely blown my child self’s mind.

Younger me would’ve recognised that here was a way to fill in all the specifics of the fictional world in which I spent so many afternoons. Here was a way to nail down the aesthetics of my fellow adventurers, my friends reimagined, as I experienced them internally. Here was a way to upgrade my imagination. That’s how the kid in me feels now, at any rate.

I don’t know where any of this will lead, but, x-risk aside, I’m eager to find out.