Picasso in the Gallery of Babel

tl;dr

When (not if) generative models get so good at faking human artefacts that no human can tell human-made from synthetic, Strange Things will happen to the way humans consume and think about and make art.

Claim 1: Models get better

Generative AI is well on the way to patching its ‘tells’. Hallucination in LLMs, garbled text and polydactyly in image generators, outlying misclassifications in classifiers, etc., are being fixed in new versions. Even allowing for higher-fidelity models provoking more rigorous criticism and therefore discovery of new, subtler tells, GenAI will (perhaps quite soon) reach a point where no naked-eyed human can do better than guess at whether a given artefact is human-made or synthetic.

Claim 2: The library

In ‘The Library of Babel’, Jorge Luis Borges describes an unimaginably large, but still finite, library containing every possible 410-page book, which is to say, every possible string over a 25-character alphabet of the length needed to fill 410 pages (40 lines of roughly 80 characters each, in Borges’s telling). The library has some interesting features:

Interesting library feature 1

It contains every book that was ever written (by person or algorithm), every book that will ever be written, and every book that will never be written.

Interesting library feature 2

It contains mostly garbage, because there are far more nonsensical than sensible ways of stringing characters together. Though it is not infinite, it is so large that the probability of discovering real or meaningful or even grammatical books by wandering through it at random is effectively zero. You could write the book you’re looking for in less time than it would take to find it in the library.
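Borges’s own figures make the scale concrete. Using his specification (410 pages, 40 lines per page, roughly 80 characters per line, over 25 symbols), a few lines of arithmetic give the number of distinct books:

```python
# Size of the Library of Babel, using Borges's figures: 410 pages,
# 40 lines per page, ~80 characters per line, over a 25-symbol alphabet.
import math

chars_per_book = 410 * 40 * 80                 # 1,312,000 characters
# The number of distinct books is 25 ** chars_per_book; count its decimal
# digits rather than materialising the integer itself.
digits = math.floor(chars_per_book * math.log10(25)) + 1
print(f"The library holds 25^{chars_per_book} books "
      f"(a number {digits:,} digits long).")
```

For comparison, the number of atoms in the observable universe has around 80 digits.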

The conceit works just as well — maybe even better — with images[1]. The Gallery of Babel has dynamics identical to those of the Library, with, say, 8-bit RGB pixels instead of letters and 1024 x 1024-pixel images instead of books. And you could draw the image you’re looking for in less time than it would take to find it in the gallery...
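The same arithmetic, applied to the gallery’s assumed parameters (1024 x 1024 images, 8 bits per RGB channel), shows it dwarfs even the library:

```python
# Size of the Gallery of Babel under the parameters assumed in the text:
# 1024 x 1024 images, 8-bit RGB (24 bits per pixel).
import math

pixels = 1024 * 1024
bits_per_pixel = 24                            # 8 bits each for R, G, B
total_bits = pixels * bits_per_pixel           # 25,165,824 bits per image
# The number of distinct images is 2 ** total_bits; count its decimal digits.
digits = math.floor(total_bits * math.log10(2)) + 1
print(f"The gallery holds 2^{total_bits} images "
      f"(a number {digits:,} digits long).")
```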

Claim 3: Directions

… until now.

GenAI models are directions to rooms in the library and the gallery that are not filled with garbage. This is new and important. By replacing the flat, featureless topology of the library/gallery’s naïve indexing system with structured, navigable, semantic latent spaces[2], GenAI provides humans with an alternative to laboriously creating the art they want. Instead, they can tell a machine to create it.
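A toy illustration of the difference (nothing here resembles a real model’s architecture): a hand-made ‘decoder’ that maps a tiny latent space smoothly into pixel space lands on structured images, while a blind draw from the full phase space lands on noise.

```python
# Toy contrast between navigating a latent space and wandering the phase
# space at random. The 'decoder' is a smooth, hand-made function standing
# in for a learned one.
import numpy as np

rng = np.random.default_rng(0)
SIDE = 32

def decode(z):
    """Map a 2-d latent vector to a SIDE x SIDE greyscale 'image':
    a smooth ripple whose frequencies are set by z."""
    y, x = np.mgrid[0:SIDE, 0:SIDE] / SIDE
    return np.sin(2 * np.pi * (z[0] * x + z[1] * y))

def roughness(img):
    """Mean absolute difference between horizontally adjacent pixels."""
    return np.abs(np.diff(img, axis=1)).mean()

latent_img = decode(rng.normal(size=2))        # reached via the latent space
random_img = rng.uniform(-1, 1, (SIDE, SIDE))  # drawn blindly from phase space

# The latent-space image is smooth (structured); the random one is noise.
print(roughness(latent_img), roughness(random_img))
```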

Think of the current, imperfect models as directions to rooms that contain works of interest and meaning, but no human-made works. So there is still no overlap between the dominions, and a keen-eyed critic can point to a self-intersecting hand or oneiric text and cry, ‘synthetic’.

But when Claim 1 above becomes true, generative models will be able to synthesise content that is also in the subset of library/gallery works that were, will be, or could be made by humans alone. A threshold will have been crossed.

Claim 3b: Picasso’s lost works

(Really just Claim 3 again, but approached by a different route)

The Gallery of Babel contains the set of all images which the best human art historian cannot definitively categorise as not images of lost works by Picasso. This set includes all images of genuine lost works by Picasso, as well as all other permutations of pixels for which the assertion is true. (And unless the best human art historian is infallible, the latter subset will not be empty.)
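The set-theoretic point can be sketched in a few lines. Everything here is a stand-in: integers play the role of images, a hard-coded rule plays the historian.

```python
# Toy formalisation of the 'lost Picasso' set: everything the best
# historian cannot definitively rule out.
universe = set(range(100))            # all possible 'images'
genuine_lost_works = {3, 14, 15}      # the real lost Picassos (by fiat)

def historian_rules_out(image) -> bool:
    # An imperfect historian: never rejects a genuine work (a best-case
    # assumption), but also fails to reject some non-genuine images.
    return image not in genuine_lost_works and image % 10 != 4

# The set from the text: images the historian cannot definitively
# categorise as NOT lost Picassos.
candidate_set = {img for img in universe if not historian_rules_out(img)}

print(genuine_lost_works <= candidate_set)       # genuine works included
print(bool(candidate_set - genuine_lost_works))  # plus a non-empty remainder
```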

When Claim 1 becomes true, GenAI will be able to (re-)create elements from both these subsets. A threshold will have been crossed.

Claim 4: Washed of provenance

It’s worth dwelling on the significance of this threshold. The images that could come from either source (human or synthetic) no longer have a unique provenance. When a diligent, living watercolourist shows you their latest painting, you must take it on faith that they painted it. It’s now physically possible that nobody painted it.

If a given string of characters or pixels could have been produced by a human or by an algorithm, then not even a perfect provenance-determining algorithm can determine from the string alone whether it was produced by a human or an algorithm.

Content has been scrubbed clean of its intrinsic bona fides.
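Stated computationally, the point is almost tautological. A minimal sketch (the producers and the ‘detector’ below are arbitrary stand-ins, not real tools):

```python
# Once two processes can emit byte-identical artefacts, any provenance
# detector, being a function of the bytes alone, must return the same
# verdict for both.
import hashlib

def human_writes() -> bytes:
    return "A grey cat sat in a patch of late sun.".encode()

def model_generates() -> bytes:
    # Post-threshold, suppose the model emits the very same sentence.
    return "A grey cat sat in a patch of late sun.".encode()

def provenance_detector(artefact: bytes) -> str:
    # However sophisticated, a detector sees only the bytes...
    return "human" if hashlib.sha256(artefact).hexdigest() < "8" else "synthetic"

a, b = human_writes(), model_generates()
print(a == b)                                            # identical artefacts
print(provenance_detector(a) == provenance_detector(b))  # identical verdicts
```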

Claim 5: It’s probably already happening

The threshold won’t get crossed everywhere all at once. Styles that are overrepresented in models’ training data (fantasy and concept art in Midjourney, for example) may have already crossed it. Facebook already hosts a rash of cynical pages using fake images to identify phishing targets among the credulous, elderly and tech-illiterate.

Anticipated responses from art consumers and artists

Paranoia about post-threshold art

“Is any post-threshold artist who they say they are? Is any post-threshold art real?”

Real post-threshold artists will have a strong disincentive to create if they have to factor in time spent arguing that their art is real.

Nostalgia for pre-threshold art

“Don’t squander your patronage on new art that might be ersatz! Stick to Joyce’s Ulysses and Plath’s The Bell Jar and Coltrane’s A Love Supreme and Magritte’s La Trahison des images because those were classics before the threshold and therefore guaranteed real.”

Imagine the stifling effect this retreat to the past would have on working artists.

‘Vinyl-sound-quality’-style magical thinking

“You nerds can prove whatever you like, but I know the difference between [human-generated art/music on vinyl/brand-name pharmaceuticals] and [synthetic art/CD-quality digital music/generic pharmaceuticals].”

Unlikely to make the world a more harmonious place.

Objections

Big deal, art forgery has been a thing for millennia.

True. But what’s new here is that the cost/time/effort required to create an ironclad fake approaches zero. Classical forgers had to be really good at forging; malign promptsmiths only have to be good at wanting a forgery.

Big deal, it’s the art that’s important, not the artist.

Even pre-threshold, opinions differ wildly on this point. And they tend to hinge on the character of the (almost invariably male and problematic) artist. Orson Scott Card can spin a yarn but he’s also one hell of a homophobe; Richard Wagner rides a good Valkyrie but was a confirmed antisemite.

The threshold pushes us far beyond these artist-specific dilemmas, and asks: what is art minus the artist?

(It’s likely that the answer will vary across art forms. A high-fidelity recipe book from a model trained on a Michelin-starred chef’s oeuvre feels intuitively okay to me, but the prospect of a ‘new’ stand-up special ‘from’ a beloved dead comedian, however well it nails their vibe, reads icky indeed.)

Solutions

Unclear as yet. It’s possible that the threshold will have unforeseeable good consequences, but bad ones will almost certainly come first.

Yuval Noah Harari has interesting things to say on related topics (and is not as unsophisticated an AI critic as many believe). He argues convincingly that an artefact can be considered legitimate if its creator, whether human or artificial, felt something in creating it. This doesn’t solve the basically unsolvable provenance-erasure problem, but it does point to a possible future in which substrate-independent collectives make interesting things, perhaps using extrinsic provenance-authentication methods to prove they’re honest (for example, a time-limited sprint in a location certified as air-gapped from generative models for the duration of the sprint).

  1. ^

    Or music, or video, or any other digitisable medium.

  2. ^

    Because the library/gallery is an entire phase space, its indexing system (say, ascending alphanumeric) can be seen as a trivial or degenerate latent space: each book’s index is its content, with no compression or dimensionality reduction, and the ‘neighbourhood’ of a book is not books that are semantically similar but books that are identical but for the last few letters.

    This is a different beast from a generative model’s space, where dimensions encode meaning and details presumed irrelevant to the ultimate beholder are discarded. In its ideal mathematical form the model’s space is continuous, therefore infinite, therefore larger than the library/gallery phase space. But in practice it is discretised for use by CPUs and GPUs in 32- or 64-bit floating-point arithmetic, therefore finite and (if efficiently implemented) much smaller than the corresponding phase space.

    It’s worth keeping these details in mind to defend against a ‘god-of-the-gaps’-style objection which runs as follows: because the library/gallery’s phase space is larger than a discretised GenAI’s latent space, there will always be points in the former that are not in the latter, but which are still accessible to the Mysterious Creativity of the Human Mind; therefore mysterious human creativity can still claim some sovereign territory.

    To refute this objection, it suffices to note that the whole trick of GenAI is to compress the space of possibilities, to dimension-crunch in such a way that the human sensorium (or an artificial sensor of finite resolution) doesn’t notice the compression. This amounts to the GenAI’s latent space ‘covering’ the library/gallery’s phase space: no human (or finite-resolution sensor) can distinguish between the content represented by a given point in the model’s discretised latent space and the content represented by that point’s closest neighbour. In that eventuality, any point in the phase space that lies between these two latent points, but is absent from the latent space, would by construction be indistinguishable from the content in question as well.

    GenAI has not yet achieved this goal in practice, but we have no reason to think that it never will.