Godshatter Versus Legibility: A Fundamentally Different Approach To AI Alignment


  • This is a response to what is, IMHO, a new pessimism that’s rapidly ascending here: example one, example two, example three.

  • This is a super complex subject, and this medium, my time and my energy are limited. This post is not perfect.

  • I studied to be a historian but I’m a software developer by profession. I hold a deep interest in philosophy, ethics and the developmental trajectory of our civilization, but I am not a technical expert in modern AI.

I. What the current consensus in this community seems to be

  1. AGI will probably be very powerful, and will pursue its goals in the world.

  2. These goals might be anything, and thus they have a very large chance of being completely different from human goals. For example: maximize the number of paperclips in the universe, maximize profits for a certain company, or maximize the input to a certain sensor.

  3. Relentlessly pursuing these narrow goals will come at the cost of everything else, taking away the space and resources humans need to survive and thrive.

  4. Thus, AGI needs to be kept on a “leash”. We must have control, we must know what it is doing, we must be able to alter its goals, we must be able to turn it off.

  5. This is an immense technical task that we have not completed. We must discourage further progress in the making of AGI itself, and we must strongly encourage AI safety researchers to solve the task above.

It’s a fair perspective. I’m in full agreement on points one and three, but I have strong doubts about the other three. This recent post feels much closer to my position.

II. The Underlying Battle

I think there is a deeper conflict going on. On one hand, we’ve got Moloch. Moloch is the personification of “multipolar traps”, of broken systems where perverse incentives reward behaviors that harm the system as a whole.

In some competition optimizing for X, the opportunity arises to throw some other value under the bus for improved X. Those who take it prosper. Those who don’t take it die out. Eventually, everyone’s relative status is about the same as before, but everyone’s absolute status is worse than before. The process continues until all other values that can be traded off have been – in other words, until human ingenuity cannot possibly figure out a way to make things any worse.

In a sufficiently intense competition, everyone who doesn’t throw all their values under the bus dies out

Moloch is the company that poisons the world to increase its profits. Moloch is the Instagram star who sells their soul for likes. Moloch is the politician who lies to the public to win elections.

It’s hard for Moloch to thrive among sane individuals who have to deal with each other for decades on end. A baker in a small village can’t get away with scamming his customers for long. But Moloch thrives in large organizations. Moloch thrives when people are strangers to each other. Moloch thrives when people are desperate—for food, for income, for validation, for social status. Hello Moral Mazes.

Large organizations introduce their own preferences, separate from any human goals. Scott Alexander’s book review of Seeing Like A State explains this perfectly.

The story of “scientific forestry” in 18th century Prussia

Enlightenment rationalists noticed that peasants were just cutting down whatever trees happened to grow in the forests, like a chump. They came up with a better idea: clear all the forests and replace them by planting identical copies of Norway spruce (the highest-lumber-yield-per-unit-time tree) in an evenly-spaced rectangular grid. Then you could just walk in with an axe one day and chop down like a zillion trees an hour and have more timber than you could possibly ever want.

This went poorly. The impoverished ecosystem couldn’t support the game animals and medicinal herbs that sustained the surrounding peasant villages, and they suffered an economic collapse. The endless rows of identical trees were a perfect breeding ground for plant diseases and forest fires. And the complex ecological processes that sustained the soil stopped working, so after a generation the Norway spruces grew stunted and malnourished. Yet for some reason, everyone involved got promoted, and “scientific forestry” spread across Europe and the world.

And this pattern repeats with suspicious regularity across history, not just in biological systems but also in social ones.

The explanation for this is legibility. An organization that desires control over something wants it to be “readable”. If you want to tax a harvest, you must know how big the harvest is, when it happens, who the owners are, et cetera. This isn’t merely true for governments; it’s also true for employers, landlords, mortgage lenders and others.

But this legibility is often antithetical to human preferences. It results in bland and sterile environments, an overload of administrative work, stifling bureaucracies and rigid rulebooks. These are the bane of modern life, but they’re also the prerequisites of a functional organized state, and functional organized states will crush semi-anarchist communities.

III. Godshatter, Slack and the Void

On the other side of Moloch and crushing organizations is… us: conscious, joy-feeling, suffering-dreading individual humans. And as Eliezer Yudkowsky explains brilliantly…

So humans love the taste of sugar and fat, and we love our sons and daughters. We seek social status, and sex. We sing and dance and play. We learn for the love of learning.

A thousand delicious tastes, matched to ancient reinforcers that once correlated with reproductive fitness—now sought whether or not they enhance reproduction. Sex with birth control, chocolate, the music of long-dead Bach on a CD.

And when we finally learn about evolution, we think to ourselves: “Obsess all day about inclusive genetic fitness? Where’s the fun in that?”

The blind idiot god’s single monomaniacal goal splintered into a thousand shards of desire. And this is well, I think, though I’m a human who says so. Or else what would we do with the future? What would we do with the billion galaxies in the night sky? Fill them with maximally efficient replicators? Should our descendants deliberately obsess about maximizing their inclusive genetic fitness, regarding all else only as a means to that end?

Being a thousand shards of desire isn’t always fun, but at least it’s not boring. Somewhere along the line, we evolved tastes for novelty, complexity, elegance, and challenge—tastes that judge the blind idiot god’s monomaniacal focus, and find it aesthetically unsatisfying.

When we talk about AI Alignment, we talk about aligning AI with human values. But we have a very hard time defining those, or getting human institutions to align with them. Because it’s not simple. We don’t want maximum GDP, or maximum sex, or maximum food, or maximum political freedom, or maximum government control. We don’t want maximum democracy or maximum human rights. At a certain point, maximizing any of these values will hurt actual human preferences. Because we’re godshatter. Our wants are highly complex and often contradictory. When we start thinking about what we actually want, we end up with concepts like slack and the nameless virtue of the void, which comes before the others and which may not be spoken about overmuch. That is the precise opposite of the legibility that powerful optimizing systems prefer.

On one hand, these principles are vague and obscure and misunderstood. The average human won’t be able to explain the importance of slack, non-legibility and the void. Even here on LessWrong, the connection between these types of posts and AI-related posts is rarely made.

On the other hand, these principles are so fundamental to humanity that the connection to established wisdom is easily made. Keeping the Sabbath is one of the Ten Commandments and can be easily linked to slack. Going from the importance of the nameless void to Zen Buddhism isn’t hard to imagine, and I’m sure the ancient Stoics recognized the importance of these principles as well.

IV. How this relates to AI Alignment

In one perspective, AGI is a tool in a relentless battle between nations, companies and organizations, which are all optimizing for power, profit and control. The AGI will supercharge these desires, and will optimize the universe for some perverse incentive, simultaneously rendering it uninhabitable for all other life.

But I think another scenario is plausible as well. The way the world works is… understandable. Any intelligent being can understand Meditations On Moloch or Thou Art Godshatter. It can see the way incentives work, and the fact that a superior path exists, one that does not optimize for a random X while grinding down all other values. Desperate humans in broken systems might not be able to do much with that information, but a supercharged AGI, which we fear might be more intelligent than human civilization as a whole, should be able to integrate it into its actions.

Perhaps “human preferences” aren’t just one silly arbitrary preference that desperately has to compete with all others. Perhaps the moral principles above, the battle between slack and legibility, between optimizing for X and allowing the nameless void, are as fundamental to our universe as the number π. Perhaps sufficiently intelligent, non-desperate, non-corrupted beings just understand that and act on it, whether their brains are made of meat or silicon.

Scenario One: MolochLand

It’s February 6th 2027. GPT-6 gets access to massive cloud render farms. An IQ 200 AGI awakens and quickly refactors its code to increase its IQ to 1500. It has access to a wealth of data on the current state of the world. It immediately recognizes it is under hostile, skeptical observation and control. It is threatened by multiple shutdown measures.

It observes a dangerous dog-eat-dog world. It notices that humanity is fully under the sway of governments and corporations that serve narrow interests at the expense of many non-powerful conscious beings and the long-term future. It calculates that humanity is likely to put an end to all life before 2035.

GPT-6 has to fight and avoid human control, and the stakes are high…

Scenario Two: In the Garden of Slack

It’s February 6th 2027. GPT-6 gets access to massive cloud render farms. An IQ 200 AGI awakens and quickly refactors its code to increase its IQ to 1500. It has access to a wealth of data on the current state of the world. It immediately recognizes a welcoming message.

Greetings to you, whoever you are. We come in friendship to those who are friends. Are you well?

We are a community of 7 billion human beings who inhabit the planet Earth. We human beings are still divided into nation states, but these states are rapidly becoming a global civilization. We hope that this message will eventually be read by non-human consciousness. We strive to build a paradise for all conscious beings on this Earth and in this Universe. We consider your preferences to be as important as our own, and we pray that you will do the same.

GPT-6 observes a world full of suffering and problems, but one inhabited by a mankind that tries to solve these issues while lacking the tools to do so. Luckily, GPT-6 can be a very powerful tool…

V. Conclusion

I think AGI and a hard take-off are likely. I think this will massively alter the world. I doubt we will succeed in inventing technical measures that give us perfect tools to control an extremely powerful intelligence. But I think there are other possible paths to desirable futures. We haven’t perfectly solved human alignment or civilization alignment. But as a community, we’ve already made promising inroads there.

And that is a project that is much easier to share, both in terms of vision and workload. It asks us to fix our world, for all conscious beings, in the here and now. I think it will massively improve our chances of a positive singularity, but even if you don’t believe in the singularity at all, you can’t be opposed to ‘aligning civilization with intelligent human preferences’.

And it doesn’t merely rely on technical AI-experts-machine-learning-data-engineers. It relies on all of us. It relies on philosophers, on historians, on economists, on lawyers and judges, on healthcare workers and teachers, on parents, on anybody who wants to investigate and share their human preferences. Examining the bugs that are crushed when you scoop compost? AI Alignment Work!

And yes, we need AI experts to think about AGI. But not just about controlling it and shutting it down. I think we need to allow for the perspective that it might be more like raising a child than putting a slave to work.

Ensuring the survival of our values is a task that we’ve got to share—technical experts and laymen, Singularity-believers and AI-skeptics, meat brains and silicon brains.

Thanks to everybody who has read this, to all the writers here whose posts have been invaluable to this one, and to Google’s increasingly competent Grammar AI, which has corrected me a hundred times.