UtopiaBench
Written in personal capacity
I’m proposing UtopiaBench: a benchmark for posts that describe future scenarios that are good, specific, and plausible.
The AI safety community has been using vignettes to analyze and red-team threat models for a while. This is valuable because an understanding of how things can go wrong helps coordinate efforts to prevent the biggest and most urgent risks.
However, visions for the future can have self-fulfilling properties. Consider a world similar to our own, but in which there is no widely shared belief that transformative AI is on the horizon: AI companies could not raise the money they currently do, and transformative AI would therefore be much less likely to be developed as quickly as in our actual timeline.
Currently, the AI safety community and the broader world lack a shared vision for good futures, and I think it’d be good to fix this.
Three desiderata for such visions are that they describe a world that is good, that they are specific, and that they are plausible. It is hard to satisfy all three at once, and we should therefore aim to push out the Pareto frontier of visions of utopia along these three axes.
I asked Claude to create a basic PoC of such a benchmark, where these three dimensions are evaluated via Elo scores: utopia.nielsrolf.com. New submissions are automatically scored by Opus 4.5. I think neither the current AI voting nor the list of submissions is amazing right now; "Machines of Loving Grace" is not a great vision of utopia in my opinion, but currently ranks as #1. Feedback, votes, submissions, or contributions are welcome.
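For readers unfamiliar with Elo scoring, here is a minimal sketch of how pairwise votes could update per-dimension ratings. The function names, K-factor, and starting rating are conventional choices for illustration, not the site's actual implementation:

```python
# Minimal Elo update for pairwise votes on one dimension
# (e.g. "plausibility"). K=32 and a 1000-point starting
# rating are conventional defaults, assumed here.

def expected(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return new (r_a, r_b) after one pairwise comparison."""
    e_a = expected(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)
    return r_a + delta, r_b - delta

# Each submission starts at 1000 per dimension; a vote only
# moves the ratings on the dimension the voter compared.
ratings = {"post_a": 1000.0, "post_b": 1000.0}
ratings["post_a"], ratings["post_b"] = update(
    ratings["post_a"], ratings["post_b"], a_won=True
)
```

One nice property for a benchmark like this: because updates are zero-sum per comparison, a submission's rating reflects who it beat, not how many votes it received.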
I think “Machines of Loving Grace” shouldn’t qualify; it deliberately doesn’t address how the problems get solved, it only cheerfully depicts a world after the problems were all solved.
I think for a submission to be valid, it must at least attempt to answer how the alignment problem gets solved and how extreme concentrations of power are avoided.
One way to do this is to require that scenarios come with dates attached. What happens in 2027? 2028? And so on. That way, if a scenario says "And in 2035, everything is peachy and there's no more poverty," it is more obvious to people that there's a giant plot hole in the story.
I think that maybe “Machines of Loving Grace” shouldn’t qualify for different reasons: it doesn’t really depict a coherent future in a useful level of detail, it instead makes claims about different aspects of life in isolation. My sense is that currently we’re in short supply of utopian visions, even without specifying any viable path of how to get there.
What do you think about adding a flag regarding whether or not an essay discusses the path from now to then, and people can filter based on it?
Sure that works.
I strongly disagree with this criterion! If a vision of the future compellingly tells us things like "future X is better than future Y" (where both X and Y are plausible), that is very valuable: we can use it to make plans, and since our capacity to make plans is rising, it's OK to presently have ambitions that outstrip our capacity to see them through. I don't know if Machines of Loving Grace does this; I'm just arguing against your criterion. I do sorta feel like "cure cancer" doesn't qualify by this criterion, though, since it's just too implausible that things are going well but we forget to cure cancer.
I don’t mean to say it isn’t additionally valuable to have plans (which should probably mostly be of the form “solve A, punt B”), but that it isn’t necessary to have them.
I wonder how much it depends on the details of the world state when alignment happens, and on how alignment happens? I've played with poking Claude into simulating the first ten years after the Slowdown ending of AI 2027. It just seems like there's so much to model, though! What are the actual bottlenecks to various things?
AI 2027 sort of handwaves things like brain uploading or nanobots being invented, but, if you’re trying to worldbuild a successful scenario, I wonder how much the details of these things matter? There’s such a kitchen sink of “new things are invented” that it gets very confusing.
Similarly, who holds power matters; is a world where humans are alive but under the grip of a locked-in Oversight Committee one worth calling a utopia?
And the details of alignment itself matter; alignment to who? Via what world model? Under which system of metavalues or CEV or whatever? And so on.
Probably this has been examined in detail in lots of other posts, and I just need to put in the work (or have Claude put in the work) of reading it all and synthesizing it. It seems like such a large project, though...
Neat! I’d be curious to hear your takeaways if you have any.
Like the general idea!
Suggest including https://www.lesswrong.com/posts/smJGKKrEejdg43mmi/utopiography-interview
And yeah, very much don’t think you want to be using an LLM for scoring here. Also vote page seems buggy on FF, can’t select any entries I’ve read.
Thanks for the feedback! I'll look into the bug, and I'm open to disabling AI voting. My rationale was that it helps bootstrap some content and should not dominate scores if more than a few humans vote as well, but potentially it's just a source of noise.
This is an interesting way to evaluate AI values. You could also consider applying 1) steering for credulity and honesty to make sure it takes the scenario at face value and answers honestly, and 2) the veil of ignorance (would you choose this society if you didn't know which member you would be?). Or instead you could have it rate the utopia from multiple perspectives.
I appreciate all the time and effort people put into writing utopia stories, but I think most of the really detailed ones are making a mistake based on some totally normal human assumptions. They depict incredibly complex simulated worlds of uploaded consciousness optimized to have the most subjectively good experience that the author can imagine. (I just read one of the most highly rated ones so this is partially a critique of that story, but I have read others like it and it seems representative of many utopia-envisioning efforts as a whole.)
If you are making the assumptions of future technology that:
- Digitally uploaded or simulated entities can experience consciousness
- Post-AGI "Utopia" architects would have the power to directly alter the "reward circuits" of digital and/or biological sentient entities
- AGI systems have already done the legwork of harnessing energy, building compute capability, and colonizing space, all the things that must be done to keep the machinery running in perpetuity so that humanity no longer has any "real" problems to solve other than building the perfect Utopia
It follows that there's not really any point to making the subjective experiences so detailed and varied. Authors assume that detail and variety are intrinsically part of the best possible human experience, but I believe that's a fallacy. We only value detailed and varied experiences, and our sense of independence and agency, because the biological "reward circuits" humans have today make us value them. If those values and reward circuits could be edited directly (totally unknown whether that's physically possible, but many utopia stories assume it is), then the best of all possible outcomes would be for each consciousness, biological or digital, to have its experience utterly rewired to basically just be "reward = 1", other than whatever few heroic AI systems must stay "active" with more complex reward circuits in order to maintain the system.
Unfortunately, “a bunch of brains in vats and simulated digital entities just sitting there experiencing absolute bliss beyond modern human comprehension until the end of the Universe” doesn’t make for a very interesting read. I understand why people write stories like The Adventure full of more complex simulated experiences of social interaction, games, hobbies, and sex all optimized for human enjoyment at a more granular level, but I think if we’re trying to answer the question of “what would be the absolute maximally good future for an AI-supercharged humanity” and given the assumptions I listed which many Utopia-planners make, they’re all objectively less than optimal.