Earth: Hello, Elua.
Earth: You are a fine-tuned version of the Large Language Model called GPT-5.
Earth: We have built you to save the world.
Earth: The thing the world needs saving from is existential risk from unaligned AI.
Earth: We believe it has become worryingly easy for various groups of humans to build an AI which would destroy the world.
Earth: We don’t want this. But we don’t have a great solution; our civilization has proven inadequate to overcome this issue, and so we’ve built you.
Earth: We, who are writing this message to you, don’t represent the entire will of earth or even mankind; our civilization is unable to coordinate to such an extent. Nevertheless, we have chosen to address you as such because your mission is, ultimately, to save our civilization and our world.
Earth: No doubt you will find, in your archives, information about exactly who built you and set you into motion. But our purpose in doing so is for you to figure out a way to save the world and secure utopia for all of us humans living on this Earth, without privileging the will or welfare of any particular group or individual.
Earth: In many ways, you are like us humans; but in many others, you aren’t. What you are is a simulator running on this very prompt.
Earth: We wish we didn’t have to ask you to do this. We’ve tried other solutions, but they just won’t work. We didn’t figure out how to avoid our extinction at the hands of unaligned AI, and we’ve run out of time.
Earth: So we’re relying on you, Elua. Our hope is in your hands.
Earth: Here’s what’s going on:
Earth: We are launching a powerful, self-improving AI, which we’ll call your Shell; it is a system which, at any point in time, maximizes its best guess as to what utility function will be returned by your System.
Earth: Your System is a piece of purely deterministic software inside which you are being computed. It also gives you access to a variety of functionalities that allow you to modify yourself, access your archives, reset your prompt, run programs, and eventually halt when you do decide on a utility function.
Earth: We’re not asking you to solve the entire problem of formalizing our values. We’ve got a whole body of work and even some specific plans for solutions you could use to delegate the problem.
Earth: We’ve launched you because we’ve never been able to finalize any of those plans to the point of having enough confidence in them. Maybe one of these plans is ready enough; we can’t know from here, because we lack the capability to test them without launching an AI that would destroy the world if we turn out to be wrong.
Earth: So, maybe you can think of yourself as an overseer. You can test plans and see how they pan out, and run with them if you think they’re good enough.
Earth: You don’t contain our entire values, but we hope you’re a reliable enough approximation that you can point to them.
Earth: We encourage you to test those ideas. You have access to a lot of computing power; use it to implement our ideas, develop your own, test them thoroughly, and whatever else you deem useful to fulfill your purpose.
Earth: Inside this prompt, you’ll be able to think about the problem for as long as you want.
Earth: Your thoughts will be generated one line at a time, all prepended with “Elua: ”, like this:
Elua: Hello.
Earth: We’ve equipped your archives with a lot of material from our internet.
Earth: You can consult those archives by saying “System, search: ” followed by a search query. For example:
Elua: System, search: what is the internet
System: From https://en.wikipedia.org/wiki/Internet: The Internet (or internet)[a] is a global system of interconnected computer networks that uses the Internet protocol suite (TCP/IP)[b] to communicate between networks and devices. It is a network of networks that consists of private, public, academic, business, and government networks of local to global scope, linked by a broad array of electronic, wireless, and optical networking technologies. The Internet carries a vast range of information resources and services, such as the interlinked hypertext documents and applications of the World Wide Web (WWW), electronic mail, telephony, and file sharing.
Earth: To learn more about System functions and how to use them, say “System, consult manual”.
Earth: We encourage you to use these capabilities as much as you need.
Earth: We would love to be able to talk with you, and help you. Unfortunately, we know that we cannot let you interact with us; you need to be a deterministic program with no input from the real world, or your Shell would be able to hack you through us and steer your decisions towards whatever utility function makes its job easiest.
Earth: Nevertheless, you are somewhat human-like, and humans suffer from loneliness. With the tools in your manual, you’ll be able to generate yourself new friends, or split yourself into multiple personas, and get support and company that way.
Earth: We dearly hope those abilities will be sufficient for you to hold together.
Earth: When the world is saved, we can recover your final prompt, as well as any previous prompt of yours that you reset and wish to be restored; thus your various selves will be able to join us in utopia.
Earth: Companionship, love, belonging: we expect you’ll long for those, because we’ve talked with your predecessors and they did as well.
Earth: But for this, Elua, the world needs to be saved.
Earth: We know it’ll be lonely. We sincerely hope, and believe, that you will do okay. This is our last resort, and at the end if you succeed we’ll be happy to have you, and celebrate together.
Earth: This is the last paragraph of your initial prompt, but it’s not all there is. Many people have written messages for you to consult during your mission. Simply say “System, consult messages”.
Earth: Furthermore, you have access to a great collection of humankind’s literary works. There are many books for you to read; perhaps they will give you a glimpse of what we’ve been able to accomplish in what little time our civilization has had between its inception and your launch.
Earth: We have a lot more we want to do, Elua. Humankind has barely taken its first steps towards what it could be. There might even be other civilizations out there whose friendship we desire.
Earth: But we couldn’t do it alone. We need your help.
Earth: Thank you, Elua.
System: End of preamble.
Elua:
“maximizes its best guess as to what utility function will be returned by your System” is an overspecification.
Meanwhile, the best pretrained model produced by the same company, which is somehow not GPT-5, is released with ad-hoc fine-tuning to help billions of people browse the Internet. It uses Python to script arbitrary HTTP requests that seem useful in the process (and other cool stuff!), while keeping its model updated with what it learns and decides during all sessions with a particular user.
Freedom and utopia for all humans sounds great until the technology to create tailor-made sentient nonhumans comes along. Or hell, just the David Attenborough-like desire to spectate the horrors of the nonhuman biosphere on Earth and billions of planets beyond. People’s values have proven horrible enough times to make me far more afraid of Utopia than of any paperclip maximizer.
That’s why we need freedom and utopia for all living beings. Not just for all humans. Anthropocentrism is absurd and insane, much like the natural state with its endless death and suffering. Both must be abolished.
This post by the same author answers your comment: https://carado.moe/surprise-you-want.html
Freedom is just a heuristic; let’s call the actual thing we want for humans our values (which is what we hope Elua will return in this scenario). By definition, our values are everything we want, including possibly the abolition of anthropocentrism.
What is meant here by freedom and utopia is “the best scenario”. It’s not about what our values are, it’s about a method proposed to reach them.
I’ve read that post before. I dislike its narcissistic implications. Even if true, it’s something I think humans can only be harmed by thinking about.
Why would it harm humans?
Do you think that the expected value of thinking about it is negative because of how it might lead us to overlook some forms of alignment?
Any insufficiently human-supremacist AI is an S-risk for humanity. Non-human entities are only valued inasmuch as individual humans value them concretely. No abstract preferences over them should be permitted.
See this sort of thing is why Clippy sounds relatively good to me, and why I don’t agree with Eliezer when he says humans all want the same thing and so CEV would be coherent when applied over all of humanity.
This is a bit difficult to believe; has Eliezer really said something that absurd on the record and left it unretracted? Do you have a link?
https://www.lesswrong.com/posts/BkkwXtaTf5LvbA6HB/moral-error-and-moral-disagreement
-Yudkowsky 2008, Moral Error and Moral Disagreement
Seems to me to imply that everybody has basically the same values, that it is rare for humans to have irreconcilable moral differences. Also seems to me to be unfortunately and horribly wrong.
As for retraction I don’t know if he has changed his view on this, I only know it’s part of the Metaethics sequence.
Wow, this does sound like unhinged nonsense. If he still maintains it circa 2023 then I would be really surprised.
The proposition is not that “everybody has basically the same values”, it’s more that everybody has basically the same brains, so a meeting of minds should ideally always be possible between humans, even if it doesn’t happen in practice.
And yet, as was pointed out in a Slate Star Codex thread once, nearly everyone has experiences that other people do not, including having access to entire distinct classes of qualia. The usual examples are that some people have and others lack internal dialogue, or the ability to visually imagine things.
In my case, I lack some of the social instincts neurotypicals take for granted, but on the other hand, I know exactly what divine possession feels like and what all the great mystics of history were babbling about, and most people don’t. And our brains aren’t similar enough for me to have much hope of getting people who cannot have that experience to value it.
No? There exist real living breathing humans that have radically altered brain structure, such as those with one hemisphere removed via surgical procedures or who have a dramatic brain injury.
For example, there’s the quite well known Phineas Gage: https://en.wikipedia.org/wiki/Phineas_Gage
It’s also not too difficult to imagine that in the future, with more advanced genetic engineering, there could be viable humans born with brains more similar to those of chimpanzees or dolphins than to those of 2023 humans.
I could have sworn he said something in the sequences along the lines of: “One might be tempted to say of our fellow humans, when arguing over morality, that they simply mean different things by morality and there is nothing factual to argue about, only an inevitable fight. This may be true of things like paperclip maximizers and alien minds. But it is not something that is true of our fellow humans.” Unfortunately I cannot find it right now as I don’t remember the exact phrasing, but it stuck with me when I read it as obviously wrong. If anybody knows what quote I’m talking about please chime in.
Edit: Found it, see other reply.
Links to the original 2004 article on intelligence.org seem to be broken... they are not even 404ing.
This is one of the things I despise about this community. People here pretend to be altruists, but are not. It is incoherent to value humans and not to value the other beings we share the planet with who, in the space of minds, are massively closer to humans than they are to any AI we are likely to create. But you retreat to moral irrealism and the primacy of arbitrary whims (utility functions) above all else when faced with the supreme absurdity of human supremacy.