Isn’t a major point of the Sequences that you can NOT hand anything to a UFAI because it will always find ways to fuck you over? Once you have a UFAI up and running, it’s done, your goose is cooked.
The Sequences don’t say an AI will always fail to optimize a formal goal. The problem is more a mismatch between the formal goal and what humans want. My idea tries to make that mismatch small, by making the goal say directly which conscious experiences should exist in the universe (isomorphic to a given set of unmodified human brains experiencing a given VR setup, sometimes creating new brains according to given rules, all of which was defined without AI involvement). Then we’re okay with recursive self-improvement and all sorts of destruction in the pursuit of that goal. It can eat the whole universe if it likes.
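To make the shape of that goal concrete, here’s a toy sketch in Python. Everything in it is an invented stand-in (brain states are integers, the VR dynamics are a trivial arithmetic rule, and “isomorphic” is modeled as subsequence containment), so it’s not a proposal for formalizing consciousness; it only shows the structure: the goal is a fixed, human-written spec the optimizer must satisfy, not a learned model of our values.

```python
# Toy model of the goal structure: a fixed, human-written spec of which
# experience-traces must exist, handed to the optimizer as-is. All names
# and dynamics are placeholders invented for this sketch.

def vr_step(brain_state, world_state):
    """Hypothetical human-written VR dynamics: one tick of the utopia."""
    new_world = (world_state + 1) % 1000         # coarse-grained world clock
    new_brain = (brain_state + new_world) % 997  # stand-in for brain dynamics
    return new_brain, new_world

def reference_trace(initial_brain, steps=100):
    """The experiences that should exist: a brain running the VR spec."""
    brain, world = initial_brain, 0
    trace = [brain]
    for _ in range(steps):
        brain, world = vr_step(brain, world)
        trace.append(brain)
    return trace

def contains_subsequence(seq, sub):
    """Stand-in for "the universe instantiates this experience-trace"."""
    it = iter(seq)
    return all(x in it for x in sub)  # each element must appear, in order

def goal_satisfied(universe_history, initial_brains):
    """The formal goal. The optimizer may rearrange everything else in
    `universe_history`; it scores well exactly when every specified
    trace occurs somewhere inside it."""
    return all(
        contains_subsequence(universe_history, reference_trace(b))
        for b in initial_brains
    )
```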
Something will have to run it, and that includes things like preventing some humans from creating virtual hells, waging war on their neighbours, and so on. That something will have to be an AI.
My idea was to have humans set all the rules, while defining the VR utopia, before giving it to the UFAI. It’d be like writing a video game. It seems possible to write a video game that doesn’t let people create hells (including virtual hells, because the VR can be very coarse-grained). Similarly for the problem of pain: just give people some control buttons that other people can’t take away. I think hand-coding a toy universe that feels livable long term and has no sharp tools is well within mankind’s ability.
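As a minimal sketch of the “control buttons that other people can’t take away” idea, assuming the VR engine mediates every interaction (Player, World, and COMFORT_FLOOR are all invented for the example): the point is that the no-hells rule is an engine invariant, not a social convention other players could override.

```python
# Sketch of an engine-enforced, inalienable control button. All names
# here are invented for illustration.

COMFORT_FLOOR = 0  # the engine never lets anyone's state drop below this

class Player:
    def __init__(self, name):
        self.name = name
        self.comfort = 100
        self.opt_out = False  # the button: block all incoming effects

class World:
    def __init__(self):
        self.players = {}

    def add_player(self, name):
        self.players[name] = Player(name)

    def press_button(self, name):
        # Only a player's own client can call this on their own avatar;
        # the engine exposes no API for touching anyone else's button.
        self.players[name].opt_out = True

    def apply_effect(self, source, target, delta):
        """The only way players affect each other, so the checks below
        cannot be routed around from inside the game."""
        victim = self.players[target]
        if victim.opt_out:
            return False  # interaction dropped at the engine level
        # Coarse-grained effects only, clamped so that no sequence of
        # actions by others can push anyone into a hell-state.
        victim.comfort = max(victim.comfort + delta, COMFORT_FLOOR)
        return True
```

The design choice doing the work is that apply_effect is the engine’s single channel for player-to-player effects; unlike a real-world off switch, there is no way to act on someone outside it.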
> My idea tries to make that mismatch small, by making the goal say directly which conscious experiences should exist
You think you can formalize a goal which specifies which conscious experiences should exist? It looks to me to be equivalent to formalizing the human value system. And being isomorphic to a “set of unmodified human brains” just gives you the whole of humanity as it is: some people’s fantasies involve rainbows and unicorns, and some involve pain and domination. There are people who do want hells, virtual or not, so either your utopia will contain them or you will have to filter such desires out, and that filtering requires a value system to decide what’s acceptable in the utopia and what’s not.
> My idea was to have humans set all the rules, while defining the VR utopia, before even starting the AI.
That’s called politics and is equivalent to setting the rules for real-life societies on real-life Earth. I don’t see why you would expect it to go noticeably better this time around—you’re still deciding on rules for reality, just with a detour through VR. And how would that work in practice? A UN committee or something? How will disagreements be resolved?
> just give people some control buttons that other people can’t take away
To take a trivial example, consider internet harassment. Everyone has a control button that online trolls cannot take away: the off switch on your computer (or even the little X in the top corner of your window). You think it works that well?
> You think you can formalize a goal which specifies which conscious experiences should exist? It looks to me to be equivalent to formalizing the human value system.
The hope is that encoding the idea of consciousness will be strictly easier than encoding everything that humans value, including the idea of consciousness (and pleasure, pain, love, population ethics, etc.). It’s an assumption of the post.
> That’s called politics and is equivalent to setting the rules for real-life societies on real-life Earth.
Correct. My idea doesn’t aim to solve all human problems forever. It aims to solve the problem that right now we’re sitting on a powder keg, with many ways for smarter-than-human intelligences to emerge, most of which kill everyone. Once we’ve resolved that danger, we can take our time to solve things like politics, internet harassment, or reconciling people’s fantasies.
I agree that defining the VR is itself a political problem, though. Maybe we should do it with a UN committee! It’s a human-scale decision, and even if we get it wrong and a bunch of people suffer, that might still be preferable to killing everyone.
> Once we’ve resolved that danger, we can take our time to solve things
I don’t know. I think that once you hand off the formalized goal to the UFAI, you’re stuck: you’ve snapshotted the desired state and you can’t change anything any more. If you can change things, well, the UFAI will make sure things get changed in the direction it wants.
I think it should be possible to define a game that gives people tools to peacefully resolve disagreements, without giving them tools for intelligence explosion. The two don’t seem obviously connected.
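As a rough illustration of that separation (every verb below is invented for the example), the game’s fixed vocabulary can include primitives for negotiation and arbitration while simply lacking any primitive from which general-purpose computation or manufacturing could be composed:

```python
# Toy whitelist of in-game verbs. Dispute-resolution tools are present;
# intelligence-explosion tools are absent, by construction of the list.

ALLOWED_ACTIONS = {
    "move", "speak", "trade",          # everyday life
    "propose_rule", "vote", "appeal",  # peaceful dispute resolution
    "press_button",                    # the inalienable opt-out
}
# Deliberately missing: "write_program", "build_machine", "modify_engine".

def submit(action, *args):
    """The engine's single entry point: unknown verbs are no-ops."""
    if action not in ALLOWED_ACTIONS:
        return None  # off-list actions don't error, escalate, or extend the game
    return handle(action, *args)

def handle(action, *args):
    # Stand-in for the engine's real handlers; the details don't matter
    # for the point being made.
    return (action, args)
```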
So then, basically, the core of your idea is to move all humans to a controlled reality (first VR, then physical) where an intelligence explosion is impossible? It’s not really supposed to solve any problems, just prevent the expected self-destruction?
Yeah. At quite a high cost, too. Like I said, it’s intended as a lower bound on what’s achievable, and I wouldn’t have posted it if any better lower bound were known.