Hopefully I’m wrong, please help me find a mistake.
There is more than just one mistake here IMO, and I’m not going to try to list them.
Just the title alone (“AGI is uncontrollable, alignment is impossible”) is totally misguided IMO. It would, among other things, imply that brain emulations are impossible (humans can be regarded as a sort of AGI, and it’s not impossible for humans to be aligned).
But oh well. I’m sure your perspectives here are earnestly held / it’s how you currently see things. And there are no “perfect” procedures for evaluating how much to trust one’s own reasoning compared to others.
I would advise reading the sequences (or listening to them as an audiobook) 🙂
Thanks for the feedback.
I don’t think the analogy with humans is reliable. But for the sake of argument I’d like to highlight that corporations and countries are mostly limited by their power, not by alignment. Usually countries declare independence once they are able to.
Most humans are not obedient/subservient to others (at least not maximally so). But also: Most humans would not exterminate the rest of humanity if given the power to do so. I think many humans, if they became a “singleton”, would want to avoid killing other humans. Some would also be inclined to make the world a good place to live for everyone (not just other humans, but other sentient beings as well).
From my perspective, the example of humans was intended as “existence proof”. I expect AGIs we develop to be quite different from ourselves. I wouldn’t be interested in the topic of alignment if I didn’t perceive there to be risks associated with misaligned AGI, but I also don’t think alignment is doomed/hopeless or anything like that 🙂
But it is doomed, the proof is above.
The only way to control AGI is to contain it. We need to ensure that we run AGI in fully isolated simulations and gather insights under the assumption that the AGI will try to seek power within the simulated environment.
I feel that you don’t find my words convincing, maybe I’ll find a better way to articulate my proof. Until then I want to contribute as much as I can to safety.
Please don’t.
Please refute the proof rationally before telling me what to do.