Interestingly enough, I believe the opposite: Eliezer was quite wrong (though not so wrong that I think we're totally out of the danger zone).
I think this for several reasons:
I think GPT is proof that a fairly high level of intelligence can be achieved without the system being agentic. A lot of LW arguments start failing once we realize that GPT isn't an agent, but rather a simulator/oracle AI, as described in Janus's Simulators post. The post is here:
https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators
And this is immensely valuable, especially if the simulator framing holds in the limit: it would mean superhuman AI that is myopic and non-agentic, so instrumental convergence and inner alignment problems simply don't come up. That lets us avoid many of the hardest questions.
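To make the myopia point concrete, here is a minimal sketch (my own illustration, using hypothetical tensor names, not anything from the Simulators post) of the next-token cross-entropy objective GPT-style models are trained on: the loss at each position depends only on that position's prediction, with no multi-step return that would reward steering the future.

```python
import torch
import torch.nn.functional as F

# Sketch of the per-token objective GPT-style models are trained on.
# `logits` has shape (batch, seq_len, vocab) and `tokens` has shape
# (batch, seq_len); both names are illustrative stand-ins.
def next_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    preds = logits[:, :-1, :]   # predictions made at positions 0..T-2
    targets = tokens[:, 1:]     # the tokens that actually came next
    # Plain cross-entropy per position: each step's loss depends only on
    # that step's prediction, with no reward for influencing later steps.
    return F.cross_entropy(
        preds.reshape(-1, preds.size(-1)),
        targets.reshape(-1),
    )
```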
I believe natural abstractions hold well enough that the abstractions used by a human and those used by an AI are easy to translate between (one rough way to operationalize this is sketched after the link). One of Logan Zoellner's posts covers how good natural abstractions are, and they turn out to be quite good in highly capable models. If AI alignment itself is a natural abstraction, then outer alignment largely solves itself, though I would be careful here. Logan Zoellner's post is here:
https://www.lesswrong.com/posts/BdfQMrtuL8wNfpfnF/natural-categories-update
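As a rough operationalization of "easy to translate" (my own toy illustration, not the methodology from Zoellner's post), one can ask whether a human-labeled concept is linearly decodable from a model's hidden activations with a simple probe; the data below is synthetic and the setup is purely hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical setup: `acts` stands in for hidden activations
# (n_examples, hidden_dim) from some model, and `labels` mark a
# human concept (0/1). Both are synthetic stand-ins, not real data.
rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 256))
labels = (acts[:, :8].sum(axis=1) > 0).astype(int)  # toy "concept"

probe = LogisticRegression(max_iter=1000).fit(acts[:800], labels[:800])
acc = probe.score(acts[800:], labels[800:])
# High held-out accuracy from a *linear* probe would suggest the model
# represents the concept in a form that maps easily onto the human one.
print(f"probe accuracy: {acc:.2f}")
```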
I believe sandboxing powerful AI so that it doesn't learn particular things, like human models or deception, is actually possible and maybe reasonably practical. Indeed, I gave a proof on Christmas showing that, conditional on careful enough curation of the data and full removal of nondeterminism (which isn't super difficult; blockchains already do this for consensus reasons), the AI can't break out of the sandbox, by the No Free Lunch theorem. A rough sketch of the determinism step follows the link.
My post on this is here:
https://www.lesswrong.com/posts/osmwiGkCGxqPfLf4A/i-ve-updated-towards-ai-boxing-being-surprisingly-easy
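As a rough illustration of the "remove nondeterminism" step (my own sketch, not the construction from the linked post), here is how one might pin down the usual sources of nondeterminism in a PyTorch training run so that the same curated data always yields the same model:

```python
import os
import random
import numpy as np
import torch

def make_deterministic(seed: int = 0) -> None:
    # Seed every RNG the training loop might touch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Ask PyTorch to error out on ops that lack a deterministic kernel.
    torch.use_deterministic_algorithms(True)
    # Required by cuBLAS for deterministic matmuls on GPU.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Disable autotuning, which can pick different kernels across runs.
    torch.backends.cudnn.benchmark = False
```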
One big problem still remains: Amdahl's law suggests that, compared with a tool that merely helps you do something very well, an agent you can simply delegate to is far more useful, because it isn't bottlenecked on the human in the loop. I fear economic pressure will push people to hand over more and more control, until the AI is given full control and a discontinuity suddenly emerges. That same economic pressure is probably going to drag us back into the problems inherent in agents.
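For reference, Amdahl's law says that if a fraction p of the work can be sped up by a factor s while the remaining (1 - p) stays human-paced, the overall speedup is 1 / ((1 - p) + p/s). The tiny sketch below (parameter values are illustrative) shows why even an infinitely fast tool is capped by the human-bottlenecked fraction, whereas a full agent faces no such cap.

```python
# Amdahl's law: overall speedup when a fraction p of the work is sped up
# by a factor s and the remaining (1 - p) stays at human speed.
def amdahl_speedup(p: float, s: float) -> float:
    return 1.0 / ((1.0 - p) + p / s)

# Even with an arbitrarily fast tool (s -> infinity), a 20% human-paced
# remainder caps the speedup at 1 / 0.2 = 5x.
print(amdahl_speedup(p=0.8, s=1e9))  # ~5.0
```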