I think this post might be the best one of all the MIRI dialogues. I also feel confused about how to relate to the MIRI dialogues overall.
A lot of the MIRI dialogues consist of Eliezer and Nate saying things that seem really important and obvious to me, and a lot of my love for them comes from a feeling of “this actually makes a bunch of the important arguments for why the problem is hard”. But the nature of the argument is kind of closed off.
Like, I agree with these arguments, but if you believe them, having traction on AI Alignment becomes much harder, and a lot of the things that people currently label “AI Alignment” kind of stop feeling real. And I have this feeling that even though a really quite substantial fraction of the people I talk to about AI Alignment are compelled by Eliezer’s argument for difficulty, there is some kind of structural reason that AI Alignment as a field can’t really track these arguments.
Like, a lot of people’s jobs and funding rely on these arguments being false, and also, if these arguments are correct, the space of perspectives on the problem suddenly loses a lot of common ground on how to proceed or what to do, and it isn’t really obvious that you even want an “AI Alignment field” or lots of “AI Alignment research organizations” or “AI Alignment student groups”. Like, because we don’t know how to solve this problem, it really isn’t clear what the right type of social organization is, and there aren’t obviously great gains from trade, and so from a coalition perspective, you don’t get a coalition of people who think these arguments are real.
I feel deeply confused about this. Over the last two years, I think I wrongly ended up investing in an ecosystem of people that somewhat structurally can’t really handle these arguments, and that makes plans which assume these arguments are false, and in doing so actually mostly makes the world worse: it takes a far too optimistic stance on the differential technological progress of solving various ML challenges, and it acts as if it can pick up a lot of probability mass of good outcomes just by having better political relationships with capability labs, giving them resources to make AI happen even faster.
I now regret that a lot, and I think engaging with these dialogues more closely, or having more discussion of them, would have prevented me from making what I currently consider one of the biggest mistakes of my life. Maybe it would also have helped to make them more accessible, or to structure them in a way that gave me, as a reader, more permission to actually take their conclusions seriously, with content that builds on these assumptions and asks “what’s next?” instead of just “why not X?” in dialogue with people who disagree.
In terms of follow-up work, the dialogue I would most love to see is a conversation between Eliezer and Nate, or between John Wentworth and Eliezer, where they try to hash out their disagreements about what to do next, instead of having the conversation at the level these dialogues were at.
If it’s a mistake you made over the last two years, I have to say in your defense that this post didn’t exist 2 years ago.
I think I was actually helping Robby edit some early version of this post a few months before it was posted on LessWrong, so I think my exposure to it was actually closer to ~18-20 months ago.
I do think that still means I set a lot of my current/recent plans into motion before this was out, and your post is appreciated.
Interestingly enough, I believe the opposite: Eliezer was quite wrong (though not so wrong that I think we’re totally out of the danger zone).
I think this for several reasons:
I think that GPT is proof that a reasonably large amount of intelligence can be achieved without being agentic. A lot of LW arguments start failing once we realize that GPT isn’t an agent, but rather a simulator/oracle-style AI, in the sense of Janus’s Simulators post. The post is here:
https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators
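To gesture at the distinction in code: below is a minimal, purely illustrative sketch (my own framing, not anything from Janus’s post) contrasting a myopic simulator, which only samples continuations conditioned on a prompt, with an agent, which picks actions to maximize an objective.

```python
import random

corpus = ["the cat sat", "the cat ran", "the dog sat"]

def simulator_next_word(prompt: str) -> str:
    """Myopic: sample a continuation conditioned on the prompt.
    There is no goal and no world-state it is trying to steer toward."""
    continuations = [s[len(prompt):].split()[0]
                     for s in corpus
                     if s.startswith(prompt) and len(s) > len(prompt)]
    return random.choice(continuations) if continuations else "<eos>"

def agent_step(state: int, reward_of) -> int:
    """Agentic: choose whichever action scores best under an objective."""
    return max([-1, +1], key=lambda a: reward_of(state + a))

print(simulator_next_word("the cat "))                   # 'sat' or 'ran', nothing more
print(agent_step(0, reward_of=lambda s: -abs(s - 10)))   # always steps toward its goal
```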
And this is immensely valuable, especially if the simulator framing holds in the limit, because it would mean we get superhuman AI that is myopic and non-agentic, so instrumental convergence and inner alignment problems don’t come up. That sidesteps many of the hard questions we would otherwise have to solve.
I believe natural abstractions hold well enough that the abstractions used by a human and the ones used by an AI are easy to translate between. One of Logan Zoellner’s posts covers how good natural abstractions are, and they turn out to be really good in very capable models. If AI Alignment were itself a natural abstraction, then Outer Alignment would essentially solve itself, though I would be careful here. Logan Zoellner’s post is here:
https://www.lesswrong.com/posts/BdfQMrtuL8wNfpfnF/natural-categories-update
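As a hedged toy illustration of what “easy to translate” could mean (my own sketch, not something from Zoellner’s post): if two systems share the same underlying abstractions, their internal representations should be related by a simple map, and a handful of paired examples is enough to recover it.

```python
import numpy as np

rng = np.random.default_rng(0)
concepts = rng.normal(size=(50, 8))                 # 50 shared "concepts", 8 latent dims

human_repr = concepts @ rng.normal(size=(8, 16))    # "human" embedding space, 16 dims
ai_repr    = concepts @ rng.normal(size=(8, 32))    # "AI" embedding space, 32 dims

# Learn a linear translation human -> AI from the first 40 paired concepts.
W, *_ = np.linalg.lstsq(human_repr[:40], ai_repr[:40], rcond=None)

# Check that the translation generalizes to held-out concepts.
pred = human_repr[40:] @ W
err = np.linalg.norm(pred - ai_repr[40:]) / np.linalg.norm(ai_repr[40:])
print(f"relative translation error on held-out concepts: {err:.3f}")  # ~0
```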
I believe sandboxing powerful AI so that they don’t learn particular things, like human models or deception, is actually possible and maybe reasonably practical. Indeed, I gave a proof on Christmas showing that, conditional on careful enough curation of the training data and on fully removing nondeterminism (which isn’t super difficult; blockchains already do this for consensus reasons), an AI can’t break out of the sandbox, due to the No Free Lunch theorem.
My post on this is here:
https://www.lesswrong.com/posts/osmwiGkCGxqPfLf4A/i-ve-updated-towards-ai-boxing-being-surprisingly-easy
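For concreteness, here is a minimal sketch of the two mechanical ingredients that argument leans on, data curation and removal of nondeterminism. The names and hash check are hypothetical plumbing of my own; the actual NFL argument lives in the linked post.

```python
import hashlib
import numpy as np

APPROVED_CORPUS_SHA256 = "..."  # hash of the vetted dataset, fixed in advance (placeholder)

def run_sandboxed_training(corpus_bytes: bytes, seed: int = 0) -> np.ndarray:
    # (1) Curation: refuse to run on anything but the pre-approved corpus.
    digest = hashlib.sha256(corpus_bytes).hexdigest()
    if digest != APPROVED_CORPUS_SHA256:
        raise ValueError("corpus does not match the curated, approved dataset")
    # (2) Determinism: a single explicit seed, so the run is a pure function
    # of (corpus_bytes, seed) and two runs can be audited bit-for-bit.
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=16)  # stand-in for an actual training procedure
    return weights
```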
One big problem still remains: Amdahl’s law suggests that an agent you can delegate work to beats a tool that merely helps you do something very well, since the agent isn’t bottlenecked on the human. And I fear economic pressure will make people hand over more and more control, until the AI is given full control and a discontinuity suddenly emerges. I think this economic pressure is probably going to lead us into the problems inherent in agents.
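To make the Amdahl’s-law point concrete with a quick back-of-the-envelope calculation (numbers purely illustrative): if the human still has to do 30% of every loop, no amount of tool speed gets the overall workflow past about 3.3x, whereas a fully delegated agent removes the human term entirely.

```python
def amdahl_speedup(accelerated_fraction: float, tool_speedup: float) -> float:
    # Amdahl's law: 1 / ((1 - p) + p / s)
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / tool_speedup)

human_fraction = 0.3  # fraction of each loop the human still has to do
for s in (2, 10, 100, 1_000_000):
    print(s, round(amdahl_speedup(1 - human_fraction, s), 2))
# -> 1.54, 2.7, 3.26, 3.33: capped near 1 / 0.3 ≈ 3.33x, however fast the tool.
# A fully delegated agent drops the human term, so it has no such ceiling.
```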