I am not actually convinced that this annoyance should be fed instead of starved? I’m thinking mostly about Your Price for Joining, here, and a comment by Nate at a recent event that he wished people who thought the book was all correct except for part i would argue with people who thought the book was all correct except for part j. Like, I think the overlap between your criticisms and the criticisms of other people in the community is actually pretty low, and the correct interpretation of that is more like “yes we agree with the core thesis but disagree on nuance” rather than “we all think the core thesis is wrong in identifiable way X.”
[To be clear, the core thesis is “if anybody builds it, everybody dies”, and that the default path is ruin. If you think your alignment agenda has a shot, and also that everyone will implement your alignment agenda on the default path, then it makes sense to disagree, because you’re part of the “it”? But I think this is probably an unreasonable expectation, and if you think there’s a company that isn’t going to implement your alignment agenda, then maybe you’re not in the “it”.]
Like, I think the overlap between your criticisms and the criticisms of other people in the community is actually pretty low,
I disagree; I think my core complaints about their arguments are very similar to Will MacAskill, Kelsey Piper, and the majority of people whose ideas on AI safety I have a moderate amount of respect for. I agree that some other LW people agree with the parts I think are wrong, but that’s not what I’m talking about.
I think my core complaints about their arguments are very similar to Will MacAskill, Kelsey Piper, and the majority of people whose ideas on AI safety I have a moderate amount of respect for.
So taking this tweet as representative of MacAskill’s thoughts, and this as representative of Kelsey Piper’s, I see:
The evolution analogy in part I.
You like it but think the way the authors would handle the disanalogies would probably be bad; MacAskill complains that they don’t handle the disanalogies; Piper doesn’t discuss it.
Discontinuous capability growth.
MacAskill doesn’t like it; you don’t seem to comment on it; Piper doesn’t seem to comment on it. (I think MacAskill also misunderstands its role and relevance in the argument.)
In particular, MacAskill quotes PC’s summary of EY as “you can’t learn anything about alignment from experimentation and failures before the critical try” but I think EY’s position is closer to “you can’t learn enough about alignment from experimentation and failures before the critical try”.
The world in which we make our first crucial try will be significantly different from the current world.
I think both you and MacAskill think this is a significant deficiency (this is your tricky hypothesis #2); I think Piper also identifies this as a point that the authors don’t adequately elaborate on, but as far as I can tell she doesn’t think this is critical. (That is, yes, the situation might be better in the future, but not obviously better enough that we shouldn’t attempt a ban now.)
Catastrophic misalignment.
MacAskill thinks we have lots of evidence that AIs will not do what the user wants, but not very much evidence that AIs will attempt to take over. I think both you and Piper think it’s likely that there will be at least one AI of sufficient capability that attempts to take over.
Part 3.
You and MacAskill both seem to dislike their policy proposals. Piper seems much more pro-ban than you or MacAskill are; I don’t get a good sense of whether MacAskill actually thinks a ban is bad (what catch-up risk is there if neither frontrunners nor laggards can train AIs?) or just unlikely to be implemented.
I don’t think MacAskill is thinking through the “close substitutes for agentic superintelligence” point. If they are close substitutes, then they have enough of the risks of agentic superintelligence that it still makes sense to ban them!
So, at least on this pass, I didn’t actually find a specific point that all three of you agreed on. (I don’t count “they should have had a better editor” as a specific point, because it doesn’t specify the direction; an editing choice Piper liked more could easily have been an editing choice that MacAskill liked less.)
The closest was that the book isn’t explicit or convincing enough when talking about iterative alignment strategies (like in chapter 11). Are there other points that I missed (or should I believe your agreement on that point is actually much clearer than I think it is)?
I think Will, Kelsey, and I have similar positions (having talked with both of them about this quite a lot); we just emphasized different disagreements.
(Except that I am less sold than Kelsey on her “maybe we can have non-agentic substitutes” point.)
Do you agree with the “types of misalignment” section of MacAskill’s tweet? (Or, I guess, is it ‘similar to your position’?)
If not, I think it would be neat to see the two of you have some sort of public dialogue about it.