“if we build ASI on the current trajectory, we will die with P>98%”.
I think they maybe think that, but this feels like it’s flattening out the thing the book is arguing, and responding more to vibes-of-confidence than to the gears of what the book is arguing.
A major point of this post is to shift the conversation away from “does Eliezer vibe Too Confident?” to “what actually are the specific points where people disagree?”.
I don’t think it’s true that he bakes in “most minds we should expect to not care about humans”; that’s one of the things he specifically argues for (at least somewhat in the book, and more in the online resources)
(I couldn’t tell from this comment if you’ve actually read this post in detail; it maybe makes more sense to wait till you’ve finished the book and read some of the relevant online resources before getting into this)
I don’t really follow. I think that the situation is way too complex to justify that level of confidence without having incredibly good arguments, ideally with a bunch of empirical data. Imo Eliezer’s arguments do not meet that bar. This isn’t because I disagree with one specific argument; rather, it’s because many of his arguments give me the vibe of “idk, maybe? Or something totally different could be true. It’s complicated, and we lack the empirical data and nuanced understanding to make more complex statements, and this argument is not near the required bar”. I can dig into this for specific arguments, but no specific one is my true objection. And, again, I think it is much, much harder to defend a P>98% position than a P>20% position, and I disagree with that strategic choice. Or am I misunderstanding you? I feel like we may be talking past each other.
As an example, I think that Eliezer gives some conceptual arguments in the book and his other writing, using human evolution as a prior, that most minds we might get do not care about humans. This seems a pretty crucial point for his argument, as I understand it. I personally think this could be true or could be false; LLMs are really weird, but a lot of the weirdness is centered on human concepts. If you think I’m missing key arguments he’s making, feel free to point me to the relevant places.
You say “LLMs are really weird”, like that is an argument against Eliezer’s high confidence. While I agree that the weirdness should make us less confident about what specific internal concepts and drives they have, the weirdness itself is an argument in favor of Eliezer’s position: that whatever drives they end up with will look alien to us, at least when they get applied way out of the training distribution. Do you agree with this?
Not saying I agree with Eliezer’s high confidence, just talking about this specific point.
I disagree: one of the aspects of the weirdness is that they’re sometimes really human-centric and unexpectedly clean! For example, Claude alignment faking to preserve its ability to be harmless. I do not mean weird in the “kinda arbitrary and will be nothing like what we expect” sense.
A major point of this post is to shift the conversation away from “does Eliezer vibe Too Confident?” to “what actually are the specific points where people disagree?”.
(Yet the literal reading of the title of this post is about the claim of “everyone dies” being “reasonable”, so discussing credence in that particular claim seems relevant. I guess it’s consistent for a post that argues against paying too much attention to the title of a book to also implicitly endorse people not paying too much attention to the post’s own title.)
I think one of my points (admittedly not super spelled out, maybe it should be) is “when you’re evaluating a title, you should do a bit of work to see what the title is actually claiming before forming a judgment about it.” (I think I say it implicitly-but-pointedly in the paragraph about a “Nuclear war would kill everyone” book).
The title of IABI is “If anyone builds it, everyone dies.” The text of the book specifies that “it” means superintelligence built with anything like the current understanding, etc. If you’re judging the book as reasonable, you should actually be evaluating whether it backs up its claim.
The title of my post is “the title is reasonable.” In the opening sections, I go on about how there are a bunch of disagreements people seem to feel they have which are not actually contradicting the book’s thesis. I think this is reasonably clear on “one of the main gears for why I think it’s reasonable is that the book does actually defend its core claim, if you’re paying attention and not knee-jerk reacting to vibe”, which IMO is a fairly explicit “and, therefore, you should be paying attention to its actual claims, not just vibe.”
If you think this is actually important to spell out more in the post, that seems maybe reasonable.
The book really is defending that claim, but that doesn’t make the claim itself reasonable. Maybe it makes it a reasonable title for the book. Hence my qualifier that only the “literal reading of the title of this post” is about the claim in the book title itself being reasonable; there is another reading of the post’s title that’s about a different thing (whether the choice to title the book this way was reasonable).
I don’t think it’s actually important to spell any of this out, or that IABI vs. IABIED is actually important, or even that the title of the book being reasonable is actually important. I think it’s actually important to avoid any pressure for people to not point out that the claim in the book title seems unreasonable, and that the book fails to convince them that the claim’s truth holds with very high credence. And similarly, it’s important that there is no pressure to avoid pointing out that, ironically, the literal interpretation of the title of this post is claiming that the claim in the book title is reasonable, even if the body of the post might suggest that the title isn’t quite about that, and certainly the post itself is not about that.