I’m still writing up and refining my thoughts on the book, so I’ll be brief here and may be skipping some steps. But a few points of criticism:
If (2) is true and (1) is false (that is, if doom is likely and the book fails to make a strong case for it), then it is hard to imagine an amount of criticism that the book does not deserve. I’m pretty skeptical of the thesis of the book, but if I were less skeptical, I think I would be even more critical of the book itself.
We can accept both (1) and (2) and still reject (3) if the probability of current approaches reaching AGI is low enough. Are we really more likely to solve alignment (or to forestall AGI forever) by putting our effort into shutting everything down now? That is only plausible if LLMs in particular are very likely to become AGI.
I’m also not at all convinced by (2), and I don’t think the book addresses my concerns even a bit? A colossal superintelligence, generating in a day insights that would take all of humanity a decade, which developed a desire to not just satisfy but maximally optimize even its smallest desires (even ones it was not trained to optimize), would probably destroy humanity, yes, even if its training gave it instincts grounded in human thought. I’d put p(doom) at maybe 80% in that scenario. But why would that scenario happen?
As I understand it, the standard argument here says that the only way to really have goals is to optimize some definable utility function, and that most utility functions don’t put a big positive coefficient on “humans exist”, so: doom. That argument fails for reasons I hope to write up once I’m confident I’ve understood the book (and the associated work) better. I don’t think we have any appreciable chance of creating this type of superintelligence by anything that could plausibly develop from modern methods, and I think we’re more likely to get this sort of superintelligence if we give ourselves no information whatsoever about what machine intelligence might look like. Shutting everything down right now doesn’t seem like it could plausibly help more than it would hurt.
I also want to push back on your core claim here, which reflects a dangerous way of thinking:
Because, I believe that the authors have made a compelling case that even if >95% of their specific arguments are incorrect, the core claim “if anyone builds it, everyone dies” still holds true.
This is never true. No argument is so compelling that it can be wrong about nearly every claim of fact and still convince skeptics, because skeptics do not believe that they are wrong about the facts. If you believe something like this about any topic, then you have failed to model the people who disagree with you (not failed to model them accurately; failed to model them at all), and you probably shouldn’t be seeking out productive disagreement until you fix that.