I don’t agree with Hanson generally, but I think there’s something to the claim that rationalist AI risk public outreach has overemphasized first-principles thinking, theory, and logical possibilities (e.g. evolution, gradient descent, the human–chimp analogy) over more concrete, tangible empirical findings (e.g. deception emerging in small models, specification gaming, LLMs helping to create WMDs, etc.).
I tend to agree with this. I was trying to gesture at the various kinds of empirical evidence we have in the paragraph mentioning Bing; I’m not sure how successful that was.
The situation is quite interesting, since Eliezer was writing about alignment before a lot of this evidence came in. So first-principles reasoning worked for him, at least to the point of predicting that there would be alignment issues, if not to the point of predicting the exact form those issues would take. So many rationalists (probably including me) tend to over-focus on theory, since that’s how they learned it themselves from Eliezer’s writings. But now that we have all these examples, we should definitely be talking about them and learning from them more.
Specifics are just that—specifics. They depend on the details of any given technology, and insofar as no current AI has the power to self-improve or even come up with complex plans to achieve its goals, they’re not particularly relevant to AGI, which may well use a different architecture altogether.
To me it seems like the arguments remain solid and general, the way, say, the rocket equation is, even if you don’t specifically know what your propellant will be. And just as when Oppenheimer & co. had to worry about the possibility of igniting the atmosphere, you can’t just go “oh well, can’t possibly work this out from theory alone, let’s roll the dice and see”.
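To make the analogy concrete (this is standard physics background, not something from the exchange above): the Tsiolkovsky rocket equation fixes the achievable change in velocity from just the effective exhaust velocity and the mass ratio, whatever propellant ends up supplying them,

$$\Delta v = v_e \ln\!\left(\frac{m_0}{m_f}\right)$$

where $m_0$ is the initial mass and $m_f$ the final mass. The point is that the conclusion holds across implementations, which is the sense in which a theoretical argument can stay general even before the engineering specifics are known.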