I expect those in power to see this rather simple fact (experts disagree wildly) and realize they should slow down, but I fear that could happen too late.
I don’t expect that to happen until at least some experts are saying that the danger is imminent, rather than a few years away, and probably not until we get a moderately impressive near miss that supports this claim. Currently, basically everyone still agrees that models are not existentially dangerous yet.
Racing all the way to the edge of the precipice and then slamming the brakes on at the very last moment, like we’re playing Chicken, has a very obvious failure mode; nevertheless, that’s what I’m expecting society to attempt to do. Unfortunately, that means those of us taking part in the public discourse need to avoid being the Boy Who Cried Wolf before we’re actually within clear sight of the edge, and restrain ourselves to posting warnings about the probability of wolves ahead.
Now, that’s for technical alignment. The additional problems of societal alignment (whose values is it aligned to, and how does that all shake out) are a different ball of wax.
Absolutely! I have some opinions on that too, but that seems like an area where the people who work on governance problems probably have more leverage.
Yes, my question (and my answer) is about how hard a problem technical alignment is. As I discuss in it, it assumes that, at the moment, the best thing to primarily work on is how to align LLMs, for three reasons: because ASI will probably happen sooner if it happens via LLMs than via some other architecture; because if our ASI isn’t LLM-based or LLM-like, it may well still contain an LLM as a subcomponent or I/O device (or at least something that was trained via distilling information and behavior from humans using SGD); and because it’s generally more productive to work on something that already exists than on something still mostly hypothetical. I’m glad there are people like Steven Byrnes working on approaches to aligning other sorts of ASI; having a range of bets is good, but I think for now putting the bulk of our effort into aligning LLMs makes sense.