hypothesis: the kind of reasoning that causes ML people to say “we have made no progress towards AGI whatsoever” is closely analogous to the kind of reasoning that makes alignment people say “we have made no progress towards hard alignment whatsoever”
ML people see stuff like GPT-4 and correctly notice that it’s in fact kind of dumb and bad at generalization in the same ways that ML always has been. they make an incorrect extrapolation, which is that AGI must therefore be 100 years away, rather than 10 years away
high p(doom) alignment people see current model alignment techniques and correctly notice that they fail to tackle the AGI alignment problem in the same way that alignment techniques always have. they make an incorrect extrapolation and conclude that p(doom) = 0.99, rather than 0.5
(there is an asymmetry which is that overconfidence that alignment will be solved is much more dangerous than overconfidence that AGI will be solved)
It’s differential progress that matters in alignment. I.e., if you expect that we’ll need an additional year of alignment research after creating AGI, things still look pretty doomed, even if you grant overall progress in the field.
sure, but seems orthogonal to the thing i’m describing—the claim is that a lot of alignment work on current models has ~no bearing on progress towards aligning AGI.