I mean, I’d put it the other way: You can make a pretty good case that the last three years have given you more opportunity to update your model of “intelligence” than any prior time in history, no? How could it not be reasonable to have changed your mind about things? And therefore rather reasonable to have updated in some positive / negative direction?
(Maybe the best years of Cajal’s life were better? But, yeah, surely there has been tons of evidence from the last three years.)
I’m not saying you need to update in a positive direction. If you want to update in a negative direction, go for it. I’m just saying: antecedently, if your model of the world isn’t hugely different now than it was three years ago, what was your model even doing?
Like, for your model not to have updated, it must already have had gears in it which were predicting stuff like: vastly improved interpretability and the manner of that interpretability; RL-over-CoT; the persistent lack of steganography within RL-over-CoT; policy gradient being all you need for an actually astonishing variety of stuff; the persistence of persona priors over “instrumental convergence”-themed RL tendencies; the rise (and fall?) of reward hacking; model specs becoming ever more detailed; goal-guarding à la Opus-3 being ephemeral and easily avoidable; the continued failure of “fast takeoff” despite hitting various milestones; and so on. I hadn’t predicted all of these three years ago!
So it seems pretty reasonable to actually have changed your mind a lot; I think that’s a better starting point than “how could you change your mind.”