It seems like more of the same progress we’ve been seeing so far, except it’s gotten sufficiently good at hacking (or: sufficiently smart to be very good at hacking) that [something something unprecedented cybersec danger].
If it’s the next level of pretraining compared to Opus 4 and Gemini 3 Pro, there’s potential for novel observations about what that does to the texture of capabilities. Pretraining scale is the kind of thing that will predictably grow further soon without requiring algorithmic breakthroughs, and it’s not even clear that pure scaling of RLVR can be expected to deliver more phase changes in capabilities in the near future than pure scaling of pretraining (even if either one delivers fewer than 1 phase change in expectation until 2032 or so).