One flag: to the extent that the 4-month doubling time is driven by RL with verifiable rewards / RL on CoT, it may not hold for long, because the paper provides evidence that RL doesn't actually increase capabilities indefinitely and puts a fairly harsh ceiling on how far RL can scale (but see @Jozdien's response to the paper below):
https://www.lesswrong.com/posts/s3NaETDujoxj4GbEm/tsinghua-paper-does-rl-really-incentivize-reasoning-capacity#Mkuqt7x7YojpJuCGt (OG post)
https://www.lesswrong.com/posts/s3NaETDujoxj4GbEm/tsinghua-paper-does-rl-really-incentivize-reasoning-capacity#Mkuqt7x7YojpJuCGt (Jozdien’s response)
Nice, so if we return to a 7-month doubling time in the not-too-distant future, that's compatible with reasoning models being the cause, but not with AI accelerating development. Cool, looking forward to seeing how this unfolds, and I set up a market.
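For a rough sense of how much these two doubling times diverge, here's a quick sketch; the 1-hour starting horizon is an illustrative assumption, not a figure from the data:

```python
# Compare task-horizon growth under a 4-month vs a 7-month doubling time.
# Starting horizon of 1 hour is an illustrative assumption.
def horizon_after(months: float, doubling_months: float, start_hours: float = 1.0) -> float:
    """Task horizon (in hours) after `months`, given a fixed doubling time."""
    return start_hours * 2 ** (months / doubling_months)

for months in (12, 24):
    fast = horizon_after(months, 4)   # 4-month doubling (reasoning-model era)
    slow = horizon_after(months, 7)   # 7-month doubling (longer-run trend)
    print(f"after {months} months: 4-mo doubling -> {fast:.1f} h, "
          f"7-mo doubling -> {slow:.1f} h")
```

So over two years the two trends differ by roughly a factor of six, which is why a reversion to the 7-month trend should be distinguishable fairly quickly.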