One flag: to the extent that the 4-month doubling time is driven by RL with verifiable rewards / RL on CoT, it may not hold for long, because the paper provides evidence that RL doesn't actually increase capabilities indefinitely and puts a fairly harsh ceiling on how far RL can scale (but see @Jozdien's response to the paper below):
https://www.lesswrong.com/posts/s3NaETDujoxj4GbEm/tsinghua-paper-does-rl-really-incentivize-reasoning-capacity#Mkuqt7x7YojpJuCGt (OG post)
https://www.lesswrong.com/posts/s3NaETDujoxj4GbEm/tsinghua-paper-does-rl-really-incentivize-reasoning-capacity#Mkuqt7x7YojpJuCGt (Jozdien’s response)
Nice, so if we return to a 7-month doubling time in the not-too-distant future, that's compatible with reasoning models being the cause, but not with AI accelerating development. Cool, looking forward to seeing how this unfolds, and I set up a market.
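For a rough sense of how much these two doubling times diverge, here's a quick sketch; the 1-hour starting horizon is an illustrative assumption, not a figure from the data:

```python
# Compare task-horizon growth under a 4-month vs a 7-month doubling time.
# Starting horizon of 1 hour is an illustrative assumption.
def horizon_after(months: float, doubling_months: float, start_hours: float = 1.0) -> float:
    """Task horizon (in hours) after `months`, given a fixed doubling time."""
    return start_hours * 2 ** (months / doubling_months)

for months in (12, 24):
    fast = horizon_after(months, 4)   # 4-month doubling (reasoning-model era)
    slow = horizon_after(months, 7)   # 7-month doubling (longer-run trend)
    print(f"after {months} months: 4-mo doubling -> {fast:.1f} h, "
          f"7-mo doubling -> {slow:.1f} h")
```

So over two years the two trends differ by roughly a factor of six, which is why a reversion to the 7-month trend should be distinguishable fairly quickly.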