Now that RL training is nearing its effective limit, we may have lost the ability to turn additional compute into additional intelligence.
The problem is that we haven't explored many new architectures, and we don't yet know what the key ingredients of a capable architecture are. However, we are likely on track to discover that CoT-based AIs don't scale to superhuman coders, which could be necessary for AI alignment.