Thane Ruthenis comments on Thane Ruthenis’s Shortform

Thane Ruthenis 11 Feb 2025 16:10 UTC
5 points
I do think RL works “as intended” to some extent, teaching models some actual reasoning skills, much like SSL works “as intended” to some extent, chiseling in some generalizable knowledge. The question is to what extent it’s one or the other.