ryan_greenblatt comments on Fabien’s Shortform

ryan_greenblatt 23 Mar 2025 1:15 UTC

> First, I think RL is more like 10,000x than 100x less efficient than SL (DeepSeek V3 probably can't be compressed much below 10GB, while the DeepSeek-R1-Zero stage can probably be compressed to 1MB of transcripts, despite both being roughly 1e24 FLOP).
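Here is a quick check of the arithmetic behind the 10,000x figure. If both stages cost roughly the same compute, the efficiency gap reduces to the ratio of compressed sizes. The byte counts below are only the rough estimates quoted above, not measurements. Treating compressed size as "information learned" is the assumption the argument rests on, and `bits_per_flop` is a hypothetical helper written for this illustration.

```python
# Rough, order-of-magnitude estimates from the quoted claim (assumptions, not measurements).
FLOP = 1e24        # approximate training compute for each stage
SL_BYTES = 10e9    # ~10 GB: claimed floor on compressing DeepSeek V3 (SL/pretraining)
RL_BYTES = 1e6     # ~1 MB: claimed compressed size of the R1-Zero RL transcripts


def bits_per_flop(num_bytes: float, flop: float) -> float:
    """Hypothetical helper: information absorbed per unit of compute, in bits/FLOP."""
    return num_bytes * 8 / flop


sl = bits_per_flop(SL_BYTES, FLOP)
rl = bits_per_flop(RL_BYTES, FLOP)
ratio = sl / rl  # equal compute, so this is just SL_BYTES / RL_BYTES

print(f"SL: {sl:.1e} bits/FLOP, RL: {rl:.1e} bits/FLOP, ratio: ~{ratio:,.0f}x")
```

Because the compute is the same for both stages, the FLOP term cancels, and the entire 10^4 gap comes from the 10GB vs 1MB compression estimates.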

Maybe this is true for SL on trajectories from the post-RL policy, but from my perspective it isn't clear that this is the right way to think about it.