Daniel Tan comments on Daniel Tan’s Shortform

Daniel Tan 19 Jan 2025 2:56 UTC
RL fine-tuning suffers from a “zero-to-one” problem: the model is likely to be mostly legible and faithful by default, and difficult encoding schemes seem hard to reach through random exploration.
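To put rough numbers on the exploration point, here is a toy back-of-the-envelope sketch. It is not a model of any real training run. It assumes (my assumptions, not claims from the comment) that an encoding scheme only earns reward if the model uses it consistently on all `length` tokens of a rollout, and that the default policy picks the "encoded" token independently with probability `p_token` at each step. The function names are hypothetical.

```python
def prob_full_scheme(p_token: float, length: int) -> float:
    """Chance that a single rollout happens to use the scheme on every token.

    Assumes each of the `length` tokens is independently 'encoded' with
    probability `p_token` (a simplifying assumption for illustration).
    """
    return p_token ** length


def expected_rollouts_until_first_hit(p_token: float, length: int) -> float:
    """Expected number of rollouts before the first fully encoded sample.

    This is the mean of a geometric distribution with success probability
    prob_full_scheme(p_token, length).
    """
    return 1.0 / prob_full_scheme(p_token, length)


if __name__ == "__main__":
    # Sweep scheme length for a fixed (made-up) per-token probability.
    for length in (1, 5, 10, 20):
        n = expected_rollouts_until_first_hit(0.01, length)
        print(f"length={length:>2}  expected rollouts ~ {n:.3g}")
```

Under these assumptions the expected wait grows exponentially in the scheme length, so the optimizer would almost never see a rewarded sample to reinforce. If partial use of a scheme were already rewarded, the picture could look quite different.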