I argue that this tendency is difficult to learn in short-form: in long-form it’s hard to realize that the payoff is never coming, but in short-form the payoff has to come now or never.
I bet you that current frontier models, when challenged to write prose in the 200-word range, will make all the mistakes I describe in my post or you describe in your comment.
You point out the mistake of hints and promises that you can’t deliver on. I claim that current models will absolutely do this even in 200-word works. Once we can RL it out of the models in this range, we can keep going longer.
I’ll admit that 200 is kind of absurdly short in a way that creates substantive, qualitative differences from more common types of writing (e.g. maybe it really penalizes certain kinds of action or dialogue). I could be convinced that ~500 is right.
I agree that they will make these mistakes at that scope; my claim is that the solution won’t scale. If you RL models not to do this in 200 words, I don’t think that will make it substantially easier for them to avoid it at 5k words, except insofar as it trains them to never hint at things at all. I haven’t found frontier models to be significantly more tasteful or better at writing prose than less capable models, despite being generally smarter and better at some seemingly related parts of creative writing, so my intuition is that current scaling levers are unlikely to address this problem well.
The specific dynamics of RL here are better discovered empirically, and in any case are not precisely within scope.
I was thinking of a more general optimization loop: what evals should we make, how can we track model progress on writing, and so on. My suggestion is that once we figure out how to make models write well in this playground (where evaluation is easier and generation is cheaper), whether by training or by pushing on things like harness design, we’ll be in a good position to improve LLM writing abilities more generally.
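For concreteness, here is a minimal sketch of what the scoring side of such a loop could look like for 200-word pieces. Everything in it is hypothetical: the `HINT_PATTERNS` list and the scoring scheme are crude regex placeholders standing in for a real human or LLM judge, not a proposed eval.

```python
import re

# Hypothetical phrases that promise a payoff later in the piece.
# A real harness would use a human or LLM judge, not regexes.
HINT_PATTERNS = [
    r"\blittle did \w+ know\b",
    r"\bwould soon\b",
    r"\bwas about to\b",
]

def count_hints(text: str) -> int:
    """Count setup phrases that hint at a payoff to come."""
    return sum(len(re.findall(p, text, re.IGNORECASE)) for p in HINT_PATTERNS)

def score_piece(text: str, word_limit: int = 200) -> dict:
    """Score one generation: length compliance plus hint density."""
    words = text.split()
    return {
        "within_limit": len(words) <= word_limit,
        "hints": count_hints(text),
    }

sample = ("She smiled, not knowing what was coming. "
          "Little did she know the storm would soon arrive.")
print(score_piece(sample))  # → {'within_limit': True, 'hints': 2}
```

The point of the short-form playground is exactly that a scorer like this can read the whole piece at once: any hint it flags either pays off within the same 200 words or it doesn't, so the reward signal is cheap and unambiguous.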