As far as I can tell, fiction-writing ability and humor are lacking in the best models. But it’s the type of thing that we should not be surprised to see fall this year.
I would be surprised. In my experience, AI creative writing abilities have stagnated in the past year. And since creative writing is less amenable to RLVR than activities like math and programming, AI labs don’t have an easy path toward large advances in it. Labs would have to rely on pretraining dataset collection and curation, where the low-hanging fruit has already been picked.
Registering a prediction: by the beginning of 2030, no novel in which more than 50% of the text is AI-generated will reach #1 on the New York Times bestseller list (94%).
Strong upvote. The key here is the “VR” in RLVR: there are no automatically verifiable rewards for good or convincing writing, only RLHF, the cost of which scales proportionally with the length of the writing evaluated (and if you hire non-Americans as RLHF trainers for cost reasons, the result is unlikely to fit the stylistic preferences of Americans). Labs can use engagement as a metric, but that will lead to the “baiting” already very common on social media and will not convince anyone.