Come to think of it, this would make for an interesting contest.
I’ve suggested that if someone were serious about benchmarking creative writing at novel length, they should hold a contest where each entrant must provide a fully automated solution that the organizers run, that is mandated to use at least $1000 of tokens, and whose output is then released into the real world for marketplace evaluation.