ozziegooen comments on Announcing RoastMyPost: LLMs Eval Blog Posts and More

ozziegooen 18 Dec 2025 15:49 UTC
2 points
I experimented with Opus 4.5 a bit for the Fallacy Check. Results did seem a bit better, but costs were much higher.

I think the main way I could picture spending more money is to add an agentic setup that does a deep review of a given paper and presents a summary. I could see the marginal cost of this being maybe $10 to $50 per 5k words or so, using a top model like Opus. That said, the fixed costs of doing a decent job seem frustrating, especially because we’re still lacking easy API access to existing agents (my preferred method would be a high-level Claude Code API, but that doesn’t really exist yet).

I’ve been thinking of having competitions here, where people make their own reviews and we then compare them with those from a few researchers and LLMs. I think this area leaves room for a lot of cleverness and innovation.