ozziegooen comments on Announcing RoastMyPost: LLMs Eval Blog Posts and More

ozziegooen 17 Dec 2025 20:05 UTC
5 points
0
Thanks for trying it out and reporting your findings!

It’s tricky to tune the system to flag the important errors without flagging too many. So far I’ve been focusing on the former, assuming that it’s better to show too many errors than too few.

The Fact Check definitely does have mistakes (often due to the chunking, as you flagged).

The Fallacy Check is very overzealous. I scaled it back, but will continue to adjust it. Overall, I think fallacy checking is quite tricky to do well, and I’ve been thinking about some much more serious approaches. If people here have ideas or implementations, I’d be very curious to hear them!
- Brendan Long 17 Dec 2025 21:05 UTC
  3 points
  0
  Parent
  One option would be to use chunks to identify potential fallacies, and then re-run globally asking if the potential fallacy is actually a fallacy in the context of the whole post. I’m not sure if this would be too expensive though?
  - ozziegooen 17 Dec 2025 21:18 UTC
    5 points
    0
    Parent
    Agreed!
    
    The workflow we have does use a step for this. This specific workflow:
    1. Chunks the document.
    2. Runs analysis on each chunk, producing a long combined list of comments.
    3. Feeds all of the comments into a final step that sees the full post. This step removes a bunch of comments and writes a summary.
    
    I think it could use a lot more work and tuning. Generally, I’ve found these workflows fairly tricky and time-intensive to work on so far. I assume they will get easier in the next year or so.
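
The chunk, analyze, then filter-with-full-context workflow described in the last comment might look roughly like the sketch below. This is not RoastMyPost's actual code: `Comment`, `chunk_document`, `run_workflow`, and the two callables `analyze_chunk` and `keep_in_context` are hypothetical names, the callables stand in for whatever LLM calls a real system would make, and the summary-writing part of the final step is left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Comment:
    chunk_index: int  # which chunk produced the comment
    text: str         # e.g. a flagged potential fallacy


def chunk_document(text: str, max_chars: int = 500) -> List[str]:
    """Pack blank-line-separated paragraphs into chunks of about max_chars.

    A single paragraph longer than max_chars becomes its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def run_workflow(
    text: str,
    analyze_chunk: Callable[[str], List[str]],        # hypothetical per-chunk check
    keep_in_context: Callable[[str, Comment], bool],  # hypothetical whole-post check
    max_chars: int = 500,
) -> List[Comment]:
    # Step 1: chunk the document.
    chunks = chunk_document(text, max_chars)
    # Step 2: analyze each chunk in isolation, collecting every candidate comment.
    candidates = [
        Comment(i, note)
        for i, chunk in enumerate(chunks)
        for note in analyze_chunk(chunk)
    ]
    # Step 3: re-check each candidate against the full post and drop the ones
    # that no longer hold up once the surrounding context is visible.
    return [c for c in candidates if keep_in_context(text, c)]
```

On cost: in this shape the final step could be a single call that sees the full post plus all candidate comments, rather than one full-post call per comment, so the extra expense depends mostly on how that last step is batched.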