Yeah, this is a fantastically interesting idea.
Pain points (besides what lc noted) that may already be mitigated but I don’t understand yet:
I don’t know, long term, how well this handles things like malicious users tampering with the prompt or response.
If the LLM is dumb, how do we correct it?
When I use LLMs for fact-checking, they seem really credulous. I keep a bucket of example interactions for regression-testing my prompts, and in a few of them weird early search results hijack the framing (for example, a SparkNotes page examining The Matrix as more Gnostic than Christian took Claude totally off the rails).
Not a nit but a wishlist item: I wonder if the LLMs can be persuaded to do more general contextualizing at the same time, like adding the referenced study to a news article or offering brief background on an event it mentions. These would be very scope-creep-y, but I’m trying to compare against how I would fact-check an article myself to find new potential methods.
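To make the regression-bucket idea above concrete, here is a rough sketch of what such a harness could look like. Everything in it is an assumption for illustration: `Case`, `run_bucket`, and `credulous_checker` are hypothetical names, the two cases are made up, and the checker is a toy stand-in for a real model call that deliberately trusts the first search snippet, mimicking the "early result hijacks the framing" failure.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Case:
    """One saved interaction: a claim, the snippets shown, and the verdict we expect."""
    claim: str
    snippets: List[str]
    expected: str  # "supported" or "unsupported"


def run_bucket(fact_check: Callable[[str, List[str]], str],
               cases: List[Case]) -> List[Case]:
    """Run every case through the checker and return the ones that regressed."""
    failures = []
    for case in cases:
        verdict = fact_check(case.claim, case.snippets).strip().lower()
        if verdict != case.expected:
            failures.append(case)
    return failures


def credulous_checker(claim: str, snippets: List[str]) -> str:
    """Toy stand-in for a model call (hypothetical): believes whatever the first snippet says."""
    if snippets and claim.lower() in snippets[0].lower():
        return "supported"
    return "unsupported"


# Made-up cases for illustration only; a real bucket would hold saved interactions.
CASES = [
    Case("the sky is green",
         ["Some blog says the sky is green", "Physics: the sky is blue"],
         "unsupported"),
    Case("water is wet",
         ["Water is wet, per every source"],
         "supported"),
]

if __name__ == "__main__":
    for failed in run_bucket(credulous_checker, CASES):
        print("REGRESSION:", failed.claim)
```

Swapping `credulous_checker` for a function that calls the actual prompt would let the same bucket flag cases where a misleading first result still flips the verdict.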
lc, this is a really interesting project. Thanks for sharing.