This should probably only be attempted with a clear and prominent warning that it's an LLM-authored comment. Because LLMs are good at matching style without matching the content, it could end up exploiting heuristics that users have calibrated only for human levels of honesty / reliability / non-bullshitting.
Also check this comment about how conditioning on the karma score can give you hallucinated strong evidence:
https://www.lesswrong.com/posts/PQaZiATafCh7n5Luf/gwern-s-shortform?commentId=smBq9zcrWaAavL9G7
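To make the failure mode concrete, here's a minimal sketch (not taken from the linked comment; the prompt template and function name are hypothetical) of what conditioning a generated comment on a karma score looks like. The score acts as a quality label the model then tries to justify, which is exactly where invented citations and "strong evidence" come from:

```python
def build_comment_prompt(post_excerpt: str, karma: int) -> str:
    """Build a generation prompt that conditions the reply on a target karma score.

    Hypothetical illustration: the karma value tells the model what kind of
    comment to imitate, so a high score pushes it toward confident,
    evidence-heavy prose whether or not that evidence exists.
    """
    return (
        f"The following is a LessWrong comment that received {karma} karma.\n\n"
        f"Post excerpt:\n{post_excerpt}\n\n"
        "Comment:\n"
    )

# Conditioning on karma=250 vs karma=3 changes what the model treats as a
# plausible completion, not what it actually knows.
print(build_comment_prompt("Excerpt of the post being replied to...", karma=250))
```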
Okay, that pretty much ruins the idea.
Makes me think: what about humans who would do the same thing? But the difference is probably that humans build their credibility over time, and if someone new posted an unlikely comment, they would get called out on it.
It’s really hard for humans to match the style / presentation / language without putting a lot of work into understanding the target of the comment. LLMs are (right now) inherently worse at doing the understanding, coming up with things worth saying, and being calibrated in their criticism, AND they are a lot better at just imitating the style.
This just invalidates some side signals humans habitually use on one another.
Presumably, this happens: https://slatestarcodex.com/2016/12/12/might-people-on-the-internet-sometimes-lie/
I do often notice that the top-upvoted Reddit comment in big subreddits is confidently wrong, with a more correct/nuanced take sitting much lower.