I treat LLM output as somewhat less trustworthy than what a colleague of mine says, but not as fundamentally different.
If you’re asking a human about some even mildly specialized topic, like the history of Spain in the 17th century, different crop rotation methods, or ordinary differential equations, and there’s no special reason that they really want to appear like they know what they’re talking about, they’ll generally just say “IDK”. LLMs are much less like that IME. I think this is actually a big difference in practice, at least in the domains I’ve tried (reproductive biology). LLMs routinely give misleading / false / out-of-date / vague-but-deceptively-satiating summaries.
I agree the LLMs are somewhat worse, especially compared to rationalist-adjacent experts in specialized fields, but they really aren’t that bad for most things. Like I researched the state of the art of datacenter security practices yesterday, and I am not like 99% confident that the AI got everything right, but I am pretty sure it helped me understand the rough shape of things a lot better.
This seems fine and good—for laying some foundations, which you can use for your own further theorizing, which will make you ready to learn from more reliable + rich expert sources over time. Then you can report that stuff. If instead you’re directly reporting your immediately-post-LLM models, I currently don’t think I want to read that stuff, or would want a warning. (I’m not necessarily pushing for some big policy, that seems hard. I would push for personal standards though.)