Fwiw, in my experience LLMs lie far more than early Wikipedia or any human I know, and in subtler and harder to detect ways. My spot checks for accuracy have been so dismal/alarming that at this point I basically only use them as search engines to find things humans have said.
I am wondering whether your experience was formed via the first generation of reasoning models, and my guess is you are also asking different kinds of questions.
The thing LLMs are really great at is speaking and thinking in the ontology and structure that is prevalent among experts in any field, which is usually where the vast majority of evidence comes from. LLMs aren’t going to make up whole ontologies about how bankruptcy law works, or how datacenter security works. They might totally make up details, but they won’t make up the high-level picture.
Second, this has just gotten a lot better over the last 6 months. GPT-5 still lies a good amount, but vastly less than o1 or o3. I found o1 almost unusable on this dimension.
Datapoint: I’m currently setting up a recording studio at Lighthaven, and I am using them all the time to get guides for things like “how to change a setting on this camera” or “how to use this microphone” or “how to use this recording software”.
Yes, they confabulate menus and settings a lot, but as long as I keep uploading photos of what I actually see, they know the basics much better than I do (e.g. what bit rate to set for video vs. audio, where to look to kill the random white-noise input I’m getting, etc.).
I’d say they confabulate maybe 50% of the time, but they’re still a much more effective search engine for me than Google, and they can read the manual much faster than I can. My guess is I simply couldn’t do some of the projects I’m doing without them.