I am wondering whether your experiences were formed by the first generation of reasoning models, and my guess is that you are also thinking of asking different kinds of questions.
First, the thing that LLMs are really great at is speaking and thinking in the ontology and structure that is prevalent among experts in any field. This is usually where the vast majority of evidence comes from. LLMs aren’t going to make up whole ontologies about how bankruptcy law works, or how datacenter security works. They might totally make up details, but they won’t make up the high-level picture.
Second, this has just gotten a lot better over the last 6 months. GPT-5 still lies a good amount, but vastly less than o1 or o3. I found o1 almost unusable on this dimension.