I initially tried to use Gemini 2.5 Pro to write the whole explanation, but it kept making one mistake after another in its economics reasoning. Each rewrite would contain a new mistake after I pointed out the last one, or it would introduce a new mistake when I asked for some other kind of change. After pointing out 8 mistakes like this, I finally gave up and wrote it myself. I also tried Grok 3 and Claude 3.7 Sonnet, but gave up on them more quickly when their initial responses didn’t look promising. AI still helped a bit, though, by reminding me of the right concepts and vocabulary.
Thought it would be worth noting this, as it seems a bit surprising. (Supposedly “phd-level” AI failing badly on an Econ 101 problem.) Here is the full transcript in case anyone is curious. Digging into this a bit myself, it appears that the “phd-level” claim is based on performance on GPQA, which includes Physics, Chemistry, and Biology, but not Economics.