I mean, GPT-5 getting 43% of PhD problems right isn’t particularly bad. I don’t know about generating new insights, but that doesn’t seem unachievable either (especially since prompting/tooling/agent scaffolding might compensate for some of the problems).