This is an interesting direction, a subset of AI for epistemics.
I note you didn’t include Opus, so I’m betting your GPT 5.3 run didn’t have thinking enabled either, since both require subscriptions. Those models are far smarter than any you tested.
Thanks, yeah, fair point. I just tried them and would score them as follows:
Opus 4.6: 1
Opus 4.7: 2
GPT5.3 thinking mode: 3
So not a massive shift in the overall shape of the conclusions (I still got the best result, given what I was after, from Sonnet + Aiden), but GPT5.3 seemed genuinely insightful in a way I will need to explore further. The main reason they didn’t score higher was that they generally stuck with the goals as framed and suggested how to achieve them, rather than suggesting alternative goals.
Curious to get your experience with GPT5.5 too.