In a very goal oriented, agency-required domain (hacking) GPT-5 seems notably better than other frontier models: https://xbow.com/blog/gpt-5
This actually updates me away from my previous position of “LLMs are insufficient and brain-like architectures might cause discontinuous capabilities upgrades” towards “maybe LLMs can be made sufficient with enough research manpower”. I still mostly believe in the first position, but now the second position seems at least possible.
In a very goal oriented, agency-required domain (hacking) GPT-5 seems notably better than other frontier models: https://xbow.com/blog/gpt-5
This actually updates me away from my previous position of “LLMs are insufficient and brain-like architectures might cause discontinuous capabilities upgrades” towards “maybe LLMs can be made sufficient with enough research manpower”. I still mostly believe in the first position, but now the second position seems at least possible.