Claude 4.5 is already superhuman in some areas, including:
Breadth of knowledge.
Understanding complex context “at a glance.”
Speed, at least for many things.
But there are other essential abilities where leading LLMs are dumber than a diligent 7-year-old. Gemini is one of the stronger visual models, and in my own informal benchmarks it routinely fails simple visual tasks that any child could solve.
And then there’s software development. I use Claude for software development, and it’s quite skilled at many simple tasks. But I also spend a lot of time cleaning up the ill-conceived shit code it writes. You can’t just hand an irresponsible junior programmer a copy of Claude Code and let them commit straight to main with no review. If you do, you will learn the meaning of the word “slop.” In the hands of a skilled professional who takes 100% responsibility for the output, Claude Code is useful. In the hands of an utter newb who can’t do anything on their own, it’s also great. But it can’t do anything serious without massive handholding and close expert supervision.
So my take is that frontier models are terrifyingly impressive if you’re paying attention, but they are still critically broken in ways that make even the simplest real-world use cases a constant struggle. (To be clear, I think this is good: I’m a hardcore doomer.)
And the AI safety space is not in denial about this. It’s extremely common for AI safety people to worry about irrevocable loss of human control, or even complete human extinction, in the 2030s. You don’t even have to look that far around here to find someone who’d estimate a 10% chance of humanity losing control in the next 5 years.