My view on this, based on my observations, reasoning and the problems we’re looking to solve within our domains:
Human jobs are a switching regime between thinking work and conforming work (following rules, both codified and implicitly understood).
Rule following is something humans do with near 100% concordance (once they choose to cooperate), whereas AI struggles with even the most basic rulesets. I believe this holds true at the 12 y.o. level as well (based on my observations of 12-year-olds playing board games, something where I think they would beat AI today). It’s the social rules that 12-year-olds struggle with, mainly due to insufficiently formed heuristics.
My experience with Opus is still that it fails at this, both because its regime-switching classifier is poor (I don’t expect you to think about the merit of these rules, just to follow them) and because it fails at even basic formal logic (if measured against the consistency of a human).
It is able to code logic as a result of its translation ability, but whenever I need it to infer logic chains, even in programming, I get rapid degradation.
All of this tells me that we’re not at 12 y.o. replacement yet.
Math olympiad and benchmark contexts are all purposely less messy than the real world.
Part of me also suspects that it may be less than 12. The AI struggled a lot at Pokemon, and although they manage to win now, iirc they still make mistakes that humans (even 12-year-olds) would never make.
I do think though, that some of the worst examples of AI failing at a task humans easily do are caused by reasons other than intelligence. E.g. AI performance in ARC-AGI-3 greatly improved with scaffolding. The scaffolding team explains that the AI did poorly in part due to difficulty recognizing shapes. Once the AI understood the problem, it could do well and write algorithms to find the optimal solution.
As I left my comment there I got to thinking that rule following is a function of accountability and social pressure, and therefore it can be solved with scaffolding (and indeed that’s how we’re solving it in our domain). But epistemic grounding and formal logic internalisation, as skills, seem a long way off.
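To make "solved with scaffolding" a bit more concrete, here is a minimal sketch of the general idea, not what we actually run in our domain: an outer check only accepts output that passes every rule, so compliance no longer depends on the model choosing to follow them. The names `first_compliant`, `no_longer_than_20` and `ends_with_period`, and the sample proposals, are all made up for illustration.

```python
from typing import Callable, Iterable, List, Optional

# A rule returns True when a candidate output obeys it.
Rule = Callable[[str], bool]


def first_compliant(candidates: Iterable[str], rules: List[Rule]) -> Optional[str]:
    """Return the first candidate that satisfies every rule, or None if none do."""
    for candidate in candidates:
        if all(rule(candidate) for rule in rules):
            return candidate
    return None


# Hypothetical rules, standing in for whatever a real domain would enforce.
def no_longer_than_20(text: str) -> bool:
    return len(text) <= 20


def ends_with_period(text: str) -> bool:
    return text.endswith(".")


# Stand-ins for successive model attempts at the same task.
proposals = [
    "This answer rambles on well past the limit.",
    "Too short",
    "Fits the rules.",
]
chosen = first_compliant(proposals, [no_longer_than_20, ends_with_period])
print(chosen)
```

The scaffold here enforces the rules from outside; it does nothing for the grounding or logic problem, which is the part I think is still far off.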
My best-fitting metaphor (albeit anthropomorphic, which I’m not a fan of) for AI at the moment isn’t that it’s humanly intelligent; it’s that it can perform some economically viable tasks (coding) at savant levels, but operating from system 1 and “gut” in human terms rather than from system 2 (yes, even in reasoning mode, as the reasoning isn’t grounded in logic but still in chaining instinct).
I also feel that AI seems to be using intuition instead of logic. Often the answer it gives matches my surface-level intuition, the answer someone would give at first thought, but it doesn’t seem to think things through with a world model and everything.
Even when the AI does arithmetic, it feels like it’s answering using intuition. Imagine you stare at two numbers and just know what they multiply to. It’s quite an alien way of thinking. The answer would be approximately right but the last few digits might be wrong. (Or at least this is how things were before they fixed it by training the AI to use tools by default.)
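A toy way to picture "approximately right but the last few digits wrong" (purely an analogy, and not a claim about what actually happens inside a model): multiply using only the leading digits of each factor and compare against the exact product. `round_sig`, `gut_product` and the two sample numbers are made up for this illustration.

```python
def round_sig(n: int, sig: int) -> int:
    """Round a positive integer to `sig` significant digits."""
    digits = len(str(n))
    if digits <= sig:
        return n
    scale = 10 ** (digits - sig)
    return (n + scale // 2) // scale * scale


def gut_product(a: int, b: int, sig: int = 3) -> int:
    """Toy 'intuitive' multiply: only the leading digits of each factor are used."""
    return round_sig(a, sig) * round_sig(b, sig)


a, b = 48217, 6391
exact = a * b              # what a calculator (or a tool call) returns
guess = gut_product(a, b)  # right order of magnitude, wrong trailing digits
relative_error = abs(guess - exact) / exact
print(exact, guess, f"{relative_error:.4%}")
```

The guess lands in the right ballpark while the exact digits are lost, which is roughly the failure pattern I mean.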