What do you mean by C. Mythos outperforming just 12-year-olds? It seems to me that any plausible task given to a schoolboy (or even a 1st-year student?) is either ARC-AGI-3-like, no more complex than ARC-AGI-2 (which is ~70% solved by Opus 4.6), or such that LLMs do it far better than humans (e.g. basic group theory, olympiad-style math). Humans are likely better at complex ideation and at keeping long contexts in mind. The AI-2027 forecasters also imply that the median date for the last human coders (who have long contexts, but not that complex ideas) to be outperformed is June 2028, and ASI is thought to emerge in May 2029 (but I don’t understand whether that assumes the Race Ending or the Slowdown Ending). The original AI-2027 forecast outright assumed that Agent-4, the superhuman AI researcher, would emerge ~6 months after Agent-2 became superhuman at coding.
My view on this, based on my observations, reasoning and the problems we’re looking to solve within our domains:
Human jobs are a switching regime between thinking work and conforming work (following rules, both codified and implicitly understood).
Rule following is something humans do with near 100% concordance (once they choose to cooperate), whereas AI struggles with even the most basic rulesets. I believe this holds true at the 12 y.o. level as well (based on my observations of 12-year-olds playing board games, something where I think they would beat AI today). It’s the social rules that 12-year-olds struggle with, mainly due to insufficiently formed heuristics.
My experience with Opus is still that it fails at this, both because its regime-switching classifier is poor (I don’t expect you to think about the merit of these rules, just to follow them) and because it fails at even basic formal logic (if measured against the consistency of a human).
It is able to code logic as a result of its translation ability, but whenever I need it to infer logic chains, even in programming, I get rapid degradation.
All of this tells me that we’re not at 12 y.o. replacement yet.
Math olympiad and benchmark contexts are all purposely less messy than the real world.
Part of me also suspects that it may be less than 12. The AI struggled a lot at Pokemon, and although it manages to win now, iirc it still makes mistakes that humans (even 12-year-olds) would never make.
I do think, though, that some of the worst examples of AI failing at a task humans do easily are caused by reasons other than intelligence. E.g. AI performance in ARC-AGI-3 greatly improved with scaffolding. The scaffolding team explains that the AI did poorly in part due to difficulty recognizing shapes. Once the AI understood the problem, it could do well and write algorithms to find the optimal solution.
As I left my comment there I got to thinking that rule following is a function of accountability and social pressure, and therefore it can be solved with scaffolding (and indeed that’s how we’re solving it in our domain). But epistemic grounding and internalised formal logic, as skills, seem a long way off.
My best fitting metaphor (albeit anthropomorphic, which I’m not a fan of) for AI at the moment isn’t that it’s humanly intelligent. It’s that it can perform some economically viable tasks (coding) at savant levels, but operating from System 1 and “gut” in human terms, rather than from System 2 (yes, even in reasoning mode, as the reasoning isn’t grounded in logic, but still in chaining instinct).
I also feel that AI seems to be using intuition instead of logic. Often the answer it gives matches my surface-level intuition, the answer someone would give at first thought, but it doesn’t seem to think things through with a world model and everything.
Even when the AI does arithmetic, it feels like it’s answering using intuition. Imagine you stare at two numbers and just know what they multiply to. It’s quite an alien way of thinking. The answer would be approximately right but the last few digits might be wrong. (Or at least this is how things were before they fixed it by training the AI to use tools by default.)
I completely agree that AI is far better than humans at some tasks and far worse at others, so when you pick an age of humans to be comparable to AI, the comparison will be full of tasks where one side beats the other by a large margin.
However, that doesn’t imply that “outperforming” can’t be defined. It’s the thought experiment of randomly picking a real-world job (maybe from 2020, before ChatGPT existed). We have 12-year-olds try to do it. If they all get fired in the first week, it means the job is too hard for 12-year-olds to do. If they don’t get fired, it means 12-year-olds can do the job.
We then imagine asking the AI model to attempt all the jobs 12-year-olds can do. If it outperforms the 12-year-olds on most of these jobs, it means the AI’s Job Replacement Age is higher than 12. If it underperforms the 12-year-olds on most of these jobs, it’s lower, because 12-year-olds have more “real world employability” than the AI.
I guess you’re right that AI coding ability complicates things. Maybe we should ignore jobs which the AI does better only because the 12-year-old can’t do the job at all. You’re right that we shouldn’t be comparing their abilities in disjoint sets of jobs!
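To make the thought experiment a bit more concrete, here is a rough Python sketch of the comparison. Everything in it is an assumption for illustration: the `JobTrial` fields, the idea of scoring performance as a single number, and the example data are all made up, and "most" is read as a strict majority.

```python
from dataclasses import dataclass


@dataclass
class JobTrial:
    """One job from the thought experiment. All fields are hypothetical inputs."""
    name: str
    kids_kept_job: bool   # True if the 12-year-olds were not all fired in week one
    kid_score: float      # assumed performance score for the 12-year-olds
    ai_score: float       # assumed performance score for the AI on the same job


def ai_replacement_age_above(trials: list[JobTrial]) -> bool:
    """Return True if the AI outperforms the age group on most jobs the group can do."""
    # Only compare on jobs the 12-year-olds can actually hold; jobs they cannot
    # do at all (e.g. coding) are ignored so the two job sets are not disjoint.
    doable = [t for t in trials if t.kids_kept_job]
    if not doable:
        raise ValueError("no jobs the age group can do; comparison is undefined")
    ai_wins = sum(t.ai_score > t.kid_score for t in doable)
    return ai_wins > len(doable) / 2


# Made-up illustrative data, not measurements.
trials = [
    JobTrial("newspaper delivery", True, kid_score=0.8, ai_score=0.1),
    JobTrial("data entry", True, kid_score=0.5, ai_score=0.9),
    JobTrial("shelf stocking", True, kid_score=0.7, ai_score=0.2),
    JobTrial("software engineering", False, kid_score=0.0, ai_score=0.9),
]
print(ai_replacement_age_above(trials))  # prints False
```

With this toy data the AI wins only one of the three jobs the 12-year-olds can hold, so its Job Replacement Age would come out below 12, even though it dominates the one job that gets filtered out.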