I enjoyed the post. The framework challenged some of my core assumptions about AI progress, particularly given the rapid acceleration we’ve seen in the past few months with OpenAI’s o3 and Deep Research, and Anthropic’s Claude Code tool. My mental model has been that rapid progress would continue, shortening AGI timelines, but your post makes me reconsider how much of that is genuine frontier expansion versus polish and UX improvements.
A few points where your arguments challenge my mental model and warrant further discussion:
Improvements in conversational coherence and reliability as meaningful progress: You suggest that much of what makes new models “feel” smarter is actually improvements in personality rather than capability. I think this understates the importance of these refinements for alignment. If models are becoming more coherent, responsive, and easy to work with, that is a form of intelligence improvement, even if it doesn’t show up as a major shift in raw problem-solving ability. Enhanced coherence might mean better inference-time reasoning structures, clearer chain-of-thought outputs, or just more predictable behavior, all of which are valuable for practical deployment and alignment research.
Evaluating the value of GPT’s Deep Research: I’ve used GPT’s Deep Research regularly, and while I agree that its outputs should not be taken at face value and the lack of access to real-time information is frustrating, I disagree with calling it “AI slop.” I’ve found it to be notably well-structured, with strong synthesis and citations compared to standard LLM-generated content. While it doesn’t replace expert judgment, it does enhance research workflows, particularly for synthesizing existing knowledge. I’d be curious if your concerns stem from specific factual inaccuracies, or more from a general skepticism about LLM-generated analysis being useful at all.
Calibration of AI timelines: Your argument that LLMs are approaching diminishing returns leads you to predict AGI is farther off, perhaps into the 2030s, contingent on a major paradigm shift. I’m not so sure. Even if pure LLM scaling is running into walls, integration with other systems (symbolic reasoning, memory architectures, improved CoT training, multimodal interactions, agent scaffolding) could constitute the “next step” without requiring a completely new paradigm. The fact that OpenAI and Anthropic remain bullish on 2027-2028 AGI timelines suggests they may be seeing something internally that isn’t publicly obvious yet. My own guess is that AGI emerges before 2030, with perhaps a 20% chance that LLM-based methods alone will be sufficient, which is why I still find shorter timelines plausible. If OpenAI and Anthropic are wrong, what do you think they’re miscalculating?
Again, I appreciate your arguments; they’ve given me a useful counterweight to my prior assumptions about fast progress. I’d love to hear where you see the strongest cruxes between your perspective and more accelerationist takes.
Improved personality is indeed a real, important improvement in the models, but (compared to traditional pre-training scaling) it feels like more of a one-off “unhobbling” than something we should expect to continue driving improved performance in the future. Going from pure next-token-predictors to chatbots with RLHF was a huge boost in usefulness. Then, going from OpenAI’s chatbot personality to Claude’s chatbot personality was a noticeable (but much smaller) boost. But where do we go from here? I can’t really imagine a way for Anthropic to improve Claude’s personality by 10x or 100x (whatever that would even mean). Versus I can imagine scaling RL to improve a reasoning model’s math skills by 100x.
Interesting point about personality improvements being a “one-off unhobbling” with diminishing returns. But I wonder if this reflects a measurement bias rather than an actual capability ceiling: we have clear benchmarks for evaluating math skills—it’s easy to measure 100x improvement when a model goes from solving basic algebra to proving novel theorems. But how do we quantify personality improvements? There’s a vast gap between “helpful but generic” and “perfectly attuned to individual users’ needs, communication styles, and thinking patterns.”
I can imagine future models that feel like they truly understand me personally, anticipate my unstated needs, communicate in exactly my preferred style, and adapt their approach based on my emotional state—something far beyond current implementations. The lack of obvious metrics for these qualities doesn’t mean the improvement ceiling is low, just that we’re not good at measuring them yet. Thoughts?
That’s a good point—the kind of idealized personal life coach / advisor Dario describes in his post “Machines of Loving Grace” is definitely in a sense a personality upgrade over Claude 3.7. But I feel like when you think about it more closely, most of the improvements from Claude to ideal-AI-life-coach are coming from non-personality improvements, like:
having a TON of context about my personal life, interests, all my ongoing projects and relationships, etc
having more intelligence (including reasoning ability, but also fuzzier skills like social / psychological modeling) to bring to bear on brainstorming solutions to problems, or identifying the root cause of various issues, etc. (does the idea of “superpersuasion” load more heavily on superintelligence, or on “superpersonality”? seems like a bit of both; IMO you would at least need considerable intelligence even if it’s somehow mostly tied to personality)
even the gains that I’d definitely count as personality improvements might not all come primarily from more-tasteful RLHF creating a single, ideal super-personality (like what Claude currently aims for). Instead, an ideal AI advisor product would probably be able to identify the best way of working with a given patient/customer, and tailor its personality to work well with that particular individual. RLHF as practiced today can do this to a limited extent (i.e., Claude can do things like sense whether a formal vs informal style of reply would be more appropriate, given the context), but I feel like new methods beyond centralized RLHF might be needed to fully customize an AI’s personality to each individual; a rough sketch of the contrast I have in mind is below.
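To make that contrast concrete, here is a minimal, purely illustrative Python sketch, not anyone’s actual training setup: a single centralized persona (the analogue of today’s RLHF target) versus a per-user profile that is updated from feedback and folded into each request. The `UserProfile` class, its toy update rule, and `build_request` are all hypothetical.

```python
# Sketch (hypothetical): centralized persona vs. per-user personality profile.
from dataclasses import dataclass, field

# Analogue of today's centralized RLHF target: one personality for everyone.
CENTRAL_PERSONA = "You are helpful, warm, and moderately formal."


@dataclass
class UserProfile:
    """Per-user preferences accumulated from explicit or implicit feedback."""
    formality: float = 0.5      # 0 = very casual, 1 = very formal
    verbosity: float = 0.5      # 0 = terse, 1 = expansive
    notes: list[str] = field(default_factory=list)  # free-form learned preferences

    def update(self, feedback: str) -> None:
        # Toy update rule; a real system would infer this from interaction data.
        if "too formal" in feedback:
            self.formality = max(0.0, self.formality - 0.1)
        if "too long" in feedback:
            self.verbosity = max(0.0, self.verbosity - 0.1)
        self.notes.append(feedback)

    def to_system_prompt(self) -> str:
        style = "casual" if self.formality < 0.5 else "formal"
        length = "concise" if self.verbosity < 0.5 else "detailed"
        learned = "; ".join(self.notes[-3:]) or "none yet"
        return (
            f"{CENTRAL_PERSONA} Adapt to this user: prefer a {style}, {length} "
            f"style. Learned preferences: {learned}."
        )


def build_request(profile: UserProfile, user_message: str) -> list[dict]:
    """Assemble the messages that would be sent to any chat-completion API."""
    return [
        {"role": "system", "content": profile.to_system_prompt()},
        {"role": "user", "content": user_message},
    ]


if __name__ == "__main__":
    profile = UserProfile()
    profile.update("too formal, and too long")
    print(build_request(profile, "Help me plan my week."))
```

The only point of the sketch is that the per-user layer sits on top of whatever personality the centralized training produced, which is why fully customizing an AI’s personality to each individual might call for mechanisms (persistent profiles, per-user preference learning) beyond centralized RLHF itself.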