I think the gaps between where we are and roughly human-level cognition are smaller than they appear. Modest improvements in thus-far-neglected cognitive systems can let LLMs apply their cognitive abilities in more ways, opening more human-like routes to performance and learning. These strengths will build on each other nonlinearly (while likely also encountering unexpected roadblocks).
Timelines are thus very difficult to predict, but ruling out very short timelines by averaging predictions made without gears-level models of fast routes to AGI would be a big mistake. Whether and how quickly those routes work is an empirical question.
One blocker to taking short timelines seriously is the belief that fast timelines mean likely human extinction. I think fast timelines are extremely dangerous, but possible routes to alignment also exist; that’s a separate question, though.
I also think this is the current default path, or I wouldn’t describe it.
I think my research career using deep nets and cognitive architectures to understand human cognition is pretty relevant for making good predictions on this path to AGI. But I’m biased, just like everyone else.
Anyway, here’s very roughly why I think the gaps are smaller than they appear.
Current LLMs are like humans with excellent:
language abilities
semantic memory
working memory
They can now do almost all short time-horizon tasks that are framed in language better than humans. And other networks can translate real-world systems into language and code, where humans haven’t already done it.
But current LLMs/foundation models are dramatically missing some human cognitive abilities:
Almost no episodic memory for specific important experiences
No agency—they do only what they’re told
Poor executive function (self-management of cognitive tasks)
Relatedly, bad/incompetent at long time-horizon tasks.
And zero continuous learning (and self-directed learning)
Crucial for human performance on complex tasks
Those missing abilities would appear to imply long timelines.
But both long time-horizon tasks and self-directed learning are fairly easy to reach. The gaps are not as large as they appear.
Agency is as simple as repeatedly calling the model with a prompt like “act as an agent working toward goal X; use tools Y to gather information and take actions as appropriate.” The gap between a good oracle and an effective agent is almost completely illusory.
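For concreteness, here is a minimal sketch of that loop in Python. The `llm` callable and the `tools` dictionary are hypothetical stand-ins for whatever model API and tool set an implementation actually uses; this illustrates the pattern, not any particular agent framework.

```python
def run_agent(llm, tools: dict, goal: str, max_steps: int = 20) -> str:
    """Repeatedly prompt the model to act toward a goal, executing the tools it requests."""
    history = []
    for _ in range(max_steps):
        prompt = (
            f"Act as an agent working toward this goal: {goal}\n"
            f"Available tools: {', '.join(tools)}\n"
            f"History of (tool, input, result) steps so far: {history}\n"
            "Reply with either 'TOOL <name> <input>' or 'DONE <final answer>'."
        )
        reply = llm(prompt)                        # hypothetical model call
        if reply.startswith("DONE"):
            return reply[len("DONE"):].strip()
        _, name, tool_input = reply.split(" ", 2)  # assumes the model follows the reply format
        result = tools[name](tool_input)           # gather information or take an action
        history.append((name, tool_input, result))
    return "Step limit reached without finishing."
```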
Episodic memory is less trivial, but still relatively easy to improve from current near-zero-effort systems. Efforts from here will likely build on LLMs’ strengths. I’ll say no more publicly; DM me for details. But it doesn’t take a PhD in computational neuroscience to rederive this, which is the only reason I’m mentioning it publicly. More on infohazards later.
Now to the capabilities payoff: long time-horizon tasks and continuous, self-directed learning.
Long time-horizon task abilities are an emergent product of episodic memory and general cognitive abilities. LLMs are “smart” enough to manage their own thinking; they simply don’t have the instructions or skills to do it. o1 appears to have those skills (although no episodic memory, which is very helpful for managing multiple chains of thought), so similar RL training on chains of thought is probably one route to achieving them.
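To make “RL training on chains of thought” concrete, one simplified version is rejection-sampling fine-tuning in the spirit of STaR: sample several reasoning traces per problem, keep the ones that reach a verifiably correct answer, and fine-tune on the keepers. The sketch below assumes hypothetical `sample_cot`, `extract_answer`, and `fine_tune` helpers; it is not a claim about how o1 was actually trained.

```python
from typing import Callable, List, Tuple

def collect_successful_traces(
    problems: List[Tuple[str, str]],           # (question, known correct answer)
    sample_cot: Callable[[str], str],          # hypothetical model call: question -> reasoning trace
    extract_answer: Callable[[str], str],      # pulls the final answer out of a trace
    samples_per_problem: int = 8,
) -> List[Tuple[str, str]]:
    """Keep only chains of thought that end in a verifiably correct answer."""
    kept = []
    for question, correct in problems:
        for _ in range(samples_per_problem):
            trace = sample_cot(question)
            if extract_answer(trace) == correct:
                kept.append((question, trace))
                break  # one good trace per problem is enough for this sketch
    return kept

# training_data = collect_successful_traces(problems, sample_cot, extract_answer)
# fine_tune(model, training_data)  # hypothetical fine-tuning step on the kept traces
```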
Humans do not mostly perform long time-horizon tasks by trying them over and over. They either ask someone how to do it, then memorize and reference those strategies with episodic memory; or they perform self-directed learning, posing questions and forming theories to answer them.
Humans do not have or need “9s of reliability” to perform long time-horizon tasks. We substitute frequent error-checking and error-correction. We then learn continuously, both on strategy (largely episodic memory) and on skills/habitual learning (fine-tuning LLMs already provides a form of this habitization of explicit knowledge into fast implicit skills).
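A minimal sketch of that substitution: instead of demanding high single-shot reliability, check each step’s result and retry with the critique appended. The `do_step` and `check_step` callables are hypothetical model calls.

```python
def step_with_error_correction(do_step, check_step, task: str, max_retries: int = 3):
    """Run one step of a longer task, checking and retrying rather than
    relying on many 9s of single-shot reliability."""
    feedback = ""
    for _ in range(max_retries + 1):
        result = do_step(task + feedback)
        ok, critique = check_step(task, result)  # hypothetical checker: (passed?, what went wrong)
        if ok:
            return result
        feedback = f"\nA previous attempt failed this check: {critique}\nTry again, fixing that."
    raise RuntimeError("Step kept failing its checks; escalate or replan.")
```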
Continuous, self-directed learning is a product of having any type of new learning (memory) and using some of the network/agent’s cognitive abilities to decide what’s worth learning. That learning could be selective fine-tuning (as in o1’s “deliberative alignment”), episodic memory, or even very long context with good access as a first step. This is how humans master new tasks, along with taking instruction wisely. It would be very helpful for mastering economically valuable tasks, so I expect real effort to be put into it.
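One way to picture this, independent of which memory substrate is used: after each task, the agent decides what is worth keeping, writes it to whatever store is available, and retrieves it before similar tasks. The `llm` callable and `memory` object below are hypothetical placeholders.

```python
def consolidate(llm, memory, task: str, transcript: str) -> None:
    """After a task, let the model decide what is worth remembering and store it."""
    note = llm(
        "You just finished the task below. Write one short lesson worth "
        "remembering for similar future tasks, or reply NOTHING.\n"
        f"Task: {task}\nTranscript: {transcript}"
    )
    if note.strip() != "NOTHING":
        memory.store(note)  # the store could be a vector DB, a file, or a fine-tuning queue

def recall(memory, task: str) -> str:
    """Before a new task, retrieve relevant lessons from past episodes."""
    lessons = memory.search(task, k=5)
    return "Relevant lessons from past tasks:\n" + "\n".join(lessons)
```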
Self-directed learning would also be critical for an autonomous agent to accomplish entirely novel tasks, like taking over the world.
This is why I expect “Real AGI” that’s agentic and learns on its own, and not just transformative tool “AGI” within the next five years (or less). It’s easy and useful, and perhaps the shortest path to capabilities (as with humans teaching themselves).
If that happens, I don’t think we’re necessarily doomed, even without much new progress on alignment (although it would definitely improve our odds!). We are already teaching LLMs mostly to answer questions correctly and to follow instructions. As long as nobody gives their agent an open-ended top-level goal like “make me lots of money,” we might be okay. Instruction-following AGI is easier and more likely than value-aligned AGI, although I need to work through and clarify why I find this so central. I’d love help.
Convincing predictions are also blueprints for progress. Thus, I have been hesitant to say all of that clearly. I said some of this at more length in Capabilities and alignment of LLM cognitive architectures and elsewhere, but I didn’t publish it during my previous neuroscience career, nor have I elaborated on it since.
But I’m increasingly convinced that all of this stuff is going to quickly become obvious to any team that sits down and starts thinking seriously about how to get from where we are to really useful capabilities. And more talented teams are steadily doing just that.
I now think it’s more important that the alignment community takes short timelines more seriously, rather than hiding our knowledge in hopes that it won’t be quickly rederived. There are more and more smart and creative people working directly toward AGI. We should not bet on their incompetence.
There could certainly be unexpected theoretical obstacles. There will certainly be practical obstacles. But even with expected discounts for human foibles and idiocy and unexpected hurdles, timelines are not long. We should not assume that any breakthroughs are necessary, or that we have spare time to solve alignment adequately to survive.
Great reply! On episodic memory: I’ve been watching Claude play Pokemon recently and got the impression: “Claude is overqualified but suffering from Memento-like memory limitations. The agent scaffold probably also has some easy room for improvement (though it’s better than post-it notes and tattooing sentences on your body).”
I don’t know much about neuroscience or ML, but how hard can it be to make the AI remember what it did a few minutes ago? Sure, that’s not all that stands between Claude and TAI, but given that Claude is now within the human expert range on so many tasks, and given how fast progress has been recently, how can anyone not take short timelines seriously?
People who largely rule out 1-5 year timelines seem not to have updated at all on how much they’ve presumably been surprised by recent AI progress.
(If someone had predicted a decent likelihood of transfer learning and PhD-level research understanding shortly before those breakthroughs happened, and then predicted a long gap after that, I’d be more open to updating toward their intuitions. However, my guess is that people who have long TAI timelines now also held confident, now-wrong long timelines for breakthroughs in transfer learning (etc.), and so, from my perspective, they arguably haven’t made the update that whatever their brain is doing when it makes timeline forecasts is not very good.)
Thanks, this is the kind of comment that tries to break down things by missing capabilities that I was hoping to see.
Episodic memory is less trivial, but still relatively easy to improve from current near-zero-effort systems
I agree that it’s likely to be relatively easy to improve from current systems, but just improving it is a much lower bar than getting episodic memory to actually be practically useful. So I’m not sure why this alone would imply a very short timeline. Getting things from “there are papers about this in the literature” to “actually sufficient for real-world problems” often takes significant time, e.g.:
I believe that chain-of-thought prompting was introduced in a NeurIPS 2022 paper. Going from there to a model that systematically and rigorously made use of it (o1) took about two years, even though the idea was quite straightforward in principle.
After the 2007 DARPA Grand Challenge there was a lot of hype about how self-driving cars were just around the corner, but almost two decades later, they’re basically still in beta.
My general prior is that this kind of work—from conceptual prototype to robust real-world application—can easily take anywhere from years to decades, especially once we move out of domains like games/math/programming and into ones that are significantly harder to formalize and test. Also, the more interacting components you have, the trickier it gets to test and train.
I think the intelligence inherent in LLMs will make episodic memory systems useful immediately. The people I know building chatbots with persistent memory were already finding vector databases useful; they just topped out in capacity quickly because memory search slowed down too much. And that was as of a year ago.
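To make the capacity/search tradeoff concrete, here is a minimal sketch of that kind of vector-store memory (the `embed` function is a hypothetical embedding call). The brute-force similarity search is exactly what tops out: it is fine for thousands of memories but slows down as the store grows, which is why production systems move to approximate nearest-neighbor indexes.

```python
import numpy as np

class EpisodicMemory:
    """Toy vector-store memory: embed each memory, retrieve by cosine similarity."""

    def __init__(self, embed):
        self.embed = embed        # hypothetical text -> np.ndarray embedding call
        self.vectors = []
        self.texts = []

    def store(self, text: str) -> None:
        self.vectors.append(self.embed(text))
        self.texts.append(text)

    def search(self, query: str, k: int = 5) -> list:
        q = self.embed(query)
        # Brute-force similarity over every stored memory: fine for thousands of
        # entries, increasingly slow for millions -- the capacity ceiling noted above.
        sims = [
            float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-8))
            for v in self.vectors
        ]
        top = np.argsort(sims)[::-1][:k]
        return [self.texts[i] for i in top]
```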
I don’t think I managed to convey one central point, which is that reflection and continuous learning together can fill a lot of cognitive gaps. I think they do for humans. We can analyze our own thinking and then use new strategies where appropriate. It seems like the pieces are all there for LLM cognitive architectures to do this as well. Such a system will still take time to dramatically self-improve by re-engineering its base systems. But there’s a threshold of general intelligence and self-directed learning at which a system can self-correct and self-improve in limited but highly useful ways, so that its designers don’t have to fix every flaw by hand, and it can just back up and try again differently.
I don’t like the term “unhobbling,” because it’s more like adding cognitive tools that make new uses of LLMs’ considerable flexible intelligence.
All of the continuous learning approaches that would enable self-directed continuous learning are clunky now, but there are no obvious roadblocks to their improving rapidly once a few competent teams start working on them full-time. And since there are multiple approaches already in play, there’s a better chance that some combination becomes useful quickly.
Yes, refining it and other systems will take time. But counting on that being a long time doesn’t seem sensible. I am considering writing the complementary post, “What are the best arguments for long timelines?” I’m curious. I expect the strongest ones to support something like five-year timelines to what I consider AGI—which importantly is fully general in that it can learn new things, but will not meet the bar of doing 95% of remote jobs because it’s not likely to be human-level in all areas right away.
I focus on that definition because it seems like a fairly natural category shift from limited tool AI to sapience and understanding in “Real AGI” that roughly matches our intuitive understanding of humans as entities, minds, or beings that understand and can learn about themselves if they choose to.
The other reason I focus on that transition is that I expect it to function as a wake-up call to those who don’t imagine agentic AI in detail. It will match their intuitions about humans well enough for our recognition of humans as very dangerous to also apply to that type of AI. Hopefully their growth from general-and-sapient-but-dumb-in-some-ways will be slow enough for society to adapt—months to years may be enough.
Thanks. Still not convinced, but it will take me a full post to explain why exactly. :)
Though possibly some of this is due to a difference in definitions. When you say this:
what I consider AGI—which importantly is fully general in that it can learn new things, but will not meet the bar of doing 95% of remote jobs because it’s not likely to be human-level in all areas right away
Do you have a sense of how long you expect it will take for it to go from “can learn new things” to “doing 95% of remote jobs”? If you e.g. expect that it might still take several years for AGI to master most jobs once it has been created, then that might be more compatible with my model.
I do think our models may be pretty similar once we get past slightly different definitions of AGI.
It’s pretty hard to say how fast the types of agents I’m envisioning would take off. It could be a while between reaching what I’m calling real AGI that can learn anything and having it learn well and quickly enough, and be smart enough, to do 95% of remote jobs. If there aren’t breakthroughs in learning and memory systems, it could take as much as three years to really start doing substantial work, followed by a slow progression toward 95% of jobs as it’s taught and teaches itself new skills. The incremental improvements on existing memory systems—RAG, vector databases, and fine-tuning for skills and new knowledge—would remain clumsier than human learning for a while.
This would be potentially very good for safety. Semi-competent agents that aren’t yet takeover-capable might wake people up to the alignment and safety issues. And I’m optimistic about the agent route for technical alignment; of course that’s a more complex issue. Intent alignment as a stepping-stone to value alignment gives the broad outline and links to more work on how instruction-following language model agents might bypass some of the worst concerns about goal mis-specification and mis-generalization and risks from optimization.
You made a good point in the linked comment that these systems will be clumsier to train and improve if they have more moving parts. My impression from the little information I have on agent projects is that this is true. But I haven’t heard of a large and skilled team taking on this task yet; it will be interesting to see what one can do. And at some point, an agent directing its own learning and performance gains an advantage that can offset the disadvantage of being harder for humans to improve and optimize the underlying system.
I look forward to that post if you get around to writing it. I’ve been toying with the idea of writing a more complete post on my short timelines and slow takeoff scenario. Thanks for posing the question and getting me to dash off a short version at least.
I’d argue that self-driving cars were essentially solved by Waymo in 2021-2024 (and to a lesser extent I’d include Tesla in this too), and that a lot of the reason self-driving cars aren’t on the roads is liability issues, so in essence self-driving cars came 14-17 years after the DARPA Grand Challenge.
Hmm, some years back I was hearing the claim that self-driving cars work badly in winter conditions, so are currently limited to the kinds of warmer climates where Waymo is operating. I haven’t checked whether that’s still entirely accurate, but at least I haven’t heard any news of this having made progress.
My guess is that a large portion of the “works badly in winter conditions” issue is closer to: it does work reasonably well in winter conditions, but not so well that you avoid lawsuits/liability issues.
I’d argue the moral of self-driving cars is that regulation can slow down tech considerably, which does have implications for AI policy.