It seems right to me that “fixed, partial concepts with fixed, partial understanding” that are “mostly ‘in the data’” likely block LLMs from being AGI in the sense of this post. (I’m somewhat confused / surprised that people don’t talk about this more — I don’t know whether to interpret that as not noticing it, or having a different ontology, or noticing it but disagreeing that it’s a blocker, or thinking that it’ll be easy to overcome, or what. I’m curious if you have a sense from talking to people.)
These also seem right:
“LLMs have a weird, non-human shaped set of capabilities”
“There is a broken inference”
“we should also update that this behavior surprisingly turns out to not require as much general intelligence as we thought”
“LLMs do not behave with respect to X like a person who understands X, for many X”
(though I feel confused about how to update on the conjunction of those, and the things LLMs are good at — all the ways they don’t behave like a person who doesn’t understand X, either, for many X.)
But: you seem to have a relatively strong prior[1] on how hard it is to get from current techniques to AGI, and I’m not sure where you’re getting that prior from. I’m not saying I have a strong inside view in the other direction, but, like, just for instance — it’s really not apparent to me that there isn’t a clever continuous-training architecture, requiring relatively little new conceptual progress, that’s sufficient; if that’s less sample-efficient than what humans are doing, it’s not apparent to me that it can’t still accomplish the same things humans do, with a feasible amount of brute force. And it seems like that is apparent to you.
Or, looked at from a different angle: to my gut, it seems bizarre if whatever conceptual progress is required takes multiple decades, in the world I expect to see with no more conceptual progress, where probably:
AI is transformative enough to motivate a whole lot of sustained attention on overcoming its remaining limitations
AI that’s narrowly superhuman on some range of math & software tasks can accelerate research
[1] It’s hard for me to tell how strong a prior: “—though not super strongly” is hard to square with your butt-numbers, even taking into account that you disclaim them as butt-numbers.
I’m curious if you have a sense from talking to people.
More recently I’ve mostly disengaged (except for making kinda-shrill LW comments). Some people say that “concepts” aren’t a thing, or similar, e.g. by recentering on performable tasks, or by pointing to benchmarks going up and saying that the coarser category of “all benchmarks” or similar is good enough for predictions. (See e.g. Kokotajlo’s comment here https://www.lesswrong.com/posts/oC4wv4nTrs2yrP5hz/what-are-the-strongest-arguments-for-very-short-timelines?commentId=QxD5DbH6fab9dpSrg, though his actual position is of course more complex and nuanced.) Some people say that the training process is already concept-gain-complete. Some people say that future research, such as “curiosity” in RL, will solve it. Some people say that the “convex hull” of existing concepts is already enough to set off FURSI (fast unbounded recursive self-improvement).
(though I feel confused about how to update on the conjunction of those, and the things LLMs are good at — all the ways they don’t behave like a person who doesn’t understand X, either, for many X.)
True; I think I’ve heard various people discussing how to think more precisely about the class of LLM capabilities, but maybe there should be more of that.
if that’s less sample-efficient than what humans are doing, it’s not apparent to me that it can’t still accomplish the same things humans do, with a feasible amount of brute force
It’s often awkward discussing these things, because there’s sort of a “seeing double” that happens. In this case, the “double” is:
“AI can’t FURSI because it has poor sample efficiency...
1. ...and therefore it would take k orders of magnitude more data / compute than a human to do AI research.”
2. ...and therefore more generally we’ve not actually gotten that much evidence that the AI has the algorithms which would have caused both good sample efficiency and also the ability to create novel insights / skills / etc.”
The same goes mutatis mutandis for “can make novel concepts”.
I’m more saying 2. rather than 1. (Of course, this would be a very silly thing for me to say if we observed the gippities creating lots of genuine novel useful insights, but with low sample complexity (whatever that should mean here). But I would legit be very surprised if we soon saw a thing that had been trained on 1000x less human data, and performs at modern levels on language tasks (allowing it to have breadth of knowledge that can be comfortably fit in the training set).)
can’t still accomplish the same things humans do
Well, I would not be surprised if it can accomplish a lot of the things. It already can of course. I would be surprised if there weren’t some millions of jobs lost in the next 10 years from AI (broadly, including manufacturing, driving, etc.). In general, there’s a spectrum/space of contexts / tasks, where on the one hand you have tasks that are short, clear-feedback, and common / stereotyped, and not that hard; on the other hand you have tasks that are long, unclear-feedback, uncommon / heterogeneous, and hard. The way humans do things is that we practice the short ones in some pattern to build up for the variety of long ones. I expect there to be a frontier of AIs crawling from short to long ones. I think at any given time, pumping in a bunch of brute force can expand your frontier a little bit, but not much, and it doesn’t help that much with more permanently ratcheting the frontier outward.
AI that’s narrowly superhuman on some range of math & software tasks can accelerate research
As you’re familiar with, if you have a computer program that has three resource bottlenecks A (50%), B (25%), and C (25%), and you optimize the fuck out of A down to ~1%, you ~double your overall efficiency; but then if you optimize the fuck out of A again down to 0.1%, you’ve basically done nothing. The question to me isn’t “does AI help a significant amount with some aspects of AI research”, but rather “does AI help a significant and unboundedly growing amount with all aspects of AI research, including the long-type tasks such as coming up with really new ideas”.
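To spell out that arithmetic, here’s a minimal Amdahl’s-law-style sketch; the 50/25/25 split is just the hypothetical program above, and the helper function is mine:

```python
# Illustrative sketch of the bottleneck arithmetic above (Amdahl's-law style).

def overall_speedup(original_shares, optimized_shares):
    """Total speedup when each part's share of the *original* runtime is
    reduced from original_shares to optimized_shares."""
    return sum(original_shares) / sum(optimized_shares)

original = [0.50, 0.25, 0.25]   # parts A, B, C of the original runtime

# Optimize A from 50% of the original runtime down to ~1% of it:
print(overall_speedup(original, [0.01, 0.25, 0.25]))    # ~1.96x: roughly double

# Optimize A again, down to 0.1%: almost no further overall gain.
print(overall_speedup(original, [0.001, 0.25, 0.25]))   # ~2.0x: basically unchanged
```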
AI is transformative enough to motivate a whole lot of sustained attention on overcoming its remaining limitations
This certainly makes me worried in general, and it’s part of why my timelines aren’t even longer; I unfortunately don’t expect a large “naturally-occurring” AI winter.
seems bizarre if whatever conceptual progress is required takes multiple decades
Unfortunately I haven’t addressed your main point well yet… Quick comments:
Strong minds are the most structurally rich things ever. That doesn’t mean they have high algorithmic complexity; obviously brains are less algorithmically complex than entire organisms, and the relevant aspects of brains are presumably considerably simpler than actual brains. But still, IDK, it just seems weird to me to expect to make such an object “by default” or something? Craig Venter made a quasi-synthetic lifeform—but how long would it take us to make a minimum viable unbounded invasive organic replicator actually from scratch, like without copying DNA sequences from existing lifeforms?
I think my timelines would have been considered normalish among X-risk people 15 years ago? And would have been considered shockingly short by most AI people.
I think most of the difference is in how we’re updating, rather than on priors? IDK.
Strong minds are the most structurally rich things ever. That doesn’t mean they have high algorithmic complexity; obviously brains are less algorithmically complex than entire organisms, and the relevant aspects of brains are presumably considerably simpler than actual brains. But still, IDK, it just seems weird to me to expect to make such an object “by default” or something? Craig Venter made a quasi-synthetic lifeform—but how long would it take us to make a minimum viable unbounded invasive organic replicator actually from scratch, like without copying DNA sequences from existing lifeforms?
I think I don’t understand this argument. In creating AI we can draw on training data, which breaks the analogy to making a replicator actually from scratch (are you using a premise that this is a dead end, or something, because “Nearly all [thinkers] do not write much about the innards of their thinking processes...”?). We’ve seen that supervised (EDIT: unsupervised) learning and RL (and evolution) can create structural richness (if I have the right idea of what you mean) out of proportion to the understanding that went into them. Of course this doesn’t mean any particular learning process is able to create a strong mind, but, idk, I don’t see a way to put a strong lower bound on how much more powerful a learning process is necessary, and ISTM observations so far suggest ‘less than I would have guessed’.
(EDIT: Maybe (you’d say) I should be drawing such a strong lower bound — or a lower bound on the needed difference from current techniques, not ‘power level’ — from the point about sample efficiency...? Like maybe I should think that we don’t have a good enough sample space to learn over and will probably have to jump far outside it; this comment seems in that direction.)
(Nor do I get what view you’re paraphrasing as ‘expecting to make a strong mind “by default”’. Did LLMs or AlphaZero come about “by default”?)
(EDIT: I feel like I get “by default” more after looking again at your “Let me restate my view again” passage here.)
I think my timelines would have been considered normalish among X-risk people 15 years ago? And would have been considered shockingly short by most AI people.
Unfortunately I can’t find the written artifact that came out of it, but I (very imperfectly) recall a large conversation around SIAI in 2010 where, IIRC, a 2040 median was pretty typical. I agree that “X-risk people” more broadly had longer timelines, and “most AI people” much longer.
I think most of the difference is in how we’re updating, rather than on priors? IDK.
Yeah, in particular it seems like I’m updating more than you from induction on the conceptual-progress-to-capabilities ratio we’ve seen so far / on what seem like surprises to the ‘we need lots of ideas’ view. (Or maybe you disagree about observations there, or disagree with that frame.) (The “missing update” should weaken this induction, but doesn’t invalidate it IMO.)
I think I don’t understand this argument. In creating AI we can draw on training data, which breaks the analogy to making a replicator actually from scratch (are you using a premise that this is a dead end, or something, because “Nearly all [thinkers] do not write much about the innards of their thinking processes...”?).
You’re technically right that the analogy is broken in that way, yeah. Likewise, if someone gleans substantial chunks of the needed Architecture by looking at scans of brains. But yes, as you say, I think the actual data (in both cases) doesn’t directly tell you what you need to know, by any stretch. (To riff on an analogy from Kabir Kumar: it’s sort of like trying to infer the inner workings of a metal casting machine, purely by observing price fluctuations for various commodities. It’s probably possible in theory, but staring at the price fluctuations—which are a highly mediated / garbled / fuzzed emanation from the “guts” of various manufacturing processes—is not a good way to discover the important ideas about how casting machines can work. Cf. https://www.lesswrong.com/posts/unCG3rhyMJpGJpoLd/koan-divining-alien-datastructures-from-ram-activations )
We’ve seen that supervised learning and RL (and evolution) can create structural richness (if I have the right idea of what you mean) out of proportion to the understanding that went into them.
Not sure I buy the claims about SL and RL. In the case of SL, it’s only going “a little ways away from the data”, in terms of the structure you get. Or so I claim uncertainly. (Hm… maybe the metaphor of “distance from the data” is quite bad… really I mean “it’s only exploring a pretty impoverished sector in structurespace, partly due to data and partly due to other Architecture”.) In the case of RL, what are the successes in terms of gaining new learned structure? There’s going to be some—we can point to AlphaZero, and maybe some robotics things—but I’m skeptical that this actually represents all that much structural richness. The actual NNs in AlphaZero would have some nontrivial structure, but hard to tell how much, and it’s going to be pretty narrow / circumscribed, e.g. it wouldn’t represent most interesting math concepts.
Anyway, the claim is of course true of evolution. The general point is true, that learning systems can be powerful, and specifically high-leverage in various ways (e.g. lots of learning from a small algorithmic-complexity fingerprint, as with evolution or Solomonoff induction, or from fairly small compute, as in humans).
Of course this doesn’t mean any particular learning process is able to create a strong mind, but, idk, I don’t see a way to put a strong lower bound on how much more powerful a learning process is necessary,
Right, no one knows. Could be next month that everyone dies from AGI. The only claims I’d really argue strongly would be claims like:
If you have median 2029 or similar, either you’re overconfident or you know something dispositive that I don’t know.
If you have probability of AGI by 2029 less than .05%, either you’re overconfident or you know something dispositive that I don’t know.
Besides my comments about the bitter lesson and about the richness of evolution’s search, I’ll also say that it just seems to me like there are lots of ideas—at the abstract / fundamental / meta level of learning and thinking—that have yet to be put into practice in AI. I wrote in the OP:
The self-play that evolution uses (and the self-play that human children use) is much richer, containing more structural ideas, than the idea of having an agent play a game against a copy of itself.
IME if you think about these sorts of things—that is, if you think about how the 2.5 known great and powerful optimization processes (evolution, humans, humanity/science) do their impressive thing that they do—if you think about that, you see lots of sorts of feedback arrangements and ways of exploring the space of structures / algorithms, many of which are different in some fundamental character from what’s been tried so far in AI. And, these things don’t add up, in my head, to a general intelligence—though of course that is only a deficiency in my imagination, one way or another.
(EDIT: Maybe (you’d say) I should be drawing such a strong lower bound from the point about sample efficiency...?)
I don’t personally lean super heavily on the sample efficiency thing. I mean, if we see a system that’s truly only trained on an amount of human data less than 10x what a well-read adult human has read (plus compute / thinking), and it performs like GPT-4 or similar, that would be really weird and surprising, and I would be confused, and I’d be somewhat more scared. But I don’t think it would necessarily imply that you’re about to get AGI.
Conversely, I definitely don’t think that high sample complexity strongly implies that you’re not about to get AGI. (Well, I guess if you’re about to get AGI, there should probably be spikes in sample efficiency in specific areas—e.g. you’d be able to invent much more interesting math with little or no data, whereas previously you had to train on vast math corpora. But we don’t necessarily have to observe these domain spikes before dying of nanopox.)
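For a rough sense of scale on that “10x what a well-read adult human has read” bar, here’s a back-of-envelope sketch; every number in it is my own illustrative assumption, not a figure from this exchange:

```python
# Back-of-envelope: "10x what a well-read adult has read" vs. a modern LLM corpus.
# All numbers are rough, illustrative assumptions (treating words ~= tokens
# for order-of-magnitude purposes).

WORDS_PER_BOOK = 100_000        # assumed average book length
BOOKS_READ = 2_000              # assumed lifetime book count for a very well-read adult
OTHER_READING_WORDS = 5e8       # assumed lifetime articles / web text / etc.

human_words = BOOKS_READ * WORDS_PER_BOOK + OTHER_READING_WORDS   # ~7e8 words
bar = 10 * human_words                                            # the "10x a well-read adult" bar

frontier_corpus_tokens = 1e13   # assumed order of magnitude for a frontier LLM's pretraining data

print(f"10x a well-read adult: ~{bar:.0e} words")
print(f"frontier LLM corpus:   ~{frontier_corpus_tokens:.0e} tokens")
print(f"gap: roughly {frontier_corpus_tokens / bar:,.0f}x")
```

On these assumed numbers the two quantities differ by three-plus orders of magnitude, which is part of why a system meeting that bar while performing at GPT-4 level would be so surprising.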
Yeah, in particular it seems like I’m updating more than you from induction on the conceptual-progress-to-capabilities ratio we’ve seen so far / on what seem like surprises to the ‘we need lots of ideas’ view. (Or maybe you disagree about observations there, or disagree with that frame.) (The “missing update” should weaken this induction, but doesn’t invalidate it IMO.)
Yeah… To add a bit of color, I’d say I’m pretty wary of mushing. Like, we mush together all “capabilities” and then update on how much “capabilities” our current learning programs have. I don’t feel like that sort of reasoning ought to work very well. But I haven’t yet articulated how mushing is anything more specific than categorization, if it is more specific. Maybe what I mean by mushing is “sticking to a category and hanging lots of further cognition (inferences, arguments, plans) on the category, without putting in suitable efforts to refine the category into subcategories”. I wrote:
We should have been trying hard to retrospectively construct new explanations that would have predicted the observations. Instead we went with the best PREEXISTING explanation that we already had.