I think if you want to convince people with short timelines (e.g., 7-year medians) of your perspective, probably the most productive thing would be to better operationalize the things you expect AIs won’t be able to do soon (but that AGI could do). As in, flesh out a response to this comment such that it is possible for someone to judge.
But ok:
Come up, on its own, with many math concepts that mathematicians consider interesting + mathematically relevant on a similar level to concepts that human mathematicians come up with.
Do insightful science on its own.
Perform at the level of current LLMs, but with 300x less training data.
But like, I wouldn’t be surprised if, say, someone trained something that performed comparably to LLMs on a wide variety of benchmarks, using much less “data”… and then when you look into it, you find that what they were doing was taking activations of the LLMs and training the smaller guy on the activations. And I’ll be like, come on, that’s not the point; you could just as well have “trained” the smaller guy by copy-pasting the weights from the LLM and claimed “trained with 0 data!!”. And you’ll be like “but we met your criterion!” and I’ll just be like “well whatever, it’s obviously not relevant to the point I was making, and if you can’t see that then why are we even having this conversation”. (Or maybe you wouldn’t do that, IDK, but this sort of thing—followed by being accused of “moving the goal posts”—is why this question feels frustrating to answer.)
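(To make that kind of loophole concrete: below is a minimal, hypothetical sketch of “training the smaller guy on the activations”. Every name and number in it is made up for illustration; the point is only that the headline amount of “data” can be tiny while the knowledge is being copied out of the big model.)

```python
# Hypothetical illustration only: "train" a small model to imitate a big model's
# hidden activations, so the small model needs almost no raw text of its own.
# All names and sizes here are made up; this is not anyone's actual setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, DIM_BIG, DIM_SMALL, SEQ_LEN, BATCH = 1000, 256, 64, 32, 8

class TinyLM(nn.Module):
    """A toy stand-in for a language model that exposes its hidden states."""
    def __init__(self, dim: int):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, dim)
        self.mix = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, VOCAB)

    def forward(self, tokens: torch.Tensor):
        hidden, _ = self.mix(self.embed(tokens))
        return self.head(hidden), hidden  # (next-token logits, activations)

big = TinyLM(DIM_BIG).eval()             # pretend this is the already-trained LLM
small = TinyLM(DIM_SMALL)                # the model that "needs 300x less data"
project = nn.Linear(DIM_SMALL, DIM_BIG)  # map small activations into big's space

opt = torch.optim.Adam(list(small.parameters()) + list(project.parameters()), lr=1e-3)
for step in range(200):
    # A tiny random prompt set: the visible "training data" is minuscule, but
    # every gradient step pulls information out of the big model's activations.
    tokens = torch.randint(0, VOCAB, (BATCH, SEQ_LEN))
    with torch.no_grad():
        _, teacher_acts = big(tokens)
    _, student_acts = small(tokens)
    loss = F.mse_loss(project(student_acts), teacher_acts)
    opt.zero_grad()
    loss.backward()
    opt.step()
```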
¿ thoughts on the following:
Solving >95% of IMO problems while never seeing any human proofs, problems, or math libraries (before being given IMO problems in base ZFC at test time). Like AlphaProof, except not starting from a pretrained language model, without a curriculum of human problems, working in base ZFC with no given libraries (instead of in Lean), and getting to IMO combinatorics problems.
(I’m not sure whether I’m supposed to nitpick. If I were nitpicking, I’d ask things like: Wait, are you allowing it to see preexisting computer-generated proofs? What counts as computer-generated? Are you allowing it to see the parts of papers where humans state and discuss propositions, just with the proofs cut out? Is this system somehow trained on a giant human text corpus, but just without the math proofs?)
But if you mean basically “the AI has no access to human math content except a minimal game environment of formal logic, plus whatever abstract priors seep in via the training algorithm+prior, plus whatever general thinking patterns are in [human text that’s definitely not mathy, e.g. a blog post about apricots]”, then yeah, this would be really crazy to see. My points are trying to be, not minimally hard, but at least easier-ish in some sense. Your thing seems significantly harder (though nicely much more operationalized); I think it’d probably imply my “come up with interesting math concepts”? (Note that I would not necessarily say the same thing if it were >25% of IMO problems; there I’d be significantly more unsure, and would defer to you / Sam, or someone who has a sense for the complexity of the full proofs there and the canonicalness of the necessary lemmas and so on.)
I didn’t express this clearly, but yeah, I meant no pretraining on human text at all, and also nothing computer-generated which “uses human mathematical ideas” (beyond what is in base ZFC), but I’d probably allow something like the synthetic data generation used for AlphaGeometry (Fig. 3), except in base ZFC and giving away very little human math inside the deduction engine. I agree this would be very crazy to see. The version with pretraining on non-mathy text is also interesting and would still be totally crazy to see. I agree it would probably imply your “come up with interesting math concepts”.
But I wouldn’t be surprised if like >20% of the people on LW who think A[G/S]I happens in like 2-3 years thought that my thing could totally happen in 2025 if the labs were aiming for it (though they might not expect the labs to aim for it), with your things plausibly happening later. E.g. maybe such a person would think “AlphaProof is already mostly RL/search and one could replicate its performance soon without human data, and anyway, AlphaGeometry already pretty much did this for geometry (and AlphaZero did it for chess)” and “some RL+search+self-play thing could get to solving major open problems in math in 2 years, and plausibly at that point human data isn’t so critical, and IMO problems are easier than major open problems, so plausibly some such thing gets to IMO problems in 1 year”. But also, idk, maybe this doesn’t hang together enough for such people to exist.
I wonder if one can use this kind of idea to get a different operationalization with parties interested in taking each side, though. Like, maybe whether such a system would prove Cantor’s theorem (stated in base ZFC) (imo this would still be pretty crazy to see)? Or whether such a system would get to IMO combos while relying moderately less on human data?
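(For reference, one way to render “Cantor’s theorem, stated in base ZFC” is the sentence below in the language of set theory; even this is generous, since Func, dom, and f(u) are abbreviations that a system given only bare ZFC would have to unfold into pure ∈-formulas.)

$$\forall x\;\neg\exists f\;\Big[\operatorname{Func}(f)\wedge \operatorname{dom}(f)=x\wedge \forall y\,\big(\forall z\,(z\in y\rightarrow z\in x)\rightarrow \exists u\,(u\in x\wedge f(u)=y)\big)\Big]$$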
I’d probably allow something like the synthetic data generation used for AlphaGeometry (Fig. 3), except in base ZFC and giving away very little human math inside the deduction engine
IIUC, yeah, that definitely seems fair; I’d probably also allow various other substantial “quasi-mathematical meta-ideas” to seep in, e.g. other tricks for self-generating a curriculum of training data.
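(To illustrate the flavor of “self-generating a curriculum” in a formal game with no human data, here is a minimal, hypothetical sketch; it is not AlphaGeometry’s actual pipeline, just random propositional formulas labeled by a brute-force checker.)

```python
# Hypothetical sketch of "self-generating a curriculum" in a formal game with no
# human data: enumerate random propositional formulas and label each one with a
# brute-force checker. This shows the flavor of the trick, not any real pipeline.
import itertools
import random

random.seed(0)
VARS = ["p", "q", "r"]

def random_formula(depth: int):
    """Build a random formula as a nested tuple, e.g. ('->', 'p', ('~', 'q'))."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(VARS)
    op = random.choice(["~", "&", "|", "->"])
    if op == "~":
        return ("~", random_formula(depth - 1))
    return (op, random_formula(depth - 1), random_formula(depth - 1))

def evaluate(formula, assignment):
    """Evaluate a formula under a {variable: bool} assignment."""
    if isinstance(formula, str):
        return assignment[formula]
    op = formula[0]
    if op == "~":
        return not evaluate(formula[1], assignment)
    a, b = evaluate(formula[1], assignment), evaluate(formula[2], assignment)
    return {"&": a and b, "|": a or b, "->": (not a) or b}[op]

def is_tautology(formula):
    """Brute-force 'deduction engine': check every truth assignment."""
    return all(
        evaluate(formula, dict(zip(VARS, values)))
        for values in itertools.product([True, False], repeat=len(VARS))
    )

# Self-generated curriculum: (formula, is-it-a-theorem) pairs, no human examples.
curriculum = [(f, is_tautology(f)) for f in (random_formula(4) for _ in range(1000))]
print(sum(label for _, label in curriculum), "tautologies out of", len(curriculum))
```

The point is just that the (statement, label) pairs come from the system’s own enumeration plus a mechanical checker, not from any human-written examples.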
But I wouldn’t be surprised if like >20% of the people on LW who think A[G/S]I happens in like 2-3 years thought that my thing could totally happen in 2025 if the labs were aiming for it (though they might not expect the labs to aim for it), with your things plausibly happening later
Mhm, that seems quite plausible, yeah, and that does make me want to use your thing as a go-to example.
whether such a system would prove Cantor’s theorem (stated in base ZFC) (imo this would still be pretty crazy to see)?
This one I feel a lot less confident of, though I could plausibly get more confident if I thought about the proof in more detail.
Part of the spirit here, for me, is something like: Yes, AIs will do very impressive things on “highly algebraic” problems / parts of problems. (See “Algebraicness”.) One of the harder things for AIs is, poetically speaking, “self-constructing its life-world”, or in other words “coming up with lots of concepts to understand the material it’s dealing with, and then transitioning so that the material it’s dealing with is those new concepts, and so on”. For any given math problem, I could be mistaken about how algebraic it is (or how much of its difficulty for humans is due to the algebraic parts), and about how much conceptual progress you have to make to get to a point where the remaining work is just algebraic. I assume that human math is a big mix of algebraic and non-algebraic stuff. So I’d be really surprised if an AlphaMath could reinvent most of the definitions that we use, but I’m a lot less sure about a smaller subset, because I’m less sure whether it has a surprisingly small non-algebraic part. (I think that someone with a lot more sense of math in general, and formal proofs in particular, could plausibly call this stuff in advance significantly better than my pretty weak “it’s hard to do all of a wide variety of problems”.)
I did give a response in that comment thread. Separately, I think that’s not a great standard, e.g. as described in the post and in this comment (https://www.lesswrong.com/posts/i7JSL5awGFcSRhyGF/shortform-2?commentId=zATQE3Lhq66XbzaWm):
Second, 2024 AI is specifically trained on short, clear, measurable tasks. Those tasks also overlap with legible stuff—stuff that’s easy for humans to check. In other words, they are, in a sense, specifically trained to trick your sense of how impressive they are—they’re trained on legible stuff, with not much constraint on the less-legible stuff (and in particular, on the stuff that becomes legible only in total failure on more difficult / longer-time-horizon stuff).
In fact, all the time in real life we make judgements about things that we couldn’t describe in terms that would be considered well-operationalized by betting standards, and we rely on these judgements, and we largely endorse relying on these judgements. E.g. inferring intent in criminal cases, deciding whether something is interesting or worth doing, etc. I should be able to just say “but you can tell that these AIs don’t understand stuff”, and then we can have a conversation about that, without me having to predict a minimal example of something which is operationalized enough for you to be forced to recognize it as judgeable and also won’t happen to be surprisingly well-represented in the data, or surprisingly easy to do without creativity, etc.
(Yeah, you responded, but it felt not that operationalized, and it seemed doable to flesh out, as you did.)