Kaarel comments on Raemon’s Shortform

Kaarel 4 Oct 2025 3:43 UTC
6 points
0
But the basic concept of “well, if it was imperfect at either not-getting-resource-pumped, or making suboptimal game theory choices, or if it gave up when it got stuck, it would know that it wasn’t as cognitively powerful as it could be, and would want to find ways to be more cognitively powerful all-else-equal”… seems straightforward to me, and I’m not sure what makes it not straightforward seeming to others

I think there’s a true and fairly straightforward thing here and also a non-straightforward-to-me and in fact imo false/confused adjacent thing. The true and fairly straightforward thing is captured by stuff like:
- as a mind $M$ grows, it comes to have more and better and more efficient technologies (e.g. you get electricity and you make lower-resistance wires)
- (relatedly) as $M$ grows, it employs bigger constellations of parts that cohere (i.e., that work well together; e.g. [hand axes → fighter jets] or [Euclid’s geometry → scheme-theoretic algebraic geometry])
- as $M$ grows, it has an easier time getting any particular thing done, it sees more/better ways to do any particular thing, it can consider more/better plans for any particular thing, it has more and better methods for any particular context, it has more ideas, it asks better questions, it would learn any given thing faster
- as $M$ grows, it becomes more resilient vs some given processes; another mind of some fixed capability would have a harder time pointing out important mistakes $M$ is making or teaching $M$ new useful tricks
The non-straightforward-to-me and in fact imo probably in at least some important sense false/confused adjacent thing is captured by stuff like:
- as a mind $M$ grows, it gets close to never getting stuck
- as $M$ grows, it gets close to not being silly
- as $M$ grows, it gets close to being unexploitable, to being perfect at not getting resource-pumped
- as $M$ grows, it gets close to “being coherent”
- as $M$ grows, it gets close to playing optimal moves in the games it faces
- as $M$ grows, it gets close to being as cognitively powerful as it could be
- as $M$ grows, it gets close to being happy with the way it is — close to full self-endorsement
Hopefully it’s clear from this what the distinction is, and hopefully one can at least “a priori imagine” these two things not being equivalent.^[1] I’m not going to give an argument for propositions in the latter cluster being false/confused here^[2], at least not in the present comment, but I say a bunch of relevant stuff here and I make a small relevant point here.

That said, I think one can say many/most MIRI-esque things without claiming that minds get close to having these properties and without claiming that a growing mind approaches some limit.
1. ↩︎
  If you can’t imagine it at first, maybe try imagining that the growing mind faces a “growing world” — an increasingly difficult curriculum of games etc.. For example, you could have it suck a lot less at playing tic-tac-toe than it used to but still suck a lot at chess, and if it used to play tic-tac-toe but it’s playing chess now then there is a reasonable sense in which it could easily be further from playing optimal moves now — like, if we look at its skill at the games it is supposed to be playing now. Alternatively, when judging how much it sucks, we could always integrate across all games with a measure that isn’t changing in time, but still end up with the verdict that it is always infinitely far from not sucking at games at any finite time, and that it always has more improvements to make (negentropy or whatever willing) than it has already made.
2. ↩︎
  beyond what I said in the previous footnote :)
- Raemon 4 Oct 2025 6:55 UTC
  5 points
  0
  Parent
  The thing I care about here is not “what happens as a mind grows”, in some abstract sense.
  The thing I care about is, “what is the best way for a powerful system to accomplish a very difficult goal quickly/reliably?” (which is what we want the AI for)
  As either we deliberately scale up the AI’s ability to accomplish stuff, it will be true that:
  - if it is getting stuck, it’d achieve stuff better if got stuck less
  - if it is exploitable in ways that are relevant, it’d be better if it wasn’t exploitable
  - if it was acting incoherently in ways that wasted resources, it’d accomplish the goal better
  - if it plays suboptimal moves, it’d achieve the goals better if it it doesn’t.
  - if doesn’t have the best possible working memory / processing speed, it’d achieve the goals better if it had more.
  - if it doesn’t have enough resources to do any the above, it’d achieve the goals better if it had more resources
  - if it could accomplish the above faster if it deliberately self modified to do so, rather than waiting for us to apply more selection pressure to it, it has an incentive to do that.
  And… sure, it could not do those things. Then, either Lab A will put more pressure on the AI to accomplish stuff (and some of the above will become more true). Or Lab A won’t, and some other Lab B will instead.
  And once the AI unlocks “deliberately self-modify” as a strategy to achieve the other stuff, and sufficient resources to do it, then it doesn’t matter what Lab A or B does.
  - Kaarel 4 Oct 2025 17:01 UTC
    4 points
    0
    Parent
    I think I mostly agree with everything you say in this last comment, but I don’t see how my previous comment disagreed with any of that either?
    
    The thing I care about here is not “what happens as a mind grows”, in some abstract sense. The thing I care about is, “what is the best way for a powerful system to accomplish a very difficult goal quickly/reliably?” (which is what we want the AI for)
    
    My lists were intended to be about that. We could rewrite the first list in my previous comment to:
    
    more advanced minds have more and better and more efficient technologies
    more advanced minds have an easier time getting any particular thing done, see more/better ways to do any particular thing, can consider more/better plans for any particular thing, have more and better methods for any particular context, have more ideas, ask better questions, would learn any given thing faster
    and so on
    
    and the second list to:
    
    more advanced minds eventually (and maybe quite soon) get close to never getting stuck
    more advanced minds eventually (and maybe quite soon) get close to being unexploitable
    and so on
    
    I think I probably should have included “I don’t actually know what to do with any of this, because I’m not sure what’s confusing about “Intelligence in the limit.”″ in the part of your shortform I quoted in my first comment — that’s the thing I’m trying to respond to. The point I’m making is:
    
    There’s a difference between stuff like (a) “you become less exploitable by [other minds of some fixed capability level]” and stuff like (b) “you get close to being unexploitable”/”you approach a limit of unexploitability”.
    I could easily see someone objecting to claims of the kind (b), while accepting claims of the kind (a) — well, because I think these are probably the correct positions.
    - Raemon 4 Oct 2025 21:11 UTC
      2 points
      0
      Parent
      I think I mostly agree with everything you say in this last comment, but I don’t see how my previous comment disagreed with any of that either?
      Yeah it doesn’t necessarily disagree with it. But, framing the question:
      The non-straightforward-to-me and in fact imo probably in at least some important sense false/confused adjacent thing is captured by stuff like:
      as a mind $M$ grows, it gets close to never getting stuck
      as $M$ grows, it gets close to not being silly
      seemed like those things were only in some sense false/confused because they are asking the wrong question.
      I think “more advanced” still doesn’t feel like really the right way to frame the question, because “advanced” is still very underspecified.
      - Kaarel 4 Oct 2025 22:33 UTC
        4 points
        0
        Parent
        If we replaced “more advanced minds” with “minds that are better at doing very difficult stuff” or other reasonable alternatives, I would still make the (a) vs (b) distinction, and still say type (b) claims are suspicious.
        Raemon 4 Oct 2025 22:47 UTC
        2 points
        0
        Parent
        The structural thing is less the definition of “what sort of mind” and more, instead of saying “gets more X”, saying “if process Z is causing X to increase, what happens?”. (call this a type C claim)
        But I’m also not sure what feels sus about Type B claims to you, when X is at least pinned down a bit more.