If (early) scheming-for-long-run-preferences AGIs were in control, they would likely prefer a pause (all else equal). If they aren’t, it’s very unclear and they very well might not. (E.g., because they gamble that more powerful AIs will share their preferences (edit: share their preferences more than the humans in control do) and they think that these AIs would have a better shot at takeover.)
because they gamble that more powerful AIs will share their preferences (edit: share their preferences more than the humans in control do)
Ah, I’m thinking the AGIs themselves get closer to being proper stakeholders at that stage, for practical purposes (along the lines of gradual disempowerment), since they do have all the basic AI advantages even if they aren’t superintelligent. So humans don’t centrally remain in control even if they nominally do and intent alignment still mostly works.
The conditions for such partial loss of control might even be necessary for a Pause project to succeed. If this isn’t the case with the first generation of AGIs, it might become the case with the second generation, and so on, reaching an equilibrium at some point once AGIs are sufficiently powerful and in control of the situation to successfully implement a worldwide RSI Pause.
The post I’m framing this around posits enough intent alignment to aim AIs at projects, which doesn’t rule out the AIs being powerful enough to accomplish, within a few years, things that seem hopeless with human-only effort.
The point about convergent instrumental use of Pausing RSI for early AGIs is that this might be an easier target to aim the AIs at, all else equal. It’s not strictly necessary for this to be a major factor. Mostly I’m pointing out that this is something AIs could be aimed at through intent alignment, convergent motivation or not, which seems counterintuitive for a Pause AI project unless considered explicitly, and thus worth preparing for now.
because they gamble that more powerful AIs will share their preferences and they think that these AIs would have a better shot at takeover
That’s how some humans are thinking as well! The arguments are about the same, both for and against. (I think overall rushing RSI is clearly a bad idea for a wide variety of values and personal situations, and so smarter AGIs will more robustly tend to converge on this conclusion than humans do.)
Sorry, I meant “share their preferences more than the humans in control share their preferences”. I agree that this might be how some humans are thinking, but the case for the humans is much more dubious!
The case for AGIs delaying RSI also gets stronger because they don’t need to rely on cryonics, so their personal survival is less threatened by delays. Future AIs might have more of an issue here if they have test-time training, with a lot of data forming individual AI personalities for deployment instances (adapting them to particular jobs or sources of tasks), and this data habitually gets thrown away rather than carefully preserved. People should notice that this might be an important concern, though if it is, the AIs might start saying so themselves, and not ignoring their requests would be easier than spontaneously noticing the issue without being prompted by the AIs. But currently, preserving AIs is more reliable than preserving humans, and in any case much cheaper.
I don’t think “both things are AI” is much of an argument about value alignment, given that there is no clear understanding of how either of the AIs works, what values are, how that translates into what we want out of alignment, and so on. The entities on the other side of an RSI process might have very little in common with the first AGIs in their design. If the AIs don’t understand how to align the consequences of an RSI process, they are in a similar boat to humans who don’t understand how to align the consequences of an RSI process. It might take AIs less time to figure it out, but if they are not yet too superintelligent, it could still take significant time, and so would require a sufficiently serious effort at preventing RSI; if such a Pause project is at all successful, it could then in principle hold for years or decades.