Do you have a realistic-seeming story in mind?
Sure, several. E.g.:
- The USG cares a decent amount and leading AI companies are on board, so they try to buy several additional years to work on safety.
- We scale to roughly top human expert level while ensuring control.
- Over time, we lower the risk of scheming at this level of capability through a bunch of empirical experiments and new interventions developed using lots of AI labor.
- We relax our control measures and increasingly work on making AIs generally very trustworthy, including on hard-to-check, open-ended tasks. We do a bunch of studies of this.
- This ends up not being that hard due to somewhat favorable generalization.
- We hand off to AIs, and they align their successors, and so on.
Your first paragraph is an example of “something that looks like coordination to not build ASI for quite a while”! “Several additional years” is definitely “quite a while”!
I am not sure whether the other bullet points are all supposed to take place within those few years, or whether you are expecting further cautious actions that slow things down. It sounds like, at least within the USG, we are coordinating to not build ASI, and are generally succeeding at establishing a norm of going carefully and slowly.
And then even after these bullet points are over, my best guess is that the AIs we “handed off” to would still decide to go quite slowly themselves, probably establishing some global coordination to go sufficiently slowly. My best guess is that we will also have just wanted to do that earlier, in collaboration with those AI systems.
Ok, if you count several additional years as quite a while, then we’re probably closer to agreement.
For this scenario, I was imagining that all these actions happen within 2 years of lead time. In practice, we should keep trying to buy additional lead time prior to it making sense to hand off to AIs, and the AIs we hand off to will probably want to try to buy lead time as well (especially if there are strategies which are easier post-handoff, e.g. due to leveraging labor from more powerful systems).
I’m unsure about the difficulty of buying different amounts of lead time, and it seems like it might be harder to buy lead time than to continually ensure the alignment of later AIs. Eventually, we have to do some kind of handoff, and I think it’s safer to do this handoff to AIs that aren’t substantially more capable than top human experts in general-purpose qualitative capabilities (I think you want to hand off at roughly the minimum level of capability where the AIs are clearly capable enough to dominate humans, including at conceptually tricky work).