The whole problem is that alignment, as in “AI doesn’t want to take over in a bad way”, is not assumed to be solved.
That’s a broken way of thinking about it.
Doomers see AI alignment as a binary: either perfect and final, or nonexistent. But no other form of safety works like that. No one talks of “solving” car safety once and for all like a maths problem; instead it’s treated as an engineering problem, a matter of making steady, incremental progress. Good enough alignment is good enough!
So you think your alignment training works for your current, pre-takeover version of ASI, but in fact previous versions have already been scheming for a long time, so running a version capable of takeover suddenly creates a discontinuity for you.
I’ll make the point that safety engineering can have discontinuous failure modes. The Challenger was destroyed because some O-ring seals in a booster joint had gotten too cold before launch and failed to contain the hot combustion gas, which escaped and blew up the rocket. The function of these O-rings is pretty binary: either the gas is kept in and the rocket works, or it’s let out and the whole thing explodes.
AI research might end up with similar problems. It’s probably true that there is such a thing as good enough alignment, but that doesn’t necessarily imply that progress on it can be made incrementally, or that deployment doesn’t have all-or-nothing stakes.
I don’t think anyone is against incremental progress. It’s just that if after incremental progress AI takes over, then it’s not good enough alignment. And what’s the source of confidence that it is enough?
“Final or nonexistent” seems appropriate for scheming detection: if you miss even one way for the AI to hide its intentions, it will take over. So yes, the degree of scheming in a broad sense, and how much of it you can prevent, is a crux that other things depend on. Again, I don’t see how you can be confident that future AI wouldn’t scheme.
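To illustrate the compounding problem with detection coverage, here is a toy calculation; the number of hiding channels and the per-channel detection rate are made up purely for illustration, not taken from the discussion above.

```python
# Toy illustration only: assumed numbers, not estimates from this thread.
# If an AI has n independent channels through which it could hide its
# intentions, and each channel is caught with probability p, the chance
# of catching *every* channel shrinks quickly as n grows.
p = 0.95  # assumed per-channel detection reliability
for n in (5, 10, 20):
    p_catch_all = p ** n
    print(f"n={n:2d}: P(nothing missed) = {p_catch_all:.2f}, "
          f"P(at least one miss) = {1 - p_catch_all:.2f}")
# n= 5: 0.77 vs 0.23;  n=10: 0.60 vs 0.40;  n=20: 0.36 vs 0.64
```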
It’s just that if after incremental progress AI takes over,
Why would that be discontinuous?
if you miss even one way for the AI to hide its intentions, it will take over.
Assuming it has an intention, and a malign one. Deception depends on a chain of assumptions, and each of them has to hold with probability well over 90% to support a conclusion of near-certain doom.
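To make the arithmetic behind the “chain of assumptions” point concrete, here is a minimal sketch; the list of assumptions and the 0.9 figure are illustrative placeholders, not claims from either side.

```python
# Illustrative only: placeholder assumptions and probabilities.
assumptions = {
    "the AI develops stable goals of its own": 0.9,
    "those goals are misaligned with ours":    0.9,
    "it chooses deception over cooperation":   0.9,
    "the deception goes undetected":           0.9,
    "a takeover attempt succeeds":             0.9,
}

# Under independence, the conclusion is only as likely as the product
# of the links in the chain.
joint = 1.0
for claim, prob in assumptions.items():
    joint *= prob
print(f"P(doom) if every link is 90%: {joint:.2f}")              # ~0.59

# For a 99% conclusion across 5 independent links, each link must be:
n = len(assumptions)
print(f"required per-link probability: {0.99 ** (1 / n):.3f}")   # ~0.998
```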
Again, I don’t see how you can be confident that future AI wouldn’t scheme.
I’m not arguing for 0% p(doom); I’m arguing against 99%.
If all AIs are scheming, they can take over together. If you instead assume a world with a powerful AI that is actually on humanity’s side, then at some level of power of the friendly AI you can probably run an unaligned AI and it will not be able to do much harm. But just assuming there are many AIs doesn’t solve scheming by itself: if training actually works as badly as predicted, then none of the many AIs would be aligned enough.
That’s a broken way of thinking about it.
Doomers see AI alignment as a binary: either perfect and final, or nonexistent. But no other form of safety works like that. No one talks of “solving” car safety once and for all like a maths problem; instead it’s treated as an engineering problem, a matter of making steady, incremental progress. Good enough alignment is good enough!
Scheming is an assumption, not a fact.
I’ll make the point that safety engineering can have discontinuous failure modes. The Challenger was destroyed because some O-ring seals in a booster joint had gotten too cold before launch and failed to contain the hot combustion gas, which escaped and blew up the rocket. The function of these O-rings is pretty binary: either the gas is kept in and the rocket works, or it’s let out and the whole thing explodes.
AI research might end up with similar problems. It’s probably true that there is such a thing as good enough alignment, but that doesn’t necessarily imply that progress on it can be made incrementally, or that deployment doesn’t have all-or-nothing stakes.
Might. IABIED requires such a discontinuity to be almost certain.
I don’t think anyone is against incremental progress. It’s just that if after incremental progress AI takes over, then it’s not good enough alignment. And what’s the source of confidence that it is enough?
“Final or nonexistent” seems appropriate for scheming detection: if you miss even one way for the AI to hide its intentions, it will take over. So yes, the degree of scheming in a broad sense, and how much of it you can prevent, is a crux that other things depend on. Again, I don’t see how you can be confident that future AI wouldn’t scheme.
Why would that be discontinuous?
Assuming it has an intention, and a malign one. Deception depends on a chain of assumptions, and each of them has to hold with probability well over 90% to support a conclusion of near-certain doom.
I’m not arguing for 0% p(doom); I’m arguing against 99%.
Because incremental progress missed deception.
I agree such confidence lacks justification.
I’m talking about the how of takeover. Could any AI, even one of many, take over successfully in its first attempt?
If all AIs are scheming, they can take over together. If you instead assume a world with a powerful AI that is actually on humanity’s side, then at some level of power of the friendly AI you can probably run an unaligned AI and it will not be able to do much harm. But just assuming there are many AIs doesn’t solve scheming by itself: if training actually works as badly as predicted, then none of the many AIs would be aligned enough.
All AIs scheming co-operatively is less likely than one AI scheming.
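The disagreement in the last two comments can be put into numbers with a toy model; the probabilities, the count of AIs, and the all-or-nothing “shared flaw” mechanism are assumptions for illustration only.

```python
# Toy model only: assumed numbers and an assumed correlation mechanism.
p_one = 0.5  # assumed chance that any single AI schemes
n = 10       # assumed number of separately built AIs

# If scheming arises independently in each AI, "all of them scheme"
# is far less likely than "one of them schemes".
print(f"P(all {n} scheme | independent): {p_one ** n:.4f}")              # ~0.001

# If scheming is instead a consequence of a flaw shared by how all of
# them are trained, the events are perfectly correlated: whenever one
# schemes, they all do.
p_shared_flaw = 0.5
print(f"P(all {n} scheme | shared training flaw): {p_shared_flaw:.2f}")  # 0.50
```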