Maybe so, but it isn’t clearly required for automating AI R&D!
I think that it is. I keep meaning to write up my thoughts on this issue.
I believe adversarial robustness is a core agency skill because reasoning can defeat itself; you have to be unable to fool yourself. You can't be fooled by the processes you spin off, figuratively or literally. You can't be fooled by other people's bad but convincing ideas either.
This is related to an observation I've made that exotic counterexamples are likely to show up in wrong proofs, not because they are typical, but because mathematicians will tend to construct unusual situations while trying to misuse true results to prove a false one.
A weaker position is that even if adversarial robustness isn't itself necessary for agency, an egregious failure to be adversarially robust seems awfully likely to indicate that something deeper is missing or broken.
IMO, the type of adversarial robustness you're discussing is sufficiently different from what people typically mean by adversarial robustness that it would be worth tabooing the word. (E.g., I might say "robust self-verification is required".)
I guess that’s true.
The way I model this situation is tied to my analysis of joint AIXI, which treats the action bits as adversarial because the distribution is not realizable.
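(To spell out the non-realizability point, in standard AIXI-style notation that I'm supplying here as a sketch rather than quoting from that analysis: realizability would mean the true environment $\mu$ sits inside the model class $\mathcal{M}$ behind the mixture $\xi(e_{1:t} \mid a_{1:t}) = \sum_{\nu \in \mathcal{M}} w_\nu \, \nu(e_{1:t} \mid a_{1:t})$, in which case the usual Bayesian prediction bounds hold for any fixed action sequence. If you instead model the joint action-percept stream, the "true" joint distribution would have to include the agent's own policy, which is not in $\mathcal{M}$, so the action bits are not realizable, and the only guarantees left are ones quantified over all action sequences; that is, the actions get treated as if chosen adversarially.)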
So, there are actually a few different concepts here that my mental models link in a non-transparent way.
(I've noticed that when people say things like what I just said, it's fairly common that their model is just conflating things and they're wrong. I don't think that applies to me, but it's worth a minor update on the outside view.)