Anyway, this is my crux. If we start to see competent agentic behavior, I will buy into the short-timelines view at 75%+.
Seems good to flesh out what you mean by this if it's such a big crux. Ideally, you'd be able to flesh this out in such a way that it doesn't hinge on bad vision (a key problem for games like Pokémon) or on poor motivation/adversarial robustness (a key problem for vending-machine Claude, because it would sort of knowingly make bad financial decisions).
Would this count as competent agentic behavior?
The AI often successfully completes messy software engineering tasks which require 1 week of work for a skilled human and which require checking back in with the person who specified the task to resolve ambiguities. The way the AI completes these tasks involves doing a bunch of debugging and iteration (though perhaps less than a human would do).
Yes, if time horizons on realistic SWE tasks pass 8-16 hours, that would change my mind. I have already offered to bet the AI 2027 team cash on this (not taken up), and you can provide me liquidity on the various existing Manifold markets (I'm not going to dig up the specific ones) which I very occasionally trade on.
Adversarial robustness is part of agency, so I don’t agree with that aspect of your framing.
Maybe so, but it isn’t clearly required for automating AI R&D!
I think that it is. I keep meaning to write up my thoughts on this issue.
I believe adversarial robustness is a core agency skill because reasoning can defeat itself; you have to be unable to fool yourself. You can't be fooled by the processes you spin off, figuratively or literally. You can't be fooled by other people's bad but convincing ideas either.
This is related to an observation I've made: exotic counterexamples are likely to show up in wrong proofs, not because they are typical, but because mathematicians tend to construct unusual situations while trying to misuse true results to prove a false result.
A weaker position is that even if adversarial robustness isn't itself necessary for agency, an egregious failure to be adversarially robust seems awfully likely to indicate that something deeper is missing or broken.
IMO, the type of adversarial robustness you’re discussing is sufficiently different than what people typically mean by adversarial robustness that it would be worth tabooing the word. (E.g., I might say “robust self-verification is required”.)
I guess that’s true.
The way I model this situation is tied to my analysis of joint AIXI, which treats the action bits as adversarial because the distribution is not realizable.
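To gesture at the flavor (a rough sketch, not necessarily the exact formalism from that analysis): standard AIXI predicts observations with a conditional mixture over a class $\mathcal{M}$ of environments,

$$\xi(o_{1:t} \mid a_{1:t}) \;=\; \sum_{\nu \in \mathcal{M}} w_\nu \, \nu(o_{1:t} \mid a_{1:t}),$$

whereas a joint model puts a Solomonoff-style prior over the whole interleaved action-observation string,

$$M(a_1 o_1 \dots a_t o_t) \;=\; \sum_{p \,:\, U(p) = a_1 o_1 \dots a_t o_t \ast} 2^{-\ell(p)},$$

where $U$ is a universal monotone machine, $\ell(p)$ is program length, and the sum ranges over programs whose output begins with that string. Since whatever actually generates the action bits (the agent's own policy) need not lie in the class being mixed over, the joint distribution is not realizable, and the prediction guarantees that survive are worst-case over the $a_i$. That worst-case quantifier is what I mean by treating the action bits as adversarial.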
So, there are actually a few different concepts here which my mental models link in a non-transparent way.
(I’ve noticed that when people say things like what I just said, it seems to be fairly common that their model is just conflating things and they’re wrong. I don’t think that applies to me, but it’s worth a minor update on the outside view.)