Jan Betley comments on Self-preservation or Instruction Ambiguity? Examining the Causes of Shutdown Resistance

Jan Betley 24 Jul 2025 12:02 UTC
1 point
0
Yeah, that makes sense—thx.

I’m scared of models getting long-term unbounded goals

This is surely scary. I think on some level I’m not worried about that, but maybe because I’m worried enough even about less scary scenarios (“let’s try to deal at least with the easy problems, and hope the hard ones don’t happen”). This feels somewhat similar to my disagreements with Sam here.
- Neel Nanda 25 Jul 2025 10:55 UTC
  2 points
  0
  Parent
  I could get on board with “let’s try to deal at least with the easy problems, and ~~hope~~ ensure the hard ones don’t happen”?
  - Jan Betley 25 Jul 2025 11:04 UTC
    1 point
    0
    Parent
    That sounds great. I think I’m just a bit less optimistic about our chances at ensuring things : )
    - Neel Nanda 25 Jul 2025 11:45 UTC
      7 points
      0
      Parent
      Oh, I said “try to ensure” for a reason. I do think it’s somewhat tractable, though.