I think if the attitude in AI were “there can’t be any even slightly plausible routes to misalignment-related catastrophe,” and this were consistently upheld in a reasonable way, that would address my concerns. (So, e.g., by the time we’re deploying AIs that could cause huge problems if they were conspiring against us, there needs to be a robust solution to alignment faking / scheming that has broad consensus among researchers in the area.)
I don’t expect this, because we seem very far from success and the deployment of such AIs might happen rapidly. (Though we might end up here after eating a bunch of ex-ante risk for some period while using AIs to do safety work.)
Different people might have different interpretations of “slightly plausible,” but I agree we are very far right now and need to step up our game!
Agreed, maybe the relevant operationalization would be “broad consensus,” and then we could specify the relevant group of researchers.