I think if the attitude in AI were “there can’t be any even slightly plausible routes to misalignment-related catastrophe,” and this were consistently upheld in a reasonable way, that would address my concerns. (So, e.g., by the time we’re deploying AIs that could cause huge problems if they were conspiring against us, there needs to be a robust solution to alignment faking / scheming that has broad consensus among researchers in the area.)
I don’t expect this, because we seem very far from success and the deployment of such AIs might happen rapidly. (Though we might end up here after eating a bunch of ex-ante risk for some period while using AIs to do safety work.)
Different people might have different interpretations of “slightly plausible,” but I agree we are very far right now and need to step up our game!
Agreed, maybe the relevant operationalization would be “broad consensus,” and then we could specify the relevant group of researchers.