I’m having trouble seeing that (2) is actually a thing?
The whole problem is that there is no generally agreed "chance of catastrophe", so "same chance of catastrophe" has no real meaning. It seems this kind of talk is being backchained from what governance people want, as opposed to the reality of safety guarantees or safety/risk probabilities, which is that they don't meaningfully exist [outside of super-heuristic guesses].
Indeed, to estimate this probability in a non-bullshit way we need exactly the kind of fundamental scientific progress described in (1).
EA has done this exercise a dozen times: if you ask experts for the probability of doom, the answers range all the way from 99% to 0.0001%.
Will that change? Will expert judgement converge? Maybe. Maybe not. I don’t have a crystal ball.
Even if they do [absent meaningful progress on (1)], those probabilities won't actually reflect reality, only political reality.
The problem is that there is no 'scientific' way to estimate p(doom), and as long as we don't make serious progress on (1), there won't be.
I don't see how CoT/activation/control monitoring will have any significant, scientifically grounded [as opposed to purely storytelling/political] influence in a way that can be measured and used to make risk tradeoffs.
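To make the point concrete, here is a minimal sketch of the kind of risk-tradeoff arithmetic such monitoring would have to support. Every input (the per-query attempt rate, the monitor's catch rate, the deployment volume) is a made-up placeholder, because guesses are all we currently have, and equally defensible guesses move the answer from roughly 10^-4 to essentially 1:

```python
# Toy illustration only: every parameter below is a hypothetical guess, not a
# measurement. The point is that the output is driven entirely by inputs we
# have no scientific way to estimate today.

def p_catastrophe(attempt_rate: float, catch_rate: float, n_queries: int) -> float:
    """Chance that at least one harmful attempt slips past the monitor."""
    p_slip = attempt_rate * (1 - catch_rate)  # per-query chance of an uncaught attempt
    return 1 - (1 - p_slip) ** n_queries

# Two equally "plausible-sounding" sets of heuristic guesses:
print(p_catastrophe(attempt_rate=1e-9, catch_rate=0.999, n_queries=10**8))  # ~1e-4
print(p_catastrophe(attempt_rate=1e-5, catch_rate=0.9,   n_queries=10**8))  # ~1.0
```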
if you ask experts for the probability of doom, the answers range all the way from 99% to 0.0001%.
What matters here is the chance that a particular model in a particular deployment plan will cause a particular catastrophe. And this is after the model has been trained and evaluated and red-teamed and mech-interped (imperfectly, of course). I don't expect such divergent probabilities from experts.
Referring to particular models, particular deployment plans, and particular catastrophes doesn't help; the answer is the same.
We don’t know how to scientifically quantify any of these probabilities.
You can replace "deployment plan P1 has the same chance of catastrophe as deployment plan P2" with "the safety team is equally worried about P1 and P2".