I agree with this point as stated, but think the probability is more like 5% than 0.1%
Same.
I do think our chances look not-great overall, but most of my doom-probability is on things which don’t look like LLMs scheming.
Also, are you making sure to condition on "scaling up networks, running pretraining + light RLHF produces transformatively powerful AIs which obsolete humanity"?
That’s not particularly cruxy for me either way.
Separately, I’m uncertain whether the training procedure of current models like GPT-4 or Claude 3 is still well described as just “light RLHF”.
Fair. Insofar as “scaling up networks, running pretraining + RL” does risk schemers, it does so more as we do more/stronger RL, qualitatively speaking.