I agree with this point as stated, but think the probability is more like 5% than 0.1%
Same.
I do think our chances look not-great overall, but most of my doom-probability is on things which don’t look like LLMs scheming.
Also, are you making sure to condition on "scaling up networks, running pretraining + light RLHF produces transformatively powerful AIs which obsolete humanity"?
That’s not particularly cruxy for me either way.
Separately, I’m uncertain whether the training procedure of current models like GPT-4 or Claude 3 is still well described as just “light RLHF”.
Fair. Insofar as “scaling up networks, running pretraining + RL” does risk schemers, it does so more as we do more/stronger RL, qualitatively speaking.