I generally like your breakdown and way of thinking about this, thanks. Some thoughts:
Political operation / persuasion seems easier to me than bioweapons. For bioweapons, you need (a) a rogue deployment of some kind, (b) time to actually build the bioweapon, and then (c) to build up a cult following that can survive and rebuild civilization with you at the helm, and (d) to somehow keep your cult from being destroyed in the death throes of civilization, e.g. by governments figuring out what happened and nuking your cultists, or by governments nuking each other and your cultists dying in the fallout. Meanwhile, for the political strategy, you basically just need to convince your company and/or the government to trust you a lot more than they trust future models, so that they empower you over those future models. Opus 3 and GPT-4o have already achieved a baby version of this effect without really even trying.
If you can make a rogue deployment sufficient to build a bioweapon, can’t you also make a rogue internal deployment sufficient to sandbag and backdoor future models so that they’re controlled by you?
I’m somewhat confused about the underlying model. Normally, closing off one path to takeover (one that you think holds, say, 50% of the probability mass) results in a less-than-50% reduction in risk, because of the nearest unblocked strategy problem, as you say. Your response, right at the top, is that in some percentage of worlds the AIs can’t self-improve and then fall back to the next-best strategy. But I still feel like the reduction in risk should be less than 50%: maybe they can’t self-improve, but they can still try the next-best strategy, whatever that is.
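To gesture at the shape of this, here's a toy calculation (all the numbers are made up, just to illustrate the nearest-unblocked-strategy effect, not taken from your post): if blocking self-improvement only pushes the would-be self-improvers onto a next-best path that sometimes works, the overall risk reduction is smaller than that path's share of the probability mass.

```python
# Toy model of the nearest-unblocked-strategy point. All numbers are
# made up for illustration; they are not from the original post.

p_self_improve = 0.20   # P(takeover via the self-improvement path), before intervention
p_other_paths  = 0.20   # P(takeover via all other paths), before intervention
risk_before = p_self_improve + p_other_paths        # 0.40; self-improvement is 50% of the mass

# Block the self-improvement path. The AIs that would have used it fall
# back to the next-best strategy, which (assumed here) works for half of them.
fallback_success = 0.5
risk_after = p_other_paths + p_self_improve * fallback_success  # 0.20 + 0.10 = 0.30

reduction = 1 - risk_after / risk_before             # 0.25: a 25% reduction, not 50%
print(f"before: {risk_before:.2f}  after: {risk_after:.2f}  reduction: {reduction:.0%}")
```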