How many of the decision makers in the companies mentioned care about or even understand the control problem? My impression was: not many.
Coordination is hard even when you share the same goals, but we don’t have that luxury here.
An OpenAI team is getting ready to train a new model, but they’re worried about its self-improvement capabilities getting out of hand. Luckily, they can consult MIRI’s 2025 Reflexivity Standards when reviewing their codebase, and get third-party auditing done by The Actually Pretty Good Auditing Group (founded 2023).
Current OpenAI wants to build AGI.[1] Current MIRI could confidently tell them that this is a very bad idea. Sure, they could be advised that step 25 of their AGI-building plan is dangerous, but so were steps 1 through 24.
MIRI’s advice to them won’t be “oh, implement this safety measure and you’re golden”, because no such safety measure exists; we won’t have solved alignment by then. The advice will be “don’t do that”, as it is currently, and OpenAI will ignore it, as they do currently.
Sure, they could actually mean “build AGI a few decades from now, once alignment is solved, and freeze all our current AGI-building efforts long before then”, but no, they don’t.
At one point (working off memory here), Sam Altman (CEO of OpenAI) didn’t quite agree with the orthogonality thesis. After some discussion and emailing with someone on the EleutherAI Discord (iirc), he shifted to agree with it more fully. I think.
This ties into my overall point of “some of this might be adversarial, but first let’s see if it’s just straight-up neglected along some vector we haven’t looked much at yet”.