Two clarifications about “Strategic Background”

I’ve talked to a few people who misunderstood important parts of the “strategic background” discussion in https://intelligence.org/2017/12/01/miris-2017-fundraiser/#3.

First, at least two people thought the 1-8 numbered list was “MIRI’s organizational plan” rather than “what we’d be least surprised to see happen in the world, conditional on good outcomes.” MIRI is trying to de-confuse itself about step 8 and to help put AGI developers in a better position to select alignment-conducive AGI designs in the future; it is not trying to develop AGI itself.

Second, at least two other people misread “minimal aligned AGI” as “minimally aligned AGI”, and thought MIRI was saying that developers should do the bare minimum of alignment work and then deploy immediately; or they saw that we were recommending building “systems with the bare minimum of capabilities for ending the acute risk period” and thought we were recommending this as an alternative to working really hard to achieve highly reliable and robust systems.

The MIRI view isn’t “rather than making alignment your top priority and working really hard to over-engineer your system for safety, try to build a system with the bare minimum of capabilities”. It’s: “in addition to making alignment your top priority and working really hard to over-engineer your system for safety, also build the system to have the bare minimum of capabilities”.

The idea isn’t that you can get away with cutting corners on safety by keeping the system weak; per Eliezer’s security mindset posts, a good plan should work (or fail safely) if the system ends up being a lot smarter than intended. Instead, the idea is that shooting for the bare minimum of capabilities adds a lot of value if your fundamentals are really good. Every additional capability a developer needs to align adds some extra difficulty and additional points of failure, so developers should target minimality in addition to alignment.