2022 MIRI Alignment Discussion

A collection of MIRI write-ups and conversations about alignment released in 2022, following the Late 2021 MIRI Conversations.

Six Dimensions of Operational Adequacy in AGI Projects

AGI Ruin: A List of Lethalities

A central AI alignment problem: capabilities generalization, and the sharp left turn

On how various plans miss the hard bits of the alignment challenge

The inordinately slow spread of good AGI conversations in ML

A note about differential technological development

Brainstorm of things that could force an AI team to burn their lead

AGI ruin scenarios are likely (and disjunctive)

Where I currently disagree with Ryan Greenblatt’s version of the ELK approach

Why all the fuss about recursive self-improvement?

Humans aren’t fitness maximizers

Warning Shots Probably Wouldn’t Change The Picture Much

What does it mean for an AGI to be ‘safe’?

Don’t leave your fingerprints on the future

Niceness is unnatural

Contra shard theory, in the context of the diamond maximizer problem

Notes on “Can you control the past”

Decision theory does not imply that we get to have nice things

Superintelligent AI is necessary for an amazing future, but far from sufficient

How could we know that an AGI system will have good consequences?

Distinguishing test from training

A challenge for AGI organizations, and a challenge for readers

Thoughts on AGI organizations and capabilities work