Alignment Hot Take Advent Calendar

Take 1: We’re not going to reverse-engineer the AI.

Take 2: Building tools to help build FAI is a legitimate strategy, but it’s dual-use.

Take 3: No indescribable heavenworlds.

Take 4: One problem with natural abstractions is there’s too many of them.

Take 5: Another problem for natural abstractions is laziness.

Take 6: CAIS is actually Orwellian.

Take 7: You should talk about “the human’s utility function” less.

Take 8: Queer the inner/outer alignment dichotomy.

Take 9: No, RLHF/IDA/debate doesn’t solve outer alignment.

Take 10: Fine-tuning with RLHF is aesthetically unsatisfying.

Take 11: “Aligning language models” should be weirder.

Take 12: RLHF’s use is evidence that orgs will jam RL at real-world problems.

Take 13: RLHF bad, conditioning good.

Take 14: Corrigibility isn’t that great.