Quintin’s Alignment Papers Roundup

Quintin’s alignment papers roundup—week 1

Quintin’s alignment papers roundup—week 2

QAPR 3: interpretability-guided training of neural nets

QAPR 4: Inductive biases

QAPR 5: grokking is maybe not *that* big a deal?