# “Why Not Just...”

by johnswentworth · 8 Aug 2022 18:15 UTC

A compendium of rants about alignment proposals, of varying charitability.

- **Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc** — johnswentworth · 4 Jun 2022 5:41 UTC · 142 points · 53 comments · 2 min read · LW link · 1 review
- **Godzilla Strategies** — johnswentworth · 11 Jun 2022 15:44 UTC · 146 points · 71 comments · 3 min read · LW link
- **Rant on Problem Factorization for Alignment** — johnswentworth · 5 Aug 2022 19:23 UTC · 90 points · 51 comments · 6 min read · LW link
- **Interpretability/Tool-ness/Alignment/Corrigibility are not Composable** — johnswentworth · 8 Aug 2022 18:05 UTC · 129 points · 12 comments · 3 min read · LW link
- **How To Go From Interpretability To Alignment: Just Retarget The Search** — johnswentworth · 10 Aug 2022 16:08 UTC · 179 points · 33 comments · 3 min read · LW link · 1 review
- **Oversight Misses 100% of Thoughts The AI Does Not Think** — johnswentworth · 12 Aug 2022 16:30 UTC · 97 points · 50 comments · 1 min read · LW link
- **Human Mimicry Mainly Works When We’re Already Close** — johnswentworth · 17 Aug 2022 18:41 UTC · 80 points · 16 comments · 5 min read · LW link
- **Worlds Where Iterative Design Fails** — johnswentworth · 30 Aug 2022 20:48 UTC · 190 points · 30 comments · 10 min read · LW link · 1 review
- **Why Not Just… Build Weak AI Tools For AI Alignment Research?** — johnswentworth · 5 Mar 2023 0:12 UTC · 156 points · 17 comments · 6 min read · LW link
- **Why Not Just Outsource Alignment Research To An AI?** — johnswentworth · 9 Mar 2023 21:49 UTC · 126 points · 47 comments · 9 min read · LW link
- **OpenAI Launches Superalignment Taskforce** — Zvi · 11 Jul 2023 13:00 UTC · 149 points · 40 comments · 49 min read · LW link · (thezvi.wordpress.com)