# “Why Not Just...”

by johnswentworth · 8 Aug 2022 18:15 UTC

A compendium of rants about alignment proposals, of varying charitability.

- **Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc** — johnswentworth · 4 Jun 2022 5:41 UTC · 142 points · 53 comments · 2 min read · LW link · 1 review
- **Godzilla Strategies** — johnswentworth · 11 Jun 2022 15:44 UTC · 146 points · 71 comments · 3 min read · LW link
- **Rant on Problem Factorization for Alignment** — johnswentworth · 5 Aug 2022 19:23 UTC · 90 points · 51 comments · 6 min read · LW link
- **Interpretability/Tool-ness/Alignment/Corrigibility are not Composable** — johnswentworth · 8 Aug 2022 18:05 UTC · 129 points · 12 comments · 3 min read · LW link
- **How To Go From Interpretability To Alignment: Just Retarget The Search** — johnswentworth · 10 Aug 2022 16:08 UTC · 179 points · 33 comments · 3 min read · LW link · 1 review
- **Oversight Misses 100% of Thoughts The AI Does Not Think** — johnswentworth · 12 Aug 2022 16:30 UTC · 97 points · 50 comments · 1 min read · LW link
- **Human Mimicry Mainly Works When We’re Already Close** — johnswentworth · 17 Aug 2022 18:41 UTC · 80 points · 16 comments · 5 min read · LW link
- **Worlds Where Iterative Design Fails** — johnswentworth · 30 Aug 2022 20:48 UTC · 190 points · 30 comments · 10 min read · LW link · 1 review
- **Why Not Just… Build Weak AI Tools For AI Alignment Research?** — johnswentworth · 5 Mar 2023 0:12 UTC · 156 points · 17 comments · 6 min read · LW link
- **Why Not Just Outsource Alignment Research To An AI?** — johnswentworth · 9 Mar 2023 21:49 UTC · 126 points · 47 comments · 9 min read · LW link
- **OpenAI Launches Superalignment Taskforce** — Zvi · 11 Jul 2023 13:00 UTC · 149 points · 40 comments · 49 min read · LW link · (thezvi.wordpress.com)