AI Safety Subprojects

These are AI safety subprojects designed to elucidate "model splintering" and "learning the preferences of irrational agents".

Immobile AI makes a move: anti-wireheading, ontology change, and model splintering

AI, learn to be conservative, then learn to be less so: reducing side-effects, learning preserved features, and going beyond conservatism

AI learns betrayal and how to avoid it

Force neural nets to use models, then detect these

Preferences from (real and hypothetical) psychology papers

Finding the multiple ground truths of CoinRun and image classification