“Designing agent incentives to avoid reward tampering”, DeepMind
by gwern · 14 Aug 2019 16:57 UTC · 15 comments · 1 min read
LW: 28 · AF: 8
Tags: Outer Alignment, Goodhart's Law, Machine Learning (ML)
Link post
What links here?
2019 AI Alignment Literature Review and Charity Comparison by Larks (EA Forum; 19 Dec 2019 2:58 UTC; 147 points)
2019 AI Alignment Literature Review and Charity Comparison by Larks (19 Dec 2019 3:00 UTC; 130 points)
Steven Byrnes's comment on Discussion with Eliezer Yudkowsky on AGI interventions by Rob Bensinger (15 Nov 2021 16:20 UTC; 45 points)
Wei Dai's comment on Problems in AI Alignment that philosophers could potentially contribute to by Wei Dai (18 Aug 2019 17:35 UTC; 25 points)
gwern's comment on Is keeping AI “in the box” during training enough? by tgb (7 Jul 2021 3:04 UTC; 4 points)