“Designing agent incentives to avoid reward tampering”, DeepMind
by gwern · 14 Aug 2019 16:57 UTC · 15 comments · 1 min read
LW: 28 · AF: 8
Tags: Outer Alignment, Goodhart's Law, Machine Learning (ML)
Link post
What links here?
2019 AI Alignment Literature Review and Charity Comparison by Larks (EA Forum; 19 Dec 2019 2:58 UTC; 147 points)
2019 AI Alignment Literature Review and Charity Comparison by Larks (19 Dec 2019 3:00 UTC; 130 points)
Steven Byrnes's comment on Discussion with Eliezer Yudkowsky on AGI interventions by Rob Bensinger (15 Nov 2021 16:20 UTC; 45 points)
Wei Dai's comment on Problems in AI Alignment that philosophers could potentially contribute to by Wei Dai (18 Aug 2019 17:35 UTC; 25 points)
gwern's comment on Is keeping AI “in the box” during training enough? by tgb (7 Jul 2021 3:04 UTC; 4 points)