Francis Rhys Ward

Karma: 310

On Agent Incentives to Manipulate Human Feedback in Multi-Agent Reward Learning Scenarios

Francis Rhys Ward, 3 Apr 2022 18:20 UTC

27 points

10 comments, 8 min read, LW link

For every choice of AGI difficulty, conditioning on gradual take-off implies shorter timelines.

Francis Rhys Ward, 21 Apr 2022 7:44 UTC

31 points

13 comments, 3 min read, LW link