tom4everitt

Karma: 454

Research Scientist at DeepMind

Reward Hacking from a Causal Perspective

tom4everitt, Francis Rhys Ward, sbenthall, James Fox, mattmacdermott and RyanCarey

21 Jul 2023 18:27 UTC

29 points

6 comments · 7 min read · LW link

Incentives from a causal perspective

tom4everitt, James Fox, RyanCarey, mattmacdermott, sbenthall and Jonathan Richens

10 Jul 2023 17:16 UTC

27 points

0 comments · 6 min read · LW link

Agency from a causal perspective

tom4everitt, mattmacdermott, James Fox, Francis Rhys Ward and Jonathan Richens

30 Jun 2023 17:37 UTC

40 points

5 comments · 6 min read · LW link

Causality: A Brief Introduction

tom4everitt, Lewis Hammond, Jonathan Richens, Francis Rhys Ward, RyanCarey, sbenthall and James Fox

20 Jun 2023 15:01 UTC

49 points

18 comments · 6 min read · LW link

Introduction to Towards Causal Foundations of Safe AGI

tom4everitt, Lewis Hammond, Francis Rhys Ward, RyanCarey, James Fox, mattmacdermott and sbenthall

12 Jun 2023 17:55 UTC

74 points

6 comments · 4 min read · LW link

Progress on Causal Influence Diagrams

tom4everitt

30 Jun 2021 15:34 UTC

73 points

6 comments · 9 min read · LW link

Specification gaming: the flip side of AI ingenuity

Vika, Vlad Mikulik, Matthew Rahtz, tom4everitt, Zac Kenton and janleike

6 May 2020 23:51 UTC

69 points

9 comments · 6 min read · LW link

CIRL Wireheading

tom4everitt

8 Aug 2017 6:33 UTC

3 points

4 comments · 2 min read · LW link

Sequential Extensions of Causal and Evidential Decision Theory

tom4everitt

15 Oct 2015 23:45 UTC

2 points

0 comments · 1 min read · LW link

(arxiv.org)