mattmacdermott

Karma: 1,184

Is instrumental convergence a thing for virtue-driven agents?

mattmacdermott

Apr 2, 2025, 3:59 AM

33 points

37 comments, 2 min read, LW link

Validating against a misalignment detector is very different to training against one

mattmacdermott

Mar 4, 2025, 3:41 PM

29 points

4 comments, 4 min read, LW link

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

Yoshua Bengio, Jesse Richardson, dwk and mattmacdermott

Feb 24, 2025, 6:31 PM

44 points

15 comments, 11 min read, LW link

Context-dependent consequentialism

Jeremy Gillen and mattmacdermott

Nov 4, 2024, 9:29 AM

31 points

6 comments, 27 min read, LW link

Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)

mattmacdermott

Sep 1, 2024, 7:46 AM

26 points

0 comments, 5 min read, LW link

(yoshuabengio.org)

Bengio’s Alignment Proposal: “Towards a Cautious Scientist AI with Convergent Safety Bounds”

mattmacdermott

Feb 29, 2024, 1:59 PM

76 points

19 comments, 14 min read, LW link

(yoshuabengio.org)

mattmacdermott’s Shortform

mattmacdermott

Jan 3, 2024, 9:08 AM

4 points

32 comments, LW link

What’s next for the field of Agent Foundations?

Nora_Ammann, Alexander Gietelink Oldenziel and mattmacdermott

Nov 30, 2023, 5:55 PM

59 points

23 comments, 10 min read, LW link

Optimisation Measures: Desiderata, Impossibility, Proposals

mattmacdermott and Alexander Gietelink Oldenziel

Aug 7, 2023, 3:52 PM

36 points

9 comments, 1 min read, LW link

Reward Hacking from a Causal Perspective

tom4everitt, Francis Rhys Ward, sbenthall, James Fox, mattmacdermott and RyanCarey

Jul 21, 2023, 6:27 PM

29 points

6 comments, 7 min read, LW link

Incentives from a causal perspective

tom4everitt, James Fox, RyanCarey, mattmacdermott, sbenthall and Jonathan Richens

Jul 10, 2023, 5:16 PM

27 points

0 comments, 6 min read, LW link

Agency from a causal perspective

tom4everitt, mattmacdermott, James Fox, Francis Rhys Ward and Jonathan Richens

Jun 30, 2023, 5:37 PM

40 points

5 comments, 6 min read, LW link

Introduction to Towards Causal Foundations of Safe AGI

tom4everitt, Lewis Hammond, Francis Rhys Ward, RyanCarey, James Fox, mattmacdermott and sbenthall

Jun 12, 2023, 5:55 PM

67 points

6 comments, 4 min read, LW link

Some Summaries of Agent Foundations Work

mattmacdermott

May 15, 2023, 4:09 PM

62 points

1 comment, 13 min read, LW link

Towards Measures of Optimisation

mattmacdermott and Alexander Gietelink Oldenziel

May 12, 2023, 3:29 PM

53 points

37 comments, 4 min read, LW link

Normative vs Descriptive Models of Agency

mattmacdermott

Feb 2, 2023, 8:28 PM

26 points

5 comments, 4 min read, LW link