
David Scott Krueger (formerly: capybaralet)

Karma: 1,840

I’m more active on Twitter than LW/AF these days: https://twitter.com/DavidSKrueger

Bio from https://www.davidscottkrueger.com/:
I am an Assistant Professor at the University of Cambridge and a member of Cambridge’s Computational and Biological Learning lab (CBL). My research group focuses on Deep Learning, AI Alignment, and AI Safety. I’m broadly interested in work (including in areas outside of Machine Learning, e.g. AI governance) that could reduce the risk of human extinction (“x-risk”) resulting from out-of-control AI systems. Particular interests include:

A Somewhat Vague Proposal for Grounding Ethics in Physics

David Scott Krueger (formerly: capybaralet) · 27 Jan 2015 5:45 UTC
−3 points
15 comments · 1 min read · LW link

A Basic Problem of Ethics: Panpsychism?

David Scott Krueger (formerly: capybaralet) · 27 Jan 2015 6:27 UTC
−5 points
16 comments · 1 min read · LW link

Should we enable public binding precommitments?

David Scott Krueger (formerly: capybaralet) · 31 Jul 2016 19:47 UTC
1 point
19 comments · 1 min read · LW link

Inefficient Games

David Scott Krueger (formerly: capybaralet) · 23 Aug 2016 17:47 UTC
26 points
13 comments · 1 min read · LW link

Risks from Approximate Value Learning

David Scott Krueger (formerly: capybaralet) · 27 Aug 2016 19:34 UTC
7 points
10 comments · 1 min read · LW link

Problems with learning values from observation

David Scott Krueger (formerly: capybaralet) · 21 Sep 2016 0:40 UTC
2 points
4 comments · 1 min read · LW link

Disambiguating “alignment” and related notions

David Scott Krueger (formerly: capybaralet) · 5 Jun 2018 15:35 UTC
22 points
21 comments · 2 min read · LW link

Conceptual Analysis for AI Alignment

David Scott Krueger (formerly: capybaralet) · 30 Dec 2018 0:46 UTC
26 points
3 comments · 2 min read · LW link

Imitation learning considered unsafe?

David Scott Krueger (formerly: capybaralet) · 6 Jan 2019 15:48 UTC
20 points
11 comments · 1 min read · LW link

The role of epistemic vs. aleatory uncertainty in quantifying AI-Xrisk

David Scott Krueger (formerly: capybaralet) · 31 Jan 2019 6:13 UTC
15 points
6 comments · 2 min read · LW link

Thoughts on Ben Garfinkel’s “How sure are we about this AI stuff?”

David Scott Krueger (formerly: capybaralet) · 6 Feb 2019 19:09 UTC
25 points
17 comments · 1 min read · LW link

My use of the phrase “Super-Human Feedback”

David Scott Krueger (formerly: capybaralet) · 6 Feb 2019 19:11 UTC
13 points
0 comments · 1 min read · LW link

X-risks are tragedies of the commons

David Scott Krueger (formerly: capybaralet) · 7 Feb 2019 2:48 UTC
9 points
19 comments · 1 min read · LW link

Let’s talk about “Convergent Rationality”

David Scott Krueger (formerly: capybaralet) · 12 Jun 2019 21:53 UTC
44 points
33 comments · 6 min read · LW link

False assumptions and leaky abstractions in machine learning and AI safety

David Scott Krueger (formerly: capybaralet) · 28 Jun 2019 4:54 UTC
21 points
3 comments · 1 min read · LW link

Project Proposal: Considerations for trading off capabilities and safety impacts of AI research

David Scott Krueger (formerly: capybaralet) · 6 Aug 2019 22:22 UTC
25 points
11 comments · 2 min read · LW link

[Question] What are the reasons to *not* consider reducing AI-Xrisk the highest priority cause?

David Scott Krueger (formerly: capybaralet) · 20 Aug 2019 21:45 UTC
29 points
27 comments · 1 min read · LW link

[Question] Can indifference methods redeem person-affecting views?

David Scott Krueger (formerly: capybaralet) · 12 Nov 2019 4:23 UTC
10 points
3 comments · 1 min read · LW link

A fun calibration game: “0-hit Google phrases”

David Scott Krueger (formerly: capybaralet) · 21 Nov 2019 1:13 UTC
6 points
1 comment · 1 min read · LW link

What I talk about when I talk about AI x-risk: 3 core claims I want machine learning researchers to address.

David Scott Krueger (formerly: capybaralet) · 2 Dec 2019 18:20 UTC
29 points
13 comments · 3 min read · LW link