RSS

David Scott Krueger (formerly: capybaralet)(David Krueger)

Karma: 1,839

I’m more active on Twitter than LW/​AF these days: https://​​twitter.com/​​DavidSKrueger

Bio from https://​​www.davidscottkrueger.com/​​:
I am an Assistant Professor at the University of Cambridge and a member of Cambridge’s Computational and Biological Learning lab (CBL). My research group focuses on Deep Learning, AI Alignment, and AI safety. I’m broadly interested in work (including in areas outside of Machine Learning, e.g. AI governance) that could reduce the risk of human extinction (“x-risk”) resulting from out-of-control AI systems. Particular interests include:

An Up­date on Academia vs. In­dus­try (one year into my fac­ulty job)

David Scott Krueger (formerly: capybaralet)3 Sep 2022 20:43 UTC
121 points
18 comments4 min readLW link

“Pub­lish or Per­ish” (a quick note on why you should try to make your work leg­ible to ex­ist­ing aca­demic com­mu­ni­ties)

David Scott Krueger (formerly: capybaralet)18 Mar 2023 19:01 UTC
98 points
48 comments1 min readLW link

AI x-risk re­duc­tion: why I chose academia over industry

David Scott Krueger (formerly: capybaralet)14 Mar 2021 17:25 UTC
56 points
14 comments3 min readLW link

[Question] Is “gears-level” just a syn­onym for “mechanis­tic”?

David Scott Krueger (formerly: capybaralet)13 Dec 2021 4:11 UTC
48 points
29 comments1 min readLW link

“Cars and Elephants”: a hand­wavy ar­gu­ment/​anal­ogy against mechanis­tic interpretability

David Scott Krueger (formerly: capybaralet)31 Oct 2022 21:26 UTC
48 points
25 comments2 min readLW link

[An email with a bunch of links I sent an ex­pe­rienced ML re­searcher in­ter­ested in learn­ing about Align­ment /​ x-safety.]

David Scott Krueger (formerly: capybaralet)8 Sep 2022 22:28 UTC
47 points
1 comment5 min readLW link

A (EtA: quick) note on ter­minol­ogy: AI Align­ment != AI x-safety

David Scott Krueger (formerly: capybaralet)8 Feb 2023 22:33 UTC
46 points
20 comments1 min readLW link

Let’s talk about “Con­ver­gent Ra­tion­al­ity”

David Scott Krueger (formerly: capybaralet)12 Jun 2019 21:53 UTC
44 points
33 comments6 min readLW link

A list of good heuris­tics that the case for AI x-risk fails

David Scott Krueger (formerly: capybaralet)2 Dec 2019 19:26 UTC
43 points
14 comments2 min readLW link

Why I hate the “ac­ci­dent vs. mi­suse” AI x-risk di­chotomy (quick thoughts on “struc­tural risk”)

David Scott Krueger (formerly: capybaralet)30 Jan 2023 18:50 UTC
32 points
41 comments2 min readLW link

[Question] Is there a name for the the­ory that “There will be fast take­off in real-world ca­pa­bil­ities be­cause al­most ev­ery­thing is AGI-com­plete”?

David Scott Krueger (formerly: capybaralet)2 Sep 2021 23:00 UTC
31 points
8 comments1 min readLW link

What I talk about when I talk about AI x-risk: 3 core claims I want ma­chine learn­ing re­searchers to ad­dress.

David Scott Krueger (formerly: capybaralet)2 Dec 2019 18:20 UTC
29 points
13 comments3 min readLW link

[Question] What are the rea­sons to *not* con­sider re­duc­ing AI-Xrisk the high­est pri­or­ity cause?

David Scott Krueger (formerly: capybaralet)20 Aug 2019 21:45 UTC
29 points
27 comments1 min readLW link

Mechanis­tic In­ter­pretabil­ity as Re­v­erse Eng­ineer­ing (fol­low-up to “cars and elephants”)

David Scott Krueger (formerly: capybaralet)3 Nov 2022 23:19 UTC
28 points
3 comments1 min readLW link

Con­cep­tual Anal­y­sis for AI Align­ment

David Scott Krueger (formerly: capybaralet)30 Dec 2018 0:46 UTC
26 points
3 comments2 min readLW link

Quick thoughts on “scal­able over­sight” /​ “su­per-hu­man feed­back” research

David Scott Krueger (formerly: capybaralet)25 Jan 2023 12:55 UTC
26 points
9 comments2 min readLW link

Ineffi­cient Games

David Scott Krueger (formerly: capybaralet)23 Aug 2016 17:47 UTC
26 points
13 comments1 min readLW link

Thoughts on Ben Garfinkel’s “How sure are we about this AI stuff?”

David Scott Krueger (formerly: capybaralet)6 Feb 2019 19:09 UTC
25 points
17 comments1 min readLW link