RSS

David Scott Krueger (formerly: capybaralet)(David Krueger)

Karma: 1,909

I’m more active on Twitter than LW/​AF these days: https://​​twitter.com/​​DavidSKrueger

Bio from https://​​www.davidscottkrueger.com/​​:
I am an Assistant Professor at the University of Cambridge and a member of Cambridge’s Computational and Biological Learning lab (CBL). My research group focuses on Deep Learning, AI Alignment, and AI safety. I’m broadly interested in work (including in areas outside of Machine Learning, e.g. AI governance) that could reduce the risk of human extinction (“x-risk”) resulting from out-of-control AI systems. Particular interests include:

[Link Post] “Foun­da­tional Challenges in As­sur­ing Align­ment and Safety of Large Lan­guage Models”

David Scott Krueger (formerly: capybaralet)6 Jun 2024 18:55 UTC
69 points
2 comments6 min readLW link
(llm-safety-challenges.github.io)

Test­ing for con­se­quence-blind­ness in LLMs us­ing the HI-ADS unit test.

David Scott Krueger (formerly: capybaralet)24 Nov 2023 23:35 UTC
25 points
2 comments2 min readLW link