I work on finding warning shots of Loss of Control in controlled scenarios and on improving corrigibility in LLMs to mitigate it, connecting this work to AI Governance.
Previously built a VC-backed AI Agents startup and graduated with a degree in AI from Carnegie Mellon.
I subscribe to Crocker's Rules (http://sl4.org/crocker.html), inspired by Daniel Kokotajlo, and am especially interested to hear unsolicited constructive criticism.