I research intelligence and its emergence and expression in neural networks to ensure advanced AI is safe and beneficial.
I’m currently a Research Scientist at UK AISI working on training and interpreting model organisms of misalignment, including reward hacking, evaluation awareness, and sandbagging.
For more, check out my scholar profile and personal website.
Thanks! Yep, makes sense. That’s one of the things we’ll be working on, and we hope to share some results soon!