I like breaking models.
Researching red-teaming as part of the MATS 10.0 program with UK AISI. Dabbled a bit in mechanistic interpretability, blue-teaming, and reverse engineering; I’m especially interested in discovering vulnerabilities and flaws in generative models, and in the science of post-training.