I’m an Astra Fellow working with Redwood Research on high-stakes control methods.
In reverse date order, I have been a:
- MATS 8.1 scholar, mentored by Micah Carroll
  - We wrote the paper Prompt Optimization Makes Misalignment Legible
- Software engineer at Google Gemini
  - Worked part-time with GDM Scalable Alignment on their MONA paper
- President of Cornell Effective Altruism
I enjoy tabletop games (as a player or GM), board games, meditation, partner dancing, bouldering, making music, reading (esp. hard sci-fi/fantasy), podcasts, and hanging out with my friends.
The kind of intellectual work I enjoy often involves thinking about systems, working out what they incentivize, and iterating to improve those incentives.
I have not signed any contracts that I can’t mention exist, as of March 27, 2026. I’ll try to update this statement at least once a year, so long as it’s true. I added this statement thanks to the one in the gears to ascension’s bio.
While I can’t exactly say I’m excited about AI doing massive ES tasks, it could be good news for interpretability research that depends on exhaustively searching for verified explanations of an AI’s behaviors or features. This post on automated SAE research comes to mind.
I’m particularly interested in scaffold optimization for interpretability. For example: given a task domain, a trusted model T (small, or not RL’d on the domain), and an untrusted model U (large, or RL’d on the domain), can we “distill” U into an interpretable scaffold, i.e. code that tells T how to act like U or how to achieve higher reward? We can verify a candidate scaffold by running it against a validation set and measuring its reward or its KL divergence from U (conceptually easy, though not necessarily cheap). A minimal sketch of that verification step is below.
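To make the verification step concrete, here’s a rough sketch in Python. The interfaces are hypothetical: I’m assuming T(task, scaffold) and U(task) each return an action plus a log-probability distribution over actions, and that reward_fn scores an action on a task. The point is just the scoring loop, not a real harness.

```python
import math

def evaluate_scaffold(scaffold, T, U, val_tasks, reward_fn):
    """Score one candidate scaffold: how well does T, guided by the scaffold,
    match U's behavior and reward on held-out tasks?

    Hypothetical interfaces: T(task, scaffold) and U(task) each return
    (action, logprobs), where logprobs maps actions to log-probabilities
    over a shared action set.
    """
    total_reward, total_kl = 0.0, 0.0
    for task in val_tasks:
        t_action, t_logprobs = T(task, scaffold)  # trusted model, following the scaffold
        _, u_logprobs = U(task)                   # untrusted model's reference behavior
        total_reward += reward_fn(task, t_action)
        # KL(U || T): how far the scaffolded T drifts from U on this task
        total_kl += sum(
            math.exp(u_lp) * (u_lp - t_logprobs[a])
            for a, u_lp in u_logprobs.items()
        )
    n = len(val_tasks)
    return total_reward / n, total_kl / n

# Usage (hypothetical names): pick the candidate that stays closest to U, e.g.
# best = min(candidate_scaffolds,
#            key=lambda s: evaluate_scaffold(s, T, U, val_tasks, reward_fn)[1])
```

Here the KL term penalizes a scaffold for steering T away from U’s behavior; if the goal is only higher reward, that term could be dropped and the reward average used alone.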