I’m an Astra Fellow working with Redwood Research on high-stakes control methods.
In reverse chronological order, I have been a:
MATS 8.1 scholar, mentored by Micah Carroll
    We wrote the paper Prompt Optimization Makes Misalignment Legible
Software engineer at Google Gemini
    Worked part-time with GDM Scalable Alignment on their MONA paper
President of Cornell Effective Altruism
I enjoy tabletop games (as a player or GM), board games, meditation, partner dancing, bouldering, making music, reading (esp. hard sci-fi/fantasy), podcasts, and hanging out with my friends.
The kind of intellectual work I enjoy often involves thinking about systems, working out what they incentivize, and iterating to improve those incentives.
I have not signed any contracts that I can’t mention exist, as of March 27, 2026. I’ll try to update this statement at least once a year, so long as it’s true. I added this statement thanks to the one in the gears to ascension’s bio.
One approach to automating AI safety research:
1. Scrape LessWrong for AI safety experiment ideas
2. Tell your favorite AI to implement everything that looks doable
3. Profit (in the sweet currency of impact)
Many people have posted ideas about AI safety on LW. But not many competent researchers want to read through other people’s random experiment ideas, figure out whether they’re any good, and implement them, when they could be working on their own ideas. We may be able to pick a lot of low-hanging fruit by making AIs do that work.
Could we try this plan with, say, Claude Mythos? I’m not sure. One major obstacle is taste: the AI has to independently implement a high-level experiment idea, and even proactively modify the original idea when it notices flaws. Also, once it’s run thousands of different experiments, it needs to understand which results are most exciting and worth showing to a human, and make the reasons for its excitement legible enough for the human to verify.
This kind of taste might not be strictly necessary for capabilities research, which is mostly about hill-climbing on benchmarks, but it’s critical for fuzzy, conceptual tasks like AI safety research. What would we need to do in order to point an AI at a giant pile of vague, underspecified AI safety research ideas and feel confident we’d get something useful out of it?[1]
Partly for this reason, I’d encourage people to release any AI-safety-related ideas they’re sitting on. (H/t to @Kaarel for publishing their own notes and inspiring this post.) I’ve been wanting to polish and write up several ideas myself, but in the meantime, I just made my “AI safety ideas” Google Doc public. While I can’t promise my personal notes will make sense to anyone else, any interested humans or AIs should feel free to take a look :)
[1] Excluding capabilities improvements that we expect to happen soon by default, which safety-focused people should probably not work on.