I’m an Astra Fellow working with Redwood Research on high-stakes control methods.
In reverse chronological order, I have been a:
MATS 8.1 scholar, mentored by Micah Carroll
We wrote the paper Prompt Optimization Makes Misalignment Legible
Software engineer at Google Gemini
Worked part-time with GDM Scalable Alignment on their MONA paper
President of Cornell Effective Altruism
I enjoy tabletop games (as a player or GM), board games, meditation, partner dancing, bouldering, making music, reading (esp. hard sci-fi/fantasy), podcasts, and hanging out with my friends.
The kind of intellectual work I enjoy often involves thinking about systems, working out what they incentivize, and iterating to improve those incentives.
I have not signed any contracts that I can’t mention exist, as of March 27, 2026. I’ll try to update this statement at least once a year, so long as it’s true. I added this statement thanks to the one in the gears to ascension’s bio.
Maybe the person who bumped into you walks very carefully almost all the time, and this is a once-in-a-decade freak accident. Maybe they were speedwalking to extinguish a fire that was about to spread and burn their house down. In this case, maybe their policy update should actually be “you know, this is the first time I’ve bumped into someone in 10 years, and my house burned down because I didn’t run. I should probably move faster.”
Nonetheless, they can feel bad and responsible for bumping into you, and apologize for it. That sort of feeling is more or less what makes an apology genuine, and I think this holds even if the apologizer is about to make the opposite policy update from what you’d naively expect!
But how can you know that they are genuine, that they’re not just putting on a show, that their claim of contrition is credible? Well, prosocial humans usually feel bad/responsible when they hurt someone, because they feel real love and respect for their fellow humans. A genuine apology is evidence that they are such a human. Often you can tell if someone actually feels bad/responsible by judging their tone and facial expressions and so forth.
This is better, perhaps even more credible, than offering to sign a legally binding contract that commits them to a specific behavioral change or an IOU. I’d rather someone genuinely feel sorry about what happened, and be on the lookout for ways to make it up to me (or even others in my reference class) as they see fit, than for us to mutually agree upon terms by which they will half-heartedly recompense me. For something as minor as bumping into me, being extra friendly to me in a future conversation is probably more than enough, not that I’d necessarily notice or track that explicitly. This informal arrangement is way vaguer and less enforceable than a contract, but it’s way more flexible and has lower transaction costs, which I think is overall a big win!
If someone bumps into me, I don’t actually care much whether they update their policy about walking quickly, unless I think they clearly do walk too quickly. If they don’t apologize, I’ll update at least slightly that they’re rude and self-absorbed, and I think that’s a reasonable update to make.
Of course, if I learn later that they were trying to stop their house from burning down, I’ll almost entirely revert that update. I will do this automatically, using the neat social machinery that is already built into my head, rather than theorizing about ledgers and expected values and so on.
In general I’m pretty skeptical of ideas to adopt “new and improved” social norms that are substantially different from what society has already landed on, especially for a norm as ancient and culturally universal as “apologizing.” If you think we should be doing something very different, I think you’re probably overlooking something!