Hi, I’m Rohan! I aim to promote welfare and reduce suffering as much as possible for all sentient beings, which has led me to work on AGI safety research. I am particularly interested in foundation model agents (FMAs): systems like Claude Code that equip foundation models with memory, tool use, and other affordances so they can perform multi-step tasks autonomously.
I am the founder of Aether, an independent research lab focused on foundation model agent safety. I’m also a PhD student at the University of Toronto, where I am supervised by Professor Zhijing Jin and continue to run Aether. Previously, I completed an undergrad in CS and Math at Columbia, where I helped run Columbia Effective Altruism and Columbia AI Alignment Club (CAIAC). I have done research internships with AI Safety Hub Labs (now LASR Labs), UC Berkeley’s Center for Human-Compatible AI (CHAI), and the ML Alignment & Theory Scholars (MATS) program.
I love playing tennis, listening to rock and indie pop music, playing social deduction games, reading fantasy books, watching a fairly varied set of TV shows and movies, and playing the saxophone, among other things.
[Edit: I don’t think this is saying anything that different than my comment above, but it is a slightly different framing.]
Another point that I think might be quite important: we often set ourselves complex subgoals in line with our existing values, and then we try hard to achieve those goals, and we learn how to be more effective consequentialist agents at achieving that type of subgoal. There may be clearer feedback on how well we did at the subgoal than how well we achieved our existing values, but in lots of cases we notice if there’s a significant divergence between what we achieved and our underlying values, which moderates the consequentialist learning and is a pressure towards maintaining alignment.