Was on Vivek Hebbar’s team at MIRI; now working with Adrià Garriga-Alonso on various empirical alignment projects.
I’m looking for projects in interpretability, activation engineering, and control/oversight; DM me if you’re interested in working with me.
I want to see a dialogue happen between someone with Nate’s beliefs and someone with Nora’s beliefs. The career decisions of hundreds of people, including myself, depend on clearly thinking through the arguments behind various threat models. I find it pretty embarrassing for the field that there is mutual contempt between the people who disagree the most, when such a severe disagreement represents the greatest opportunity to understand the basic dynamics behind AGI.
Sure, communication is hard sometimes, so maybe the dialogue is infeasible, and in fact I can’t think of any particular people I’d want to do this. It still makes me sad.