Was on Vivek Hebbar's team at MIRI; now working with Adrià Garriga-Alonso on various empirical alignment projects.
I’m looking for projects in interpretability, activation engineering, and control/oversight; DM me if you’re interested in working with me.
Seems reasonable, except that Eliezer's p(doom | trying to solve alignment) in early 2023 was much higher than 50%, probably more like 98%. AGI Ruin was published in June 2022, and drafts existed since early 2022. MIRI leadership had been pretty pessimistic ever since AlphaGo in 2016, and especially since their research agenda collapsed in 2019.