I’m interested in doing in-depth dialogues to find cruxes. Message me if you are interested in doing this.
I do alignment research, mostly work in the vicinity of agent foundations. Currently doing independent alignment research on ontology identification. Formerly on Vivek’s team at MIRI.
2 is close enough. Extrapolating the results of safe tests to unsafe settings requires a level of theoretical competence we don’t currently have. Steve Byrnes just made a great post that is somewhat related; I endorse everything in that post.