I’m interested in doing in-depth dialogues to find cruxes. Message me if you’d like to try this.
I do alignment research, mostly stuff in the vein of agent foundations. Currently doing independent research on ontology identification. Formerly on Vivek’s team at MIRI.
That’s closer to a conclusion than a premise. The main argument for it is this section of this post, or this. It’s essentially an underspecification argument; you could see it as a generalization of Carlsmith’s counting argument.
It’s interesting that your framing is “high confidence there’s no underlying corrigible motivation”, while mine is more like “unlikely it starts without flaws, and the improvement process is underspecified in ways that won’t fix large classes of flaws”. I think the linked arguments support my view. Possibly I’ve not made some of my background reasoning or assumptions explicit.
I’d be happy to video call if you want to talk about this; I think that’d be a quicker way for us to work out where the miscommunication is.