Regarding your other points, you may find it interesting to read Carlsmith's doc on how to control AI motivations:
To be clear, this is a description of the alignment problem, not a solution. To quote from it:
in many respects, the picture above functions, in my head, centrally as a structured decomposition of the problems that an adequate approach to motivation control needs to overcome. It’s certainly not a “solution” to the alignment problem, in the sense of “a detailed, do-able, step-by-step plan that will work with high-confidence, and which requires only realistic deviation from the default trajectory.” And on its own, I’m not sure it even warrants the term “plan.”
(from the other comment):
I don’t have capacity to continue this discussion, but thanks for sharing your perspective.
Sure, this seems like a fine place to stop the conversation for now.