I agree that a distillation of a complex problem statement to a simple technical problem represents real understanding and progress, and is valuable thereby. But I don’t think your summary of the first half of the AI safety problem is one of these.
The central difficulty that stops this from being a “mere” engineering problem is that we don’t know what “safe” is supposed to mean in practice; that is, we don’t understand in detail *what properties we would desire a solution to satisfy*. From an engineering perspective, that marks the difference between a hard problem and a confused (and, usually, confusing) one.
When people were first trying to build an airplane, they could write down a simple property that would characterize a solution to the problem they were solving: (thing) is heavier than air yet manages to stay out of contact with the ground for, let’s say, at least minutes at a time. Of course this was never the be-all and end-all of what they were trying to accomplish, but it was the central hard problem, a solution to which they expected to be able to build on incrementally, in the unknown direction of Progress.
I can say the same for, for example, the “intelligence” part of the AI safety problem. Using Eliezer Yudkowsky’s optimization framework, I think I have a decent idea of what properties I would want a system to have when I say I want to build an “intelligence”. That understanding may or may not be the final word on the topic for all time, but it is at least a distillation that can function as a “mere” engineering problem: one whose solution I could recognize as such, and which we could then improve on.
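To gesture at what I mean by a workable distillation here (in my own rough notation, not a canonical formula): in that framework, the “intelligence” of a system can be treated as its optimization power, the degree to which it steers the world into a narrow, highly preferred region of outcome space. Something like

$$\mathrm{OP}(o^{*}) \;=\; -\log_2 \, \mu\!\left(\{\, o \in O : o \succeq o^{*} \,\}\right),$$

where $O$ is the space of possible outcomes, $\mu$ a baseline measure over them, $\succeq$ the system’s preference ordering, and $o^{*}$ the outcome the system actually brings about. The more improbable (under the baseline) the at-least-this-good region a system reliably hits, the more powerful an optimizer it is. Whether or not that is exactly the right formalization, it is concrete enough to aim at and to check a candidate against.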
But for the “safe” part of the problem, I don’t have a good idea of what properties I want the system to achieve at all. I have a lot of complex intuitions about the problem, including simple-ish ideas that seem to be an important part of it and some insight into what is definitely *not* what I want, but I can’t distill this down to a technical requirement that I could push towards. If you were to just hand me a candidate safe AI on a platter, I don’t think I could recognize it for what it is; I could definitely reject *some* failed attempts, but I could not tell whether your candidate solution is actually correct or whether it has a flaw I have not yet seen. Unless it comes with a mighty lecture series explaining exactly why it is what I actually want, it will not count as a “solution”. Which makes the “safe” part of your summary, in my mind, neither really substantive understanding nor a technical engineering problem yet.
To me, this form of “epistemic should” doesn’t feel like a responsibility-dodge at all. It carries a very specific warning: “my abstract understanding predicts that X will happen, but there are a thousand and one possible gotchas that could render that abstract understanding inapplicable, and I have no concrete experience with this particular case, so I attach low confidence to this prediction; caveat emptor”. It is not a shoving-off of responsibility so much as a marker of low confidence, and a warning to everyone not to put their weight down on this prediction.
Of course, if you make such a claim and then proceed to put your weight down on the low-confidence prediction anyway, without a very explicit decision to gamble in this way, then you really are sweeping responsibility under the carpet. But that is not how I have experienced the term being used, either by myself or by those around me.