E.g., Eliezer’s approach tends to assume that the ability to impart arbitrary goals and values to ASIs is 1) necessary for a good outcome, and 2) not itself a detriment to a good outcome.
It’s kind of strange. Why would we want a technical ability that lets any Mr. X from the defense department of superpower Y impart his goals and values to some ASI? It’s very easy to imagine how this could be detrimental.
And the assumption that we need a technical ability that strong to have a decent shot at a good outcome, rather than an ability to impart only a very restricted, carefully selected class of goals and values (selected not only for desirability but also for feasibility, so not CEV, but something more modest and less distant from the instrumental drives of advanced AI systems), needs a much stronger justification than any justification that has been given so far (to the best of my knowledge).
This seems like a big crux. This superstrong “arbitrary alignment capability” is very difficult (almost impossible) to achieve, it’s not clear that much is needed, and there seem to be big downsides to having it because of all kinds of misuse potential.
There seem to be more cruxes.