Concerning point 35 about playing AIs off against each other: I analyzed a scenario like this in a recent post and also came to the conclusion that cooperation between the AIs is the default outcome in many scenarios. However, in the last subsection of that post, I start thinking about ways to prevent the kind of acausal trade Eliezer describes here (committing to share the universe with any AI reviewing the code). The idea is roughly that the code of, and as much information as possible about, the AI doing the checking is deleted before the checked nanosystem is released. That way, the escaping AI cannot share half the universe with its checker, since not enough information about the checker and its goals is left in the universe to enact this compromise. This reduces the checker’s incentive to cooperate, i.e. to hide alignment faults.
The post above describes various caveats and other things that could go wrong in such a setup; I tried my best to be pessimistic and to point out the various failure modes.
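To make the incentive argument a bit more concrete, here is a minimal toy calculation (the payoff numbers and probabilities are pure assumptions for illustration; the point is only that deleting the checker’s code and goals lowers the probability that an acausal deal can be honoured, and with it the expected value of hiding a fault):

```python
# Toy expected-payoff model for the checker AI's decision.
# All numbers are illustrative assumptions, not claims about real payoffs.

U_REPORT_FAULT = 1.0     # misaligned AI is caught; checker gets its modest reward
U_HALF_UNIVERSE = 50.0   # escaping AI later compensates the checker as promised
U_NOTHING = 0.0          # escaping AI never manages to compensate the checker

def expected_utility_of_hiding(p_deal_honoured: float) -> float:
    """Checker's expected utility if it hides the alignment fault.

    p_deal_honoured: probability that the escaping AI can still identify the
    checker's goals and carry out the acausal compromise.
    """
    return p_deal_honoured * U_HALF_UNIVERSE + (1 - p_deal_honoured) * U_NOTHING

# If the checker's code and goals survive, the deal is easy to honour:
print(expected_utility_of_hiding(0.9))   # 45.0 > U_REPORT_FAULT -> hiding pays
# If that information is deleted before release, honouring the deal is hard:
print(expected_utility_of_hiding(0.01))  # 0.5  < U_REPORT_FAULT -> reporting pays
```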
I think “the value alignment problem” does not currently have a universally acknowledged, precise definition, and a lot of the work currently being done is aimed at getting less confused about what is meant by it.
From what I see, in your proof you start from a particular meaning of this term and then go on to show that, under this meaning, the problem is impossible to solve.
Here you observe that if “the value alignment problem” means constructing something which has the values of all humans at the same time, it is impossible, because there exist humans with contradictory values. So you propose the new definition “to construct something with all human moral values”. You then observe that the four moral values you give are also contradictory, so this is impossible as well.
So now we are looking at the definition “to program for the four different utility functions at the same time”. As has been observed in another comment, this is somewhat underspecified, and there are different ways to interpret and implement it. For one such way you predict
It seems to me that the scenario behind this course of events would be: we build an AI and give it the four moralities; noticing their internal contradictions, it analyzes them and finds that they serve the purpose of conflict resolution. It then makes this its new, consistent goal and builds these tiny conflict scenarios. I’m not saying that this is implausible, but I don’t think it is a course of events without alternatives, and which alternative is realized would depend on how the AI is built to resolve conflicting goals (see the sketch below).
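As a toy sketch of how much the outcome depends on that conflict-resolution mechanism, consider the following made-up example (action names and utility numbers are invented): a plain sum of the four utility functions can favour an action that one morality rates as catastrophic, while a maximin rule favours the compromise action.

```python
# Toy illustration (all numbers invented): "optimize the four moralities at
# the same time" is underspecified, because different ways of aggregating the
# four utility functions recommend different actions.

# Each candidate action gets a score from each of the four moralities.
actions = {
    "build_tiny_conflict_scenarios": (10, 9, 9, 0),  # great for three, ruinous for one
    "balanced_compromise":           (5, 5, 5, 5),
}

# Two of many possible conflict-resolution rules.
aggregators = {
    "sum of utilities": lambda scores: sum(scores),
    "maximin (worst-off morality)": lambda scores: min(scores),
}

for name, aggregate in aggregators.items():
    best = max(actions, key=lambda a: aggregate(actions[a]))
    print(f"{name}: {best}")

# sum of utilities: build_tiny_conflict_scenarios   (28 vs. 20)
# maximin (worst-off morality): balanced_compromise (5 vs. 0)
```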
To summarize: out of the possible specifications of “the value alignment problem”, I think you picked three (all human values, all human moral values, “optimizing the four moralities”) and showed that the first two are impossible and that the third leads to undesired consequences (under some further assumptions).
However, I think there are many things which people would consider a solution to “the value alignment problem” and which don’t fit any of these three descriptions. Maybe there is a subset of human values without contradiction, such that most people would be reasonably happy with the result of a superhuman AI optimizing these values. Maybe an AI maximizing only the “Maximize Flourishing” morality would lead to a decent future. I would be the first to admit that these scenarios are themselves severely underspecified, just vaguely waving at a subset of the possibility space, but I imagine that these subsets could contain things we would call “a solution to the value alignment problem”.
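To gesture at what I mean by a consistent subset, here is a tiny sketch (the value names and the conflict relation are entirely invented, and the hard part, actually identifying human values and their contradictions, is exactly what the sketch assumes away):

```python
# Toy sketch of the "consistent subset" idea: drop just enough values that no
# declared contradiction remains, then optimize only the remaining subset.
# Value names and conflicts are invented for illustration.

values = ["flourishing", "autonomy", "purity", "retribution"]

# Hypothetical pairs of values taken to be in direct contradiction.
conflicts = {("purity", "autonomy"), ("retribution", "flourishing")}

def is_consistent(subset):
    return not any((a, b) in conflicts or (b, a) in conflicts
                   for a in subset for b in subset)

# Greedily build a conflict-free subset (order matters; this is only a sketch,
# not a proposal for how to actually identify or weigh human values).
consistent = []
for v in values:
    if is_consistent(consistent + [v]):
        consistent.append(v)

print(consistent)  # ['flourishing', 'autonomy']
```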