I don’t think “the value alignment problem” currently has a universally acknowledged, precise definition, and a lot of the work being done right now is about getting less confused about what the term means.
From what I see, in your proof you started from one particular meaning of this term and then went on to show that it is impossible.
Which means that human values, or at least the individual non-morality-based values don’t converge, which means that you can’t design an artificial superintelligence that contains a term for all human values
Here you observe that if “the value alignment problem” means to construct something which has the values of all humans at the same time, it is impossible because there exist humans with contradictory values. So you propose the new definition “to construct something with all human moral values”. You continue to observe that the four moral values you give are also contradictory, so this is also impossible.
And even if somehow you could program an intelligence to optimize for those four competing utility functions at the same time,
So now we are looking at the definition “to program for the four different utility functions at the same time”. As has been observed in a different comment, this is somewhat underspecified, and there might be different ways to interpret and implement it. For one such way you predict:
that would just cause it to optimize for conflict resolution, and then it would just tile the universe with tiny artificial conflicts between artificial agents for it to resolve as quickly and efficiently as possible without letting those agents do anything themselves.
It seems to me that the scenario behind this course of events would be: we build an AI and give it the four moralities; noticing their internal contradictions, it analyzes them and finds that they serve the purpose of conflict resolution. It then makes this its new, consistent goal and builds these tiny conflict scenarios. I’m not saying that this is implausible, but I don’t think it is the only possible course of events (the alternatives would depend on the way the AI is built to resolve conflicting goals).
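As a toy illustration of that last point, here is a minimal sketch in Python. Everything in it is made up for illustration and not taken from your post: the three candidate outcomes, the scores they get under four unnamed utility functions, and the two aggregation rules (summing the scores versus maximizing the worst score). It only shows that “optimize all four at once” picks different outcomes depending on how conflicts between the goals are resolved.

```python
# Hypothetical scores of three candidate outcomes under four utility
# functions. The outcome names and all numbers are illustrative assumptions.
scores = {
    "outcome_a": [10, 10, 10, 0],
    "outcome_b": [6, 6, 6, 5],
    "outcome_c": [4, 9, 4, 4],
}


def pick_by_sum(scores):
    """Resolve conflicts by adding up all four utilities (one possible rule)."""
    return max(scores, key=lambda name: sum(scores[name]))


def pick_by_worst_case(scores):
    """Resolve conflicts by maximizing the lowest of the four utilities."""
    return max(scores, key=lambda name: min(scores[name]))


if __name__ == "__main__":
    print("sum rule picks:       ", pick_by_sum(scores))
    print("worst-case rule picks:", pick_by_worst_case(scores))
```

With these numbers, the sum rule prefers the outcome that scores zero on one of the four functions, while the worst-case rule prefers the more balanced one. Neither rule is privileged here; the point is only that the specification has to say which one (or which other rule) is meant.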
To summarize, I think that out of the possible specifications of “the value alignment problem”, you picked three (all human values, all human moral values, “optimizing the four moralities”) and showed that the first two are impossible and that the third leads to undesired consequences (under some further assumptions).
However, I think there are many things which people would consider a solution to “the value alignment problem” and which don’t fit any of these three descriptions. Maybe there is a contradiction-free subset of human values such that most people would be reasonably happy with the result of a superhuman AI optimizing these values. Maybe an AI maximizing only the “Maximize Flourishing” morality would lead to a decent future. I would be the first to admit that the scenarios I describe are themselves severely underspecified, just vaguely waving at a subset of the possibility space, but I imagine that these subsets could contain things we would call “a solution to the value alignment problem”.
Given how central the execution of a pivotal act seems to be to our current best attempt at an alignment strategy (see point 6 of EY’s post), I was confused to find very little discussion of possible approaches here on the forum. Does the quote above already fully explain this (since all promising approaches are too far outside the Overton window to discuss publicly)? Or has no one gotten around to initiating such a conversation? Or, quite possibly, have I overlooked extensive discussions in this direction?
It seems to me that a long document listing the 20 most commonly proposed approaches to such a pivotal act, together with an analysis of their strengths and weaknesses, room for comments, etc., could be quite valuable for people who want to start thinking about such approaches. Also, there is always a possibility of someone just having a really great idea (or maybe person A having a flawed idea containing the seed of a great idea, which inspires person B to propose a fix). Would other people also find this useful?
On the other hand, given the possible downsides of such public discourse (proposals outside the Overton window being a PR problem, or some proposals only being feasible if they are not publicly announced), are there other strategies for reaping the benefits of many people with different backgrounds thinking about this problem? Things that come to mind: maybe a non-public essay contest where people can hand in a description of a possible pivotal act together with their own analysis of its feasibility. These could be read by a panel of trusted experts (trusted both for the competence of their judgement and for their confidentiality). Then harmless but insightful ones could be released to the public. Dangerous and/or non-insightful ones could be returned to their creators with a brief explanation of why they are deemed a bad idea. And finally, promising ones could be brought to the attention of people with the resources to pursue them further.