It is impossible in theory to have all these different kinds of alignment simultaneously. You cannot simultaneously (without any claim of completeness):
1. Do what I say
2. Also do what I mean
3. Also do what I should have said and meant
4. Also do what is best for me
5. Also do what broader society or humanity says
6. Also do what broader society or humanity means or should have said
7. Also do what broader society or humanity should have said given their values
8. Also do what is best for everyone
9. Do some ideal friendly combination of all of it that a broadly good guy would do, in a way that is respectful of and preserves what is valuable on all levels
10. Strictly follow some other set of rules that were set up long ago, no matter the cost
I think we should forget about items 1 and 2 right away (as well as 10, but that goes without saying). In the context of communicating and aligning with a superhumanly smart AI, it is hubristic and, frankly, stupid to think that although we will be much stupider than the AI, and although it will be able to write a superhumanly good theory of ethics in a second (one that humans didn’t manage to come up with in thousands of years) and to design an AI (or change itself) to follow that theory, humans’ thoughts about value will still somehow be “covered with gold” and worth adhering to, for that AI.
Given this, I disagree that “we will need to: 1) Be able to get an AGI to do something a human selects at all, rather than something not selected. Be able to retain some form of control over what it does in the future, or set it on a chosen course. At all. …”. I’m not sure that this is even possible (it would imply that the orthogonality thesis holds to a sufficiently strong degree, which I doubt), but even if it is possible, I don’t think it would be desirable.

Items 3-9 raise an important and valid concern: that of the (infinite) multiplicity of (collective) identity and alignment subjectivity.
“We must pick at most one of those, or another variation on them, or something else, as our primary target. A machine cannot serve two masters any more than a man can, and the ability to put the machine under arbitrary stress, and its additional capabilities and intelligence, makes this that much more clear.”
I think this is the wrong reaction to the above concern. Humans can serve multiple masters (themselves, their family, their community, their nation/society, the whole of humanity, and Gaia), so why couldn’t AIs?
In the phrase “It is impossible in theory to have all these different kinds of alignment simultaneously”, it’s unclear what you mean by “having alignment”. If you mean something like formal, “total” alignment, then sure: perfect alignment is impossible in physical reality (i.e., outside of simple simulations and mathematical abstractions). But if we strive for continuously increasing alignment along various dimensions and across various levels of intelligent entities (alignment subjects), then it should definitely be possible in general, although some “bad” situations may force a choice in which alignment between certain entities, or along a certain dimension, is worsened in favour of alignment between other types of entities, or along other dimensions.
I think that to increase our chances of realising a good future, we must find a principled way of addressing this issue of the multiplicity of identity and alignment subjectivity, which is the essence of a scale-free theory of ethics.