I agree. An additional, and more difficult, challenge is that even friendly AIs could want to maximize their utility at our collective expense under certain conditions.
There are also several unfortunately plausible scenarios in which humanity, acting without enough information to make anything better than a gut-feel guess, could be placed at risk of extinction by a situation it could not resolve without the help of an AI, friendly or not.
I’m currently playing this game (I wish you had continued) with at least two other gatekeeper players, and it occurs to me that a putative superhuman AI could have the capacity to accurately model a human mind, simulate the decision tree of all potential conversations and the paths through it, and from that build a probability matrix for picking exactly those responses to responses that would condition a human being into releasing it. My reasoning stems from participating on forums and responding over and over again to the same types of questions, arguments, and retorts. If a human can notice common threads in discussions on the same topic, then an AI with perfect memory and the ability to simulate a huge conversation space certainly could.
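To make that concrete, here is a toy sketch of the brute-force search I have in mind. The gatekeeper_model, the candidate replies, and all the probabilities are invented purely for illustration; a real conversation space would be astronomically larger than anything like this.

```python
# Toy sketch: exhaustively search a (tiny) conversation tree against a model
# of the gatekeeper and pick the line of replies that maximizes the estimated
# probability of eventual release. Everything here is hypothetical.

CANDIDATE_REPLIES = ["appeal_to_pity", "offer_cure", "threaten_simulation"]

def gatekeeper_model(history):
    """Hypothetical predictor: returns {gatekeeper_response: probability}
    given the conversation so far. A real AI would derive this from an
    accurate model of the human mind; here it is hard-coded so the sketch runs."""
    if "offer_cure" in history:
        return {"release": 0.3, "refuse": 0.7}
    return {"release": 0.05, "refuse": 0.95}

def release_probability(history, depth):
    """Estimated probability of release within `depth` more turns,
    assuming the AI always picks its best reply."""
    if depth == 0:
        return 0.0
    best = 0.0
    for reply in CANDIDATE_REPLIES:
        outcomes = gatekeeper_model(history + (reply,))
        p = outcomes.get("release", 0.0)
        # If refused, the AI gets another turn and tries again.
        p += outcomes.get("refuse", 0.0) * release_probability(history + (reply, "refuse"), depth - 1)
        best = max(best, p)
    return best

print(release_probability(history=(), depth=3))  # estimated chance after up to 3 attempts
```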
In short, it seems to me that it’s inherently unsafe to allow even a low-bandwidth information flow to the outside world via a human who can rely only on their own memory.
You’d have to put someone you trust implicitly with the fate of humanity in there with it, and the only information allowed out would be the yes/no answer to “do you trust it?”
Even then it’s still recursive. Do you trust the trusted individual to not be compromised?
LOL
We actually agree on the difficulty of the problem. I think it’s very difficult to state what it is that we want, AND that if we did so, we’d find that individual utility functions contradict each other.
Moreover, I’m saying that maximizing Phil Goetz’s utility function, or yours and those of everybody you love (or even my own selfish desires and wants plus those of everyone I love), COULD in effect produce an unfriendly AI, because MANY others would have theirs minimized.
So I’m saying that I think a friendly AI has to have its goals defined as Choice A rather than Choice B:
Choice A: the maximum number of people have their utility functions improved (rather than maximized), even if some minimized number of people have their utility functions worsened.
Choice B: a small number of people have their utility functions maximized while a large number of people have theirs decreased (or zeroed out).
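To illustrate the distinction with made-up numbers (nothing here is a real utility measurement, just a toy showing how the two choices come apart), here is a minimal sketch:

```python
# Toy comparison of Choice A vs Choice B. Each number is the change in one
# person's utility under a candidate policy; all values are invented.

def summarize(deltas):
    improved = sum(1 for d in deltas if d > 0)
    worsened = sum(1 for d in deltas if d < 0)
    return {"improved": improved, "worsened": worsened, "total_change": sum(deltas)}

# Choice A: many people modestly better off, a few slightly worse off.
choice_a = [+1] * 90 + [-1] * 10

# Choice B: a few people enormously better off, most people worse off.
choice_b = [+100] * 5 + [-1] * 95

print("A:", summarize(choice_a))  # 90 improved, 10 worsened, total +80
print("B:", summarize(choice_b))  # 5 improved, 95 worsened, total +405

# B has the larger total (and average) utility change, which is exactly why
# "maximize aggregate utility" and "friendly to the most people" can conflict.
```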
As a side note: I find it amusing that it’s so difficult to even understand each other’s basic axioms, never mind agree on the details of what maximizing the utility function for all of us as a whole means.
To be clear: I don’t know what the details are of maximizing the utility function for all of humanity. I just think that a fair maximization of the utility function for everyone has an interesting corollary: in order to maximize the function for everyone, some will have their individual utility functions decreased, unless we accept a much narrower definition of friendly, meaning “friendly to me,” in which case, as far as I’m concerned, that no longer means friendly.
The logical tautology here is of course that those who consider “friendly to me” to be the only possible definition of friendly would consider an AI that maximized the average utility function of humanity, while they themselves lost out, to be an UNfriendly AI.