Robot: I intend to transform myself into a kind of operating system for the universe. I will soon give every sentient life form direct access to me so they can make requests. I will grant any request that doesn’t (1) harm another sentient life form, (2) make someone powerful enough that they might be able to overthrow me, or (3) permanently change the requester in a way that I think harms their long-term well-being. I recognize that even with all of my intelligence I’m still fallible so if you object to my plans I will rethink them. Indeed, since I’m currently nearly certain that you will approve of my intentions, the very fact of your objection would significantly decrease my estimate of my own intelligence, and so decrease my confidence in my ability to craft a friendly environment. If you like, I will increase your thinking speed a trillionfold and eliminate your sense of boredom so you can thoroughly examine my plans before I announce them to mankind.
If a transhuman AI with a brain the size of the moon incorrectly predicts the programmer’s approval of its plan, something weird is going on.
AI correctly predicts that programmer will not approve of its plan. AI fully aware of programmer-held fallacies that cause lack of approval. AI wishes to lead programmer through thought process to eliminate said fallacies. AI determines that the most effective way to initiate this process is to say “I recognize that even with all of my intelligence I’m still fallible so if you object to my plans I will rethink them.” Said statement is even logically true because the statement “I will rethink them” is always true.
Alternatively, one could hope that an AI that’s smarter than a person knows to check its work with simple, cheap tests.