Basically, you can’t predict the moves of a good chess AI, otherwise you’d be at least that good a chess player yourself, and yet you know it’s going to win the game.
I just realized you were trying to make a different point here: that one can prove things about the behavior of computationally unpredictable systems. It reminds me of the following:
6) Disproving mathematical proofs within the terms of their own definitions. This falls within the realm of self-contradiction. No transapient has disproved the Pythagorean Theorem for Euclidean spaces as defined by classical Greek mathematicians, for instance, or disproved Gödel’s Incompleteness Theorem on its own terms. (Encyclopedia Galactica—Limits of Transapient Power)
Sounds reasonable, but I have no idea to what extent one could prove “friendliness” while retaining the degrees of freedom that would allow a seed AI to recursively self-improve towards superhuman intelligence quickly. Intuitively it seems to me that the level of abstraction of a definition of “friendliness” will somehow be correlated with the capability of the AGI.
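To make the chess point concrete, here is a toy Python sketch (the hard-coded move list and the random evaluation are made-up placeholders for a real move generator and a deep search): an observer cannot predict which move the search will return without re-running the entire search themselves, yet one property of the behavior, that the returned move is always legal, is provable by construction.

```python
import random

def legal_moves(position):
    # Stand-in for a real move generator; placeholder data
    # purely for illustration.
    return ["e4", "d4", "Nf3", "c4"]

def evaluate(position, move):
    # Stand-in for a deep, expensive search. Modeled as noise here,
    # because from the outside its result is unpredictable without
    # redoing the same computation yourself.
    return random.random()

def best_move(position):
    moves = legal_moves(position)
    # Which move wins the max() below cannot be predicted without
    # running every evaluation in full...
    chosen = max(moves, key=lambda move: evaluate(position, move))
    # ...but this invariant holds no matter what evaluate() does:
    # the chosen move is always drawn from legal_moves(position).
    assert chosen in moves
    return chosen

print(best_move("start"))  # unpredictable choice, provably legal
```

In the same spirit, a “friendliness” proof would have to be a statement of this invariant kind: one that constrains the set of outcomes without predicting the particular computation that produces them.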