Actually, −(q−u)² does work, but “by coincidence”, and it has other negative properties.
Let me explain. First of all, note that things like −(q−u)⁴ do not work.
To show this: let u=+2 with probability 1/3, and −1 with probability 2/3 (I’m dropping the 0≤u≤1 constraint for this example, for simplicity). Then E(u)=0 (so the correct q is 0), while E(u³)=2≠0. The expansion of −(q−u)⁴ contains the term 4qu³, which is not 0 in expectation. Hence the linear (q¹) term of E(−(q−u)⁴) is non-zero, which means that q=0 cannot be a maximum of this function.
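A quick numerical check of this example (same distribution as above), showing that the honest report q=0 is not the maximiser of the quartic payoff:

```python
# Distribution from the example: u = +2 with probability 1/3, u = -1 with probability 2/3.
outcomes = [(2.0, 1 / 3), (-1.0, 2 / 3)]

def expected_quartic_payoff(q):
    """E[-(q - u)^4] under the distribution above."""
    return sum(p * -(q - u) ** 4 for u, p in outcomes)

# E(u) = 0, so the honest report is q = 0 -- but q = 0 is not the maximum:
print(expected_quartic_payoff(0.0))  # ≈ -6.0
print(expected_quartic_payoff(0.5))  # ≈ -5.0625, strictly better than reporting the mean
```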
Why, then, does −(q−u)² work? Because it’s −q²+2qu (which is linear in u), minus u² (non-linear in u, but the AI can’t affect its value, so it’s irrelevant in a boxed setup).
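To confirm this with the same toy distribution: a coarse grid search over q finds that E(−(q−u)²) is maximised exactly at q=E(u)=0.

```python
# Same toy distribution as above: u = +2 w.p. 1/3, u = -1 w.p. 2/3, so E(u) = 0.
outcomes = [(2.0, 1 / 3), (-1.0, 2 / 3)]

def expected_quadratic_payoff(q):
    """E[-(q - u)^2] = -q^2 + 2q*E(u) - E(u^2); only the part linear in u depends on q."""
    return sum(p * -(q - u) ** 2 for u, p in outcomes)

# Grid search over q in [-3, 3] with step 0.01; the maximiser is q = E(u) = 0:
best_q = max((q / 100 for q in range(-300, 301)), key=expected_quadratic_payoff)
```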
What other “negative properties” might −(q−u)² have? Suppose we allow the AI to affect the value of u somehow, in a way that is independent of the value of its output q. Then an AI maximising −q²+2qu will always set q=E(u), for a total expectation of E(u)². Therefore it will also seek to maximise E(u)², which maximises E(u) if u≥0. So the agent will output the correct q and maximise E(u) simultaneously.
But if it instead tries to maximise −(q−u)², then it will still pick q=E(u), and gets an expected utility of E(u)²−E(u²)=−Var(u). Therefore it will pick actions that minimise the variance of u, regardless of its expectation.
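A sketch of this divergence, with two hypothetical actions (the distributions here are made up purely for illustration): one with high mean and high variance, one with low mean and zero variance.

```python
# Two hypothetical actions the AI might take, each inducing a distribution over u.
# Action A: higher mean, higher variance. Action B: lower mean, zero variance.
actions = {
    "A": [(1.0, 0.5), (0.0, 0.5)],  # E(u) = 0.5, Var(u) = 0.25
    "B": [(0.1, 1.0)],              # E(u) = 0.1, Var(u) = 0
}

def mean(dist):
    return sum(p * u for u, p in dist)

def value_linear(dist):
    """Best achievable E(2qu - q^2): attained at q = E(u), equals E(u)^2."""
    return mean(dist) ** 2

def value_quadratic(dist):
    """Best achievable E(-(q - u)^2): attained at q = E(u), equals -Var(u)."""
    m = mean(dist)
    return sum(p * -(m - u) ** 2 for u, p in dist)

# Under 2qu - q^2 the agent prefers the high-mean action A;
# under -(q - u)^2 it prefers the zero-variance action B.
best_linear = max(actions, key=lambda a: value_linear(actions[a]))        # "A"
best_quadratic = max(actions, key=lambda a: value_quadratic(actions[a]))  # "B"
```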
Even without being able to affect u, this messes up the rest of my setup. In particular, my “pick y and q so that you maximise v_y(2qu−q²)” becomes maximising v_y(−(q−u)²), and the AI will now select the y that minimises P(Y=y)·Var(u|Y=y), instead of maximising E(u∧(Y=y)). If ever Var(u|Y=y)=0 or P(Y=y)=0, it will choose such y.
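To illustrate how the two objectives rank the choices of y differently, here is a toy joint distribution over (Y, u) (entirely made up, and assuming v_y restricts the payoff to the event Y=y, so the best q given y is E(u|Y=y)):

```python
# Hypothetical joint distribution P(Y = y, u): chosen so that y2 has zero conditional variance.
joint = {
    ("y1", 1.0): 0.4, ("y1", 0.0): 0.4,  # P(y1) = 0.8, E(u|y1) = 0.5, Var(u|y1) = 0.25
    ("y2", 0.2): 0.2,                    # P(y2) = 0.2, E(u|y2) = 0.2, Var(u|y2) = 0
}

def conditional(y):
    """Return P(Y = y) and the conditional distribution of u given Y = y."""
    p_y = sum(p for (yy, _), p in joint.items() if yy == y)
    dist = [(u, p / p_y) for (yy, u), p in joint.items() if yy == y]
    return p_y, dist

def score_linear(y):
    """Expected v_y(2qu - q^2) at the best q = E(u|Y=y): P(Y=y) * E(u|Y=y)^2."""
    p_y, dist = conditional(y)
    m = sum(p * u for u, p in dist)
    return p_y * m ** 2

def score_quadratic(y):
    """Expected v_y(-(q - u)^2) at the best q: -P(Y=y) * Var(u|Y=y)."""
    p_y, dist = conditional(y)
    m = sum(p * u for u, p in dist)
    var = sum(p * (u - m) ** 2 for u, p in dist)
    return -p_y * var

ys = ["y1", "y2"]
best_linear = max(ys, key=score_linear)        # "y1": 0.8 * 0.25 beats 0.2 * 0.04
best_quadratic = max(ys, key=score_quadratic)  # "y2": zero conditional variance wins
```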