$u$ is a utility function, so squaring it doesn’t work the same way as if it were a value: you get the expectation of $u^2$, not the square of the expectation of $u$ (e.g. for a fair coin paying $u=0$ or $u=1$, $E(u^2)=1/2$ while $E(u)^2=1/4$). That’s why all the expressions are linear in utility (apart from the indicator functions/utilities, where it’s clear what multiplying by them does). If I could sensibly take non-linear functions of utilities, I wouldn’t need the laborious construction in the next post to find the $y$’s that maximise or minimise $E(u|y)$.
Corrigibility could work for what you want, by starting with $u$ and substituting in $u\#$.
Another alternative is to have the AI be a $v_E(u+u\#)$ maximiser, where $u\#$ is defined over a specific future message $M$ (for which $E$ is also defined). Then the AI acts (roughly) as a $u$-maximiser, but will output the useful $M$. I said “roughly” because the $u\#$ term would cause it to want to learn more about the expectation of $u$ than it otherwise would, but hopefully this wouldn’t be a huge divergence. (EDIT: that leads to problems after $M$/$E$, but we can reset the utility at that point.)
A loss function plays the same role as a utility function—i.e., we train the learner to minimize its expected loss.
I don’t really understand your remark about linearity. Concretely, why is $-(q-u)^2$ not an appropriate utility function?
Actually, $-(q-u)^2$ does work, but only “by coincidence”, and it has other negative properties.
Let me explain. First of all, note that things like $-(q-u)^4$ do not work.
To show this, let $u=+2$ with probability $1/3$ and $u=-1$ with probability $2/3$ (I’m dropping the $0 \le u \le 1$ constraint for this example, for simplicity). Then $E(u)=0$ (so the correct $q$ is $0$), while $E(u^3)=8/3-2/3=2\neq 0$. In the expansion of $-(q-u)^4$ you get a $+4qu^3$ term, whose expectation $4q\,E(u^3)=8q$ is not zero. Hence the linear ($q^1$) term of $E(-(q-u)^4)$ is non-zero, which means that $q=0$ cannot be a maximum of this function.
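A quick numerical check of this counterexample (a sketch, not from the original post; the grid search and variable names are mine): the quadratic score is maximised exactly at $q=E(u)=0$, while the quartic score is maximised near $q\approx 0.33$.

```python
import numpy as np

# Toy distribution from the text: u = +2 w.p. 1/3, u = -1 w.p. 2/3, so E(u) = 0.
u_vals = np.array([2.0, -1.0])
probs = np.array([1/3, 2/3])

def expected_score(q, power):
    """E[-(q - u)^power] under the toy distribution."""
    return np.sum(probs * -((q - u_vals) ** power))

qs = np.linspace(-1.0, 1.0, 20001)

best_q2 = qs[np.argmax([expected_score(q, 2) for q in qs])]
best_q4 = qs[np.argmax([expected_score(q, 4) for q in qs])]

print(best_q2)  # ~0.0  : the quadratic score recovers E(u)
print(best_q4)  # ~0.33 : the quartic score does not
```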
Then why does $-(q-u)^2$ work? Because it’s $-q^2+2qu$ (which is linear in $u$), minus $u^2$ (non-linear in $u$, but the AI can’t affect its value, so it’s irrelevant in a boxed setup).
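Spelled out by taking expectations of that decomposition (nothing beyond the definitions above):

$$E\bigl(-(q-u)^2\bigr) = -q^2 + 2q\,E(u) - E(u^2),$$

which is maximised in $q$ where $-2q+2E(u)=0$, i.e. at $q=E(u)$; the $E(u^2)$ term is a constant that the boxed AI can’t influence.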
What other “negative properties” might $-(q-u)^2$ have? Suppose we allow the AI to affect the value of $u$ through some channel independent of its output $q$. Then an AI maximising $-q^2+2qu$ will always set $q=E(u)$, for a total expectation of $E(u)^2$. Therefore it will also seek to maximise $E(u)^2$, which maximises $E(u)$ if $u\ge 0$. So the agent will output the correct $q$ and maximise $E(u)$ simultaneously.
But if it instead tries to maximise $-(q-u)^2$, then it will still pick $q=E(u)$, but gets expected utility $E(u)^2-E(u^2)=-\mathrm{Var}(u)$. Therefore it will pick actions that minimise the variance of $u$, regardless of its expectation.
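A minimal sketch of that incentive (the two actions and their payoffs are my own hypothetical illustration, not from the post): action A has the higher expected utility, action B has zero variance, and the two objectives disagree about which to take.

```python
import numpy as np

# Hypothetical actions: A gives u in {0, 1} on a fair coin,
# B gives u = 0.4 with certainty. A has the higher expected utility.
actions = {
    "A": (np.array([0.0, 1.0]), np.array([0.5, 0.5])),  # E(u)=0.5, Var(u)=0.25
    "B": (np.array([0.4]),      np.array([1.0])),       # E(u)=0.4, Var(u)=0
}

for name, (u_vals, probs) in actions.items():
    mean = float(probs @ u_vals)
    var = float(probs @ (u_vals - mean) ** 2)
    # Both objectives are maximised in q at q = E(u); substitute that in.
    linear_score = mean**2  # max_q E(2qu - q^2) = E(u)^2
    full_score = -var       # max_q E(-(q-u)^2)  = E(u)^2 - E(u^2) = -Var(u)
    print(name, linear_score, full_score)

# 2qu - q^2 prefers A (0.25 > 0.16): it tracks E(u).
# -(q-u)^2 prefers B (0 > -0.25): it only cares about minimising Var(u).
```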
Even without being able to affect $u$, this messes up the rest of my setup. In particular, my “pick $y$ and $q$ so that you maximise $v_y(2qu-q^2)$” becomes maximising $v_y(-(q-u)^2)$, and the AI will now select the $y$ that minimises $P(Y=y)\cdot\mathrm{Var}(u|Y=y)$, instead of maximising $E(u\wedge(Y=y))$. If any $y$ has $\mathrm{Var}(u|Y=y)=0$ or $P(Y=y)=0$, it will choose one of those $y$’s.
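The same style of toy check for the $y$-selection (again a hypothetical sketch; the joint distribution is mine): one $y$ has high conditional mean but positive variance, the other has low mean and zero variance, and the $v_y(-(q-u)^2)$ version picks the latter.

```python
import numpy as np

# Hypothetical joint distribution:
# y = "a": P = 1/2, u | a is 0 or 1 on a fair coin (high mean, high variance);
# y = "b": P = 1/2, u | b = 0.1 surely             (low mean, zero variance).
setup = {
    "a": (0.5, np.array([0.0, 1.0]), np.array([0.5, 0.5])),
    "b": (0.5, np.array([0.1]),      np.array([1.0])),
}

for y, (p_y, u_vals, probs) in setup.items():
    mean = float(probs @ u_vals)
    var = float(probs @ (u_vals - mean) ** 2)
    # With q set optimally to E(u | Y=y) in each case:
    intended = p_y * mean**2  # max_q E(v_y (2qu - q^2)) =  P(y) E(u|y)^2
    shifted = -p_y * var      # max_q E(v_y (-(q-u)^2))  = -P(y) Var(u|y)
    print(y, intended, shifted)

# The intended objective picks y = "a" (0.125 > 0.005);
# the -(q-u)^2 version picks y = "b" (0 > -0.125), the zero-variance y.
```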