Vanessa Kosoy comments on Notes from a conversation on act-based and goal-directed systems

Vanessa Kosoy 8 Mar 2016 10:28 UTC
Actually, I think that if we consider only deterministic maximization policies, then an optimal predictor for $U$ wrt a bounded-Solomonoff-type measure is sufficient to get an optimal maximization policy. In this case we can do maximization using Levin’s universal search $L$. A significantly better maximization policy $M$ cannot exist, since it would allow us to improve our estimates of $U(x, M(x))$ and/or $U(x, L(x))$.
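
To make the "maximization via Levin search" idea concrete, here is a minimal toy sketch. It is not the actual construction: real Levin search enumerates all programs of a universal machine, whereas this sketch assumes a small hand-written list of candidate programs, each tagged with a made-up description length. The names `run_with_budget`, `levin_maximize`, `estimate_u` and the three example programs are all hypothetical, and `estimate_u` merely stands in for the optimal predictor of $U$.

```python
from typing import Any, Callable, Generator, List, Tuple

# A candidate "program" is a generator function: each `yield` is one
# simulated computation step, and the `return` value is its output.
Program = Callable[[Any], Generator[None, None, Any]]


def run_with_budget(program: Program, x: Any, budget: int) -> Tuple[bool, Any]:
    """Run `program` on `x` for at most `budget` steps; report (halted, output)."""
    gen = program(x)
    for _ in range(budget):
        try:
            next(gen)
        except StopIteration as stop:
            return True, stop.value
    return False, None  # did not halt within the budget


def levin_maximize(
    programs: List[Tuple[int, Program]],
    x: Any,
    estimate_u: Callable[[Any, Any], float],
    max_phase: int,
) -> Tuple[Any, float]:
    """Levin-style schedule: in phase k, a program of length l <= k gets
    2**(k - l) steps. Among all outputs seen, keep the one whose estimated
    utility is highest."""
    best_y, best_score = None, float("-inf")
    for phase in range(1, max_phase + 1):
        for length, program in programs:
            if length > phase:
                continue
            halted, y = run_with_budget(program, x, 2 ** (phase - length))
            if halted:
                score = estimate_u(x, y)
                if score > best_score:
                    best_y, best_score = y, score
    return best_y, best_score


# --- Toy instance (all assumptions, for illustration only) ---

def const_zero(x):
    return 0
    yield  # unreachable; only here to make this function a generator


def count_up(x):
    y = 0
    for _ in range(x):
        y += 1
        yield  # one simulated step per increment
    return y


def overshoot(x):
    return x + 1
    yield  # unreachable; only here to make this function a generator


# (description length, program) pairs; the lengths are invented.
PROGRAMS = [(1, const_zero), (2, count_up), (3, overshoot)]


def estimate_u(x, y):
    # Stand-in for a predictor of U(x, y): closer to x is better.
    return -abs(y - x)


print(levin_maximize(PROGRAMS, 5, estimate_u, max_phase=3))
print(levin_maximize(PROGRAMS, 5, estimate_u, max_phase=5))
```

With a small phase limit only the fast programs halt, so the search settles for the best of their outputs; raising the limit gives the slower `count_up` enough steps to finish and its output then wins. The quality of the result depends entirely on how good `estimate_u` is, which is the role the optimal predictor plays in the argument above.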

Of course, under standard assumptions about derandomization, the best deterministic policy is about as good as the best randomized policy (on average), and in particular requiring that $U$ be generatable wrt a suitable measure is sufficient. Randomness only gives a significant advantage in adversarial setups: your toy model of informed oversight is essentially an adversarial setup between $A$ and $B$.

Also, obviously Levin search is grossly inefficient in practice (like the optimal predictor $Λ$, which is basically a variant of Levin search), but this model suggests that applying a more practical learning algorithm would give satisfactory results.
What links here?
- Vanessa Kosoy's comment on Notes from a conversation on act-based and goal-directed systems by jessicata (8 Mar 2016 10:57 UTC; 2 points)
- Vanessa Kosoy 16 May 2016 19:53 UTC
  Also, I think it’s possible to use probabilistic policies which produce computable distributions (since for such a policy, outputs with low and high probabilities are always distinguishable).