if you move towards everything but move towards X even more, then in the long run you will do more of X on net, because you only have so much probability mass to go around
I have a mental category of “results that are almost entirely irrelevant for realistically-computationally-bounded agents” (e.g. results related to AIXI), and my gut sense is that this seems like one such result.
I mean, this situation is grounded & formal enough that you can just go and implement the relevant RL algorithm and see if it’s relevant for that computationally bounded agent, right?
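Something like the sketch below would do it (the bandit, reward values, and hyperparameters are all illustrative assumptions on my part): plain REINFORCE with a softmax policy where every reward is positive, checking whether the policy still concentrates on the best arm.

```python
# Minimal sketch of the suggested experiment (all values are illustrative
# assumptions): REINFORCE with a softmax policy on a k-armed bandit where
# every reward is positive, i.e. the agent "moves towards everything".
# Probability mass is conserved, so the best arm should still win on net.
import numpy as np

rng = np.random.default_rng(0)
k = 5
rewards = np.array([1.0, 1.2, 1.4, 1.6, 2.0])  # all positive
theta = np.zeros(k)                             # softmax logits
lr = 0.05

for _ in range(20_000):
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()
    a = rng.choice(k, p=pi)
    r = rewards[a] + rng.normal(0.0, 0.1)       # noisy positive reward
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0                       # d/dtheta_j log pi(a) = 1[j=a] - pi_j
    theta += lr * r * grad_log_pi

pi = np.exp(theta - theta.max()); pi /= pi.sum()
print(np.round(pi, 3))   # mass should concentrate on the last (best) arm
```

The expected logit drift for arm j works out to π_j (r_j − E[r]), so arms above the policy’s current average gain mass and everything else loses it, which is exactly the “only so much probability mass” point.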
is this for a reason other than the variance thing I mention?
I think the thing I mention is still important because it means there is no fundamental difference between positive and negative motivation. I agree that if everything were different degrees of extreme bliss then the variance would be so high that you would never learn anything in practice. but if you shift everything slightly such that some mildly unpleasant things are now mildly pleasant, I claim this will make learning a bit faster or slower but still converge to the same thing.
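For what it’s worth, the standard score-function identity backs up the “converge to the same thing” part (a sketch in the usual policy-gradient notation, not anything from this thread):

$$
\mathbb{E}_{a \sim \pi_\theta}\!\left[\nabla_\theta \log \pi_\theta(a)\right]
= \sum_a \pi_\theta(a)\, \nabla_\theta \log \pi_\theta(a)
= \nabla_\theta \sum_a \pi_\theta(a)
= \nabla_\theta 1
= 0,
$$

so for any constant shift $c$,

$$
\mathbb{E}\!\left[\nabla_\theta \log \pi_\theta(a)\,\big(r(a) + c\big)\right]
= \mathbb{E}\!\left[\nabla_\theta \log \pi_\theta(a)\, r(a)\right].
$$

The expected update is unchanged; only the variance of the single-sample estimate depends on $c$.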
Suppose you’re in a setting where the world is so large that you will only ever experience a tiny fraction of it directly, and you have to figure out the rest via generalization. Then your argument doesn’t hold up: shifting the mean might totally break your learning. And I claim that the real world is like this. So I am inherently skeptical of any result (like most convergence results) that relies on just trying approximately everything and gradually learning which things to prefer and disprefer.
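A toy stand-in for this failure mode is easy to exhibit (the setup and numbers below are my own assumptions, not anything from this exchange): shift every reward up by a large constant, and an on-policy softmax learner tends to lock onto whichever arm it happens to sample early, so exploration dies before the best arm is ever tried enough.

```python
# Hedged toy stand-in for the "large world" point (setup and numbers are
# my own assumptions): with a large positive shift, every sampled arm gets
# a big reinforcement, the policy collapses onto early samples, and many
# runs never find the best arm.
import numpy as np

def run(shift, lr=0.5, steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    rewards = np.array([0.0, 0.1, 0.2, 0.3, 1.0]) + shift
    theta = np.zeros(5)
    for _ in range(steps):
        pi = np.exp(theta - theta.max()); pi /= pi.sum()
        a = rng.choice(5, p=pi)
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0
        theta += lr * rewards[a] * grad_log_pi
    return int(np.argmax(theta))

for shift in [0.0, 10.0]:
    wins = sum(run(shift, seed=s) == 4 for s in range(100))
    print(f"shift={shift}: best arm won in {wins}/100 runs")
```

The expected update is identical across shifts (per the identity above), but with a big positive shift a handful of early samples can collapse the policy onto a suboptimal arm it then never leaves.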
are you saying something like: you can’t actually “do more of everything but X”, because you’ll never do everything. so there’s a lot of variance that comes from exploration, which multiplies with your O(k²) variance from having a suboptimal zero point. so in practice your k needs to be very close to optimal. so my thing is true but not useful in practice.
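To put a number on that O(k²) term, here is a quick sketch (the uniform policy and reward values are arbitrary assumptions) showing that the variance of the one-sample policy-gradient estimate grows quadratically as the zero point moves away from its optimum:

```python
# Small numeric sketch (uniform policy and reward values are arbitrary
# assumptions) of the variance cost of a suboptimal zero point: the
# variance of the one-sample REINFORCE gradient is quadratic in the
# shift c, minimized near the optimal baseline.
import numpy as np

k = 5
rewards = np.array([1.0, 1.2, 1.4, 1.6, 2.0])
pi = np.full(k, 1.0 / k)   # uniform policy, for concreteness

def grad_variance(c):
    # one-sample estimator g = (e_a - pi) * (r(a) + c), with a ~ pi
    grads = (np.eye(k) - pi) * (rewards + c)[:, None]    # row a = grad for action a
    mean = pi @ grads                                    # E[g]
    second = np.einsum('a,ai,aj->ij', pi, grads, grads)  # E[g g^T]
    return float(np.trace(second - np.outer(mean, mean)))

for c in [-5.0, -1.4, 0.0, 5.0]:
    print(c, round(grad_variance(c), 3))   # roughly quadratic in c
```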
I feel people do empirically shift k quite a lot throughout life, and it does seem to change how effectively they learn. if you’re mildly depressed, your k is slightly too low and you learn a little slower. if you’re mildly manic, your k is too high and you also learn a little slower. therapy, medications, and meditation all shift k mildly.