Vladimir_Nesov comments on Humans Reflecting on HRH

Vladimir_Nesov 29 Jul 2022 22:46 UTC
LW: 17 AF: 10
A point that doesn’t seem to be in the water supply is that even superintelligences won’t have the results of CEV (or unerringly accurate estimates of them) to work with. Any predictions of values are Goodhart-cursed proxy values. Predictions that are not value-laden are even worse. So no AGI that would want to run a CEV would be a utility maximizer, and AGIs that are utility maximizers are maximizing something that isn’t the CEV of anything, including that of humanity.

Thus utility maximization is necessarily misaligned, not just very hard to align, until enough time has already passed for CEV to run its course to completion, and not merely in foretelling. That likely never actually happens (reflection is unbounded), so utility maximization can only be approached through increasingly confident mild optimization. And there is currently mostly confusion about what mild optimization amounts to as a decision theory.
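As a rough illustration of the Goodhart-curse point above, here is a toy sketch (not anything proposed in the comment itself). It assumes each candidate action has a hidden true value and a noisy proxy estimate, both Gaussian, and compares a hard maximizer with a quantilizer-style picker that samples from the top fraction by proxy. The function name `simulate` and all of its parameters are hypothetical, chosen only for this example.

```python
import random


def simulate(num_candidates=1000, top_fraction=0.1, trials=200,
             noise_sd=1.0, seed=0):
    """Return the mean (proxy - true value) gap for a hard maximizer
    and for a quantilizer-style mild optimizer. Toy model only."""
    rng = random.Random(seed)
    hard_gap = 0.0
    mild_gap = 0.0
    for _ in range(trials):
        # Each candidate has a hidden true value and a noisy proxy of it.
        candidates = []
        for _i in range(num_candidates):
            true_value = rng.gauss(0.0, 1.0)
            proxy = true_value + rng.gauss(0.0, noise_sd)
            candidates.append((proxy, true_value))
        candidates.sort(reverse=True)  # best-looking proxy first

        # Hard maximizer: take the single best-looking candidate.
        proxy, true_value = candidates[0]
        hard_gap += proxy - true_value

        # Mild optimizer: sample uniformly from the top fraction by proxy.
        cutoff = max(1, int(num_candidates * top_fraction))
        proxy, true_value = rng.choice(candidates[:cutoff])
        mild_gap += proxy - true_value
    return hard_gap / trials, mild_gap / trials


hard, mild = simulate()
print(f"mean overestimate, hard maximizer: {hard:.2f}")
print(f"mean overestimate, mild optimizer: {mild:.2f}")
```

Under these assumptions both pickers overestimate what they get, and the hard maximizer overestimates by more: the harder the selection on the proxy, the more of the selected score is noise. Note that in this light-tailed toy model the hard maximizer still ends up with higher true value on average, so the sketch only shows that proxy error grows with optimization pressure, not that mild optimization wins outright.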
- leogao 29 Jul 2022 22:59 UTC
  LW: 1 AF: 1
  I agree that in practice you would want to point mild optimization at it, though my preferred resolution (for purely aesthetic reasons) is to figure out how to make utility maximizers that care about latent variables, and then make them try to optimize the latent variable corresponding to whatever the reflection converges to (by doing something vaguely like logical induction). Of course, the main obstacles are how the hell we would actually do this, and how we make sure the reflection process doesn’t just oscillate forever.