‘Internally coherent’, ‘explicit’, and ‘stable under reflection’ do not seem to me to be opposed to ‘simple’.
I also don’t think you’d necessarily need some sort of bias toward simplicity introduced by a genetic bottleneck to make human values tend (somewhat) toward simplicity.[1] Effective learning algorithms, like those in the human brain, always need a strong simplicity bias anyway to navigate their loss landscape and find good solutions without getting stuck. It’s not clear to me that the genetic bottleneck is actually doing any of the work here. Just as an AI can potentially learn complicated things and complicated values from its complicated and particular training data even if its loss function is simple, the human brain can learn complicated things and complicated values from its complicated and particular training data even if the reward functions in the brain stem are (somewhat) simple. The description length of the reward function doesn’t seem to make for a good bound on the description length of the values learned by the mind that the reward function is training, because what the mind learns is also determined by the training data, which itself has a very high description length.[2]
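To make the description-length point slightly more concrete, here is a rough sketch in Kolmogorov-complexity terms (my own formalization; the symbols are just for illustration): write $V$ for the learned values, $A$ for the learning algorithm, $R$ for the reward function, and $D$ for the training data. Since $V$ is computed from $A$, $R$, and $D$ (plus a random seed, if training is stochastic), we have, up to small additive overhead for separating the inputs,

$$K(V) \;\lesssim\; K(A) + K(R) + K(D),$$

so a short description of $R$ only gives a short description of $V$ if $K(D)$ is also small. With unlimited compute for decompression, $K(D)$ is indeed small, since simple physical laws ultimately generate the data (the caveat in footnote [2]); under realistic time bounds, the resource-bounded analogue $K^{t}(D)$ is enormous, and the simplicity of the reward function then tells you very little about the simplicity of the learned values.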
[1] I don’t think human values are particularly simple at all; they’re just not so big that they eat up all the spare capacity in the human brain.
[2] At least so long as we consider description length under realistic computational bounds. If you have infinite compute for decompression or inference, you can indeed figure out the values with just a few bits, because the training data is ultimately generated by very simple physical laws, and so is the reward function.