Basically because I think that amplification/recursion, as I currently understand it to be meant, is more trouble than it’s worth. It’s going to produce things that have high fitness according to the selection process applied, and in the limit those things are going to be bad.
On the other hand, you might see this as me claiming that “narrow reward modeling” includes a lot of important unsolved problems. HCH is well-specified enough that you can talk about doing it with current technology. But fulfilling the verbal description of narrow value learning requires some advances in modeling the real world (unless you literally treat the world as a POMDP and humans as Boltzmann-rational agents, in which case we’re back to bad computational properties and bad safety properties as well), which gives me the wiggle room to be hopeful.
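To make that parenthetical concrete, here’s a minimal sketch, entirely my own toy construction rather than anything from the proposals under discussion, of what the “POMDP world + Boltzmann-rational human” route amounts to, collapsed to a tabular MDP for brevity. The toy MDP, the hypothesis set, and all function names are illustrative assumptions. The thing to notice is the nested structure: every candidate reward function needs its own full planning pass before you can even score the human’s behavior, which is the computational complaint; and the likelihood model bakes in the assumption that all human deviation from optimality is noise, which is (part of) the safety complaint.

```python
# Toy sketch (my construction): Bayesian reward inference over a tabular
# MDP with a Boltzmann-rational human model, P(a|s) ∝ exp(beta * Q*(s,a)).
import numpy as np

def q_values(T, R, gamma=0.9, iters=200):
    """Q* via value iteration. T: [S, A, S'] transition probs, R: [S']."""
    S, A, _ = T.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)          # greedy backup
        Q = T @ (R + gamma * V)    # Q(s,a) = E_{s'}[R(s') + gamma * V(s')]
    return Q

def boltzmann_policy(Q, beta=2.0):
    """P(a|s) proportional to exp(beta * Q(s,a))."""
    logits = beta * Q
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def posterior_over_rewards(reward_hypotheses, demos, T, beta=2.0):
    """Posterior over a discrete set of rewards given (s, a) demonstrations.
    Note the nested loop: one full planning problem *per hypothesis*."""
    log_post = np.zeros(len(reward_hypotheses))
    for i, R in enumerate(reward_hypotheses):
        pi = boltzmann_policy(q_values(T, R), beta)
        log_post[i] = sum(np.log(pi[s, a]) for s, a in demos)
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()

# Tiny 3-state, 2-action MDP; action 0 drifts left, action 1 drifts right.
T = np.zeros((3, 2, 3))
T[:, 0, :] = [[0.9, 0.1, 0.0], [0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]
T[:, 1, :] = [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.0, 0.1, 0.9]]

hypotheses = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])]
demos = [(0, 1), (1, 1), (2, 1)]  # the human keeps moving right
print(posterior_over_rewards(hypotheses, demos, T))  # favors reward on state 2
```

Even this toy version re-solves the planning problem for every reward hypothesis; in a literal POMDP you would replace `q_values` with belief-state planning, which is dramatically harder still, and the Boltzmann likelihood would go on treating every systematic human bias as preference-revealing noise.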