I’d split it into two problems: inner alignment — how do we manage to instill any goal/value at all that is at least somewhat stable — and outer alignment, which is selecting a goal that is resistant to Goodharting.
Let’s focus on inner alignment.
By “instill” you presumably mean “train.” Which values get trained is ultimately a learning problem, and in many cases (as long as one can formulate it approximately as a Boltzmann distribution) it comes down to a simplicity-accuracy tradeoff.