I think the general problem with your metaphor is that we don’t know the “relevant physics” of self-improvement. We can’t plot a “physically realistic” trajectory that lands in “good values” territory and say “well, we just need to keep ourselves on this trajectory.” Incidentally, MIRI has a dialogue about this metaphor.
And most of your suggestions amount to “let’s learn the physics of alignment.” I have nothing against that, but it is the hard part, and control theory doesn’t seem to provide much insight into it. At best, it’s a framework.