I think the general problem with your metaphor is that we don’t know the “relevant physics” of self-improvement. We can’t plot a “physically realistic” trajectory that lands in “good values” territory and say “well, we just need to keep ourselves on this trajectory.” Incidentally, MIRI has a dialogue about this metaphor.
And most of your suggestions amount to “let’s learn the physics of alignment.” I have nothing against that, but it is the hard part, and control theory doesn’t seem to provide much insight into it. At best, it’s a framework.