Eliezer had written that the two most important mathematical problems were the friendly goal itself that a self-modifying program would have, and goal stability under self-modification.
A useful perspective I have often used in the past is to look at incentive structures and see who has the motivation to take a crack at a problem. Goal stability seems like a very useful problem to solve for many institutions. Almost all of us have heard of institutional missions getting diluted, and some have even seen it happen as people change within a firm. National governments, corporations (especially from the shareholders' perspective), pension funds, and any number of foundations could all benefit from good goal stability, right?
Is the lack of research into goal stability just due to a disbelief in AI? Or is it that large corporations already have a strong and stable goal in profit, so further research into goal stability is not occurring?
But the top foundations of the world also seem large enough to support such research. And their goals are multiple and complex, not reducible to profit.
Have you ever followed electoral reform debates? There is a distinct pattern: people out of power want change; people in power do not, being happy with the status quo. Yet only the people in power have the power to enact change. So change tends not to happen.
That is, changes to the goal system rarely happen, especially if they might disadvantage the incumbents.
I have ideas that I'm going to try out. However, they have more to do with acquiring feedback from people outside the organisation, which would foster goal stability if and only if the feedback from outside the system is itself stable. E.g., if the feedback came from donors who consistently wanted a charity to do one thing, it would keep the charity on track.
Thought: Consider the history of the Seventeenth Amendment to the Constitution—it was really hard to get the Senate to approve a change in how Senators were chosen.
The transcendentally important problem is to identify “the friendly goal itself”. Compared to that, goal stability is just a technicality.