I am nowhere near caught up on the FAI readings, but here is a humble thought.
What I have read so far seems to assume a single-jump FAI. That is, once the FAI is set in motion, it must take us to where we ultimately want to go without further human input. Please correct me if I am wrong.
What about a multistage approach?
The problem people might immediately raise is that a multistage approach could elevate subgoals into terminal goals. We say, "take us to mastery of nanotech," and the AI decides to rip us apart and organize all existing ribosomes under a coherent command.
However, perhaps what we need to do is verify that any intermediate goal state is better than the current state.
So what if we have the AI propose a goal state, simulate that goal state, and expose some subset of humans to the simulation? The AI then asks, "Proceed to this state or not?" and the humans answer.
Once in the next stage, we can reassess before proposing the one after that.
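To make the loop concrete, here is a toy sketch of the control flow I have in mind. Everything here is a placeholder: states are just integers, and `propose`, `simulate`, and `approve` stand in for the (very hard) real components.

```python
def multistage_advance(current_state, propose, simulate, approve, max_stages=10):
    """Advance one human-approved stage at a time, reassessing after each."""
    for _ in range(max_stages):
        candidate = propose(current_state)       # AI guesses a next goal state
        preview = simulate(candidate)            # simulate it for human inspection
        if not approve(preview, current_state):  # humans compare it to the status quo
            break                                # veto: stop and stay where we are
        current_state = candidate                # proceed, then reassess from here
    return current_state

# Toy stand-ins: each stage increments the state; humans approve anything below 3.
result = multistage_advance(
    0,
    propose=lambda s: s + 1,
    simulate=lambda s: s,
    approve=lambda preview, current: preview < 3,
)
print(result)  # advances 0 -> 1 -> 2, then candidate 3 is vetoed
```

The point of the structure is that the human veto sits inside the loop, so no single stage can carry us further than one approved step.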
To give a sense of motivation: it seems that verifying the goodness of a future state is easier than trying to construct, from scratch, the basic rules of what makes a state good.