I’m just saying: here’s a major problem with this approach; let’s put it aside for now.
Are you penalizing the AI for the predictable consequences of it existing, rather than just the actions it takes?
We are penalizing the master AI for the predictable consequences of the existence of the particular disciple AI it chooses to make.
I’m also not sure the reduced impact intuitions hold for any narrow AIs whose task is to somehow combat existential risk.
No, they don’t hold. We could hook it up to something like utility indifference, but most likely reduced impact AI would be an interim stage on the way to friendly AI.
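
For concreteness, here is one minimal way the penalty under discussion could be formalized; this is an illustrative sketch with hypothetical notation ($u$, $d$, $\lambda$, $w$), not a construction either speaker has committed to. If the master AI considers creating disciple $D$, score it by

\[
U'(D) = \mathbb{E}[u \mid \text{create } D] \;-\; \lambda \, \mathbb{E}\big[\, d(w_D, w_{\neg D}) \,\big],
\]

where $w_D$ and $w_{\neg D}$ are the predicted worlds with and without $D$, $d$ is some distance on world-states, and $\lambda$ weights the impact penalty. Note that the penalty attaches to the predictable consequences of $D$’s existence rather than to $D$’s individual actions, and that a disciple whose task is to combat existential risk would deliberately make $d(w_D, w_{\neg D})$ large, which is why the reduced impact intuitions fail in that case.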