riceissa comments on My Understanding of Paul Christiano’s Iterated Amplification AI Safety Research Agenda

riceissa 5 Oct 2020 20:35 UTC
LW: 4 AF: 2
IDA tries to prevent catastrophic outcomes by searching for a competitive AI that never intentionally optimises for something harmful to us and that we can still correct once it’s running.
I don’t see how the “we can still correct once it’s running” part can be true given this footnote:
However, I think at some point we will probably have the AI system autonomously execute the distillation and amplification steps or otherwise get outcompeted. And even before that point we might find some other way to train the AI in breaking down tasks that doesn’t involve human interaction.
After a certain point, it seems that the thing overseeing the AI system is another AI system, so saying that “we” can correct the first AI system seems like a confusing way to describe the situation. Have I understood this correctly? What do you think?
- TurnTrout 11 Dec 2020 3:58 UTC
  LW: 2 AF: 1
  One interpretation is that even if the AI is autonomously executing distillation/amplification steps, the relevant people are still able to say “hold on, we need to modify your algorithm” and have the AI actually let us correct it.