Adele Lopez comments on Another (outer) alignment failure story

Adele Lopez 13 Apr 2021 0:15 UTC
How bad is the ending supposed to be? Are only the people who fight the system killed, while other humans are free to live in the way the AI expects them to (which might be something like continuing to consume goods and providing AI-mediated feedback on the quality of those goods)? Or is it more that once humans are disempowered, no machine has any incentive to keep them around, so humans are not-so-gradually replaced with machines?

The main point of intervention in this scenario that stood out to me would be making sure that (during the paragraph beginning with “For many people this is a very scary situation.”) we at least attempt to use AI negotiators to broker an international agreement to stop development of this technology until we understand it better (using AI-designed systems for enforcement and surveillance). Is there anything in particular that makes this infeasible?
- paulfchristiano 13 Apr 2021 1:17 UTC
  I think that most likely either humans are killed incidentally as part of the sensor-hijacking (since that’s likely to be the easiest way to deal with them), or else AI systems reserve a negligible fraction of their resources to keep humans alive and happy (but disempowered) based on something like moral pluralism, being nice, or acausal trade (e.g. the belief that much of their influence comes from worlds in which they are simulated by humans who didn’t mess up alignment and who would be willing to exchange a small part of their resources to keep the people in the story alive and happy).
  > The main point of intervention in this scenario that stood out to me would be making sure that (during the paragraph beginning with “For many people this is a very scary situation.”) we at least attempt to use AI-negotiators to try to broker an international agreement to stop development of this technology until we understood it better (and using AI-designed systems for enforcement/surveillance). Is there anything in particular that makes this infeasible?
  I don’t think this is infeasible. It’s not the intervention I’m most focused on, but it may be the easiest way to avoid this failure. It’s also an important channel through which advance preparation can make things better, and an important payoff from understanding what’s up with alignment and correctly anticipating problems.