Richard_Ngo comments on Distinguishing claims about training vs deployment

Richard_Ngo 5 Feb 2021 15:42 UTC
The burden of proof should be on whoever wants to claim that AI will be fine by default, not on whoever wants to claim it won’t be fine by default.
I’m happy to wrap up this conversation in general, but it’s worth noting before I do that I still strongly disagree with this comment. We’ve identified a couple of interesting facts about goals, like “unbounded large-scale final goals lead to convergent instrumental goals”, but we have nowhere near a good enough understanding of the space of goal-like behaviour to say that everything apart from a “very small region” will lead to disaster. This is circular reasoning from the premise that goals are by default unbounded and consequentialist to the conclusion that it’s very hard to get bounded or non-consequentialist goals. (It would be rendered non-circular by arguments about why coherence theorems about utility functions are so important, but there’s been a lot of criticism of those arguments and no responses so far.)
- Daniel Kokotajlo 5 Feb 2021 16:37 UTC
  OK, interesting. I agree this is a double crux. For reasons I've explained above, it doesn't seem like circular reasoning to me; it doesn't seem like I'm assuming that goals are by default unbounded and consequentialist, etc. But maybe I am. I haven't thought about this as much as you have, and my views on the topic have been crystallizing throughout this conversation, so I admit there's a good chance I'm wrong and you are right. Perhaps I/we will return to it one day, but for now, thanks again and goodbye!