[Question] What do you make of AGI:unaligned::spaceships:not enough food?

In the transcript of Buck’s talk “Cruxes for working on AI safety”, there’s an example of a bad argument for worrying that people will build spaceships without enough food on them.

Imagine if I said to you: “One day humans are going to try and take humans to Mars. And it turns out that most designs of a spaceship to Mars don’t have enough food on them for humans to not starve over the course of their three-month-long trip to Mars. We need to work on this problem. We need to work on the problem of making sure that when people build spaceships to Mars they have enough food in them for the people who are in the spaceships.”

It’s implied that you could turn this into an argument that people are not going to build unaligned AGI. I parse it as an analogical argument: people would not build spaceships without enough food on them, because they don’t want people to die. Analogously, people will not build unaligned AGI, because they don’t want people to die.

So what are the disanalogies? One is that it is harder to tell whether an AGI is aligned than whether a spaceship has enough food on it. I don’t think this can do much of the work. If checking the food supply were hard, people would just spend more effort on telling whether spaceships have enough food on them, or not build them. Similarly, if this were the only problem, people would put more effort into determining whether an AGI is aligned before turning it on, or they would not build one until it got cheaper to tell. A related disanalogy is that there is more agreement about which spaceship designs have enough food, and how to tell, than there is about which AGI designs are aligned, and how to tell.

Another disanalogy is that everybody knows that if you design a spaceship without enough food on it and send people to Mars in it, those people will die. Not many people know that if you design an unaligned AGI and turn it on, people will likely die (or some other bad outcome will happen).

Are these disanalogies enough to make the argument fail? Are there other important disanalogies? If you don’t find the analogical argument convincing, why not?
