JoshuaFox comments on xkcd on the AI box experiment

JoshuaFox 22 Nov 2014 17:29 UTC
4 points

Suppose I buy shares in a company that builds an AI, which then works for the good of the company, which rewards share-owners. This is ordinary causality: I contributed towards its building, and was rewarded later.

What makes it possible to be rewarded as a shareholder is a legal system which enforces your ownership rights: a kind of pre-commitment which is feasible even among humans who cannot show proofs about their “source code.” The legal system is a mutual enforcement system which sets up a chain of causality towards your being paid back.

Suppose I contribute towards something other than its building, in the belief that an AI which will later come into being will reward me for having done this. Still doesn’t seem acausal to me.

It’s interesting to consider what happens when the second agent cannot precommit to repaying you, for example because the agent does not yet exist.

Suppose I believe an AI is likely to be built that will conquer the world and transfer all wealth to its builders.

The question is: Why would it do that? In the future, when this new agent comes into existence, why would it consume resources to repay its builders (assuming that it receives no benefit at that future time)? The “favor” that the builders did is past and gone; repaying them gives the agent no benefit. Since we are talking in this comment subthread about an FAI that is truly friendly to all humanity, it might distribute its efforts equally to all humanity rather than “wasting” resources on differential payback.

The answer to this question has to do with acausal trade. I wrote a LW Wiki article on the topic. It’s pretty mind-bending and it took me a while to grasp, but here is a summary. If Agent P (in this case the AI) can model or simulate Agent Q (in this case humans in P’s past) to prove statements (probably probabilistic statements) about it, and Q can model P, then P’s optimal move is to do what Q wants, and Q’s optimal move is to do what P wants. This holds in the limiting case of perfect knowledge and infinite computational power, but in real life, clearly, it depends on a lot of assumptions about P’s and Q’s ability to model each other, and the relative utility they can grant each other.
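
To make the dependence on modeling ability and relative utility concrete, here is a toy sketch in Python. It is not a formalization of acausal trade, just a one-shot illustration under assumptions I am making up for the example: Q contributes only if it predicts that P will repay and the repayment `b` exceeds Q's cost `c`; Q's prediction of P's policy is correct with probability `accuracy`; P gains `v` if Q contributed and pays `r` if it repays. All names and numbers are hypothetical.

```python
def p_policy_value(repay: bool, accuracy: float,
                   v: float, r: float, b: float, c: float) -> float:
    """Expected utility to P of following a fixed policy (repay or not).

    Toy assumptions: Q contributes only when it predicts repayment and
    the repayment b is worth more than its cost c. Q's prediction of
    P's policy is right with probability `accuracy`.
    """
    if b <= c:
        # Repayment is not worth Q's cost, so Q never contributes.
        return 0.0
    # Probability that Q predicts "P repays", given P's actual policy.
    p_q_predicts_repay = accuracy if repay else 1.0 - accuracy
    # P gains v from Q's contribution and pays r only if it repays.
    payoff_if_q_contributed = (v - r) if repay else v
    return p_q_predicts_repay * payoff_if_q_contributed


def p_best_policy(accuracy: float, v: float, r: float,
                  b: float, c: float) -> str:
    """Return the policy with the higher expected utility for P."""
    repay_value = p_policy_value(True, accuracy, v, r, b, c)
    defect_value = p_policy_value(False, accuracy, v, r, b, c)
    return "repay" if repay_value > defect_value else "defect"


if __name__ == "__main__":
    # Perfect modeling: being the kind of agent that repays wins.
    print(p_best_policy(accuracy=1.0, v=10.0, r=3.0, b=5.0, c=1.0))
    # Coin-flip modeling: Q's prediction carries no information about P.
    print(p_best_policy(accuracy=0.5, v=10.0, r=3.0, b=5.0, c=1.0))
```

In this toy model, repaying is the better policy only when `accuracy * (v - r) > (1 - accuracy) * v`. With perfect modeling that reduces to `v > r`, and as Q's ability to model P degrades, the case for repaying disappears.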
- DanArmak 22 Nov 2014 18:06 UTC
  2 points
  What I don’t quite understand is why the following, simpler argument isn’t sufficient. It seems to lead to the same results, and it doesn’t require acausal trade.
  
  I’m not building just any AI. I want to build an AI that will, by design, reward its builders. Just like any other tool I build, I wouldn’t do it if I didn’t expect it to do certain things and not do other ones.
  
  Similarly, if you cooperate with Roko’s Basilisk, you try to build it because it’s the kind of AI that punishes those who didn’t try to build it. You know it punishes non-builders, because that’s how you’re building it. And the reason you’re building it is that you fear that if you don’t, someone else will, and then the AI will punish you for not building it first.
  
  If you have a valid reason to fear someone else will build it, and you can’t avert it by other means, then it makes sense for you to build it first. Similarly, if you think a likely outcome of an AI race is an AI that helps its builders (and doesn’t harm anyone else), then you try to build the first one (and if helping others is part of your utility function the AI will do that too to reward you).
  
  Of course, like any argument, if you don’t accept the premises, then the conclusion doesn’t hold. And I have no strong reason to think someone else is going to build a torture-everyone-else AI.
  
  What does the acausal trade argument tell us beyond this simple model? Does it tell us to cooperate with the future AI even if we don’t think it will be built if we cooperate, or will be built by someone else if we don’t? Or does it tell us to cooperate quantitatively more? Or in other situations?