Acausal trade barriers

Stuart_Armstrong 11 Mar 2015 13:40 UTC

23 points

A putative new idea for AI control; index here.

Many of the ideas presented here require AIs to be antagonistic towards each other—or at least hypothetically antagonistic towards hypothetical other AIs. This can fail if the AIs engage in acausal trade, so it would be useful if we could prevent such things from happening.

Now, I have to admit I’m still quite confused by acausal trade, so I’ll simplify it to something I understand much better, an anthropic decision problem.

Staples and paperclips, cooperation and defection

Clippy has a utility function p, linear in paperclips, while Stapley has a utility function s, linear in staples (both p and s are normalised to zero, with each additional item adding 1 utility). They are not causally connected, and each must choose “Cooperate” or “Defect”. If they choose “Cooperate”, they create 10 copies of the item they do not value (so Clippy creates 10 staples, Stapley creates 10 paperclips). If they choose “Defect”, they create one copy of the item they value (so Clippy creates 1 paperclip, Stapley creates 1 staple).

Assume both agents know these facts, both agents use anthropic decision theories, and both agents are identical apart from their separate locations and distinct utility functions.

Then the outcome is easy: since the agents reason identically, both will consider that “cooperate-cooperate” and “defect-defect” are the only two possible outcomes; “cooperate-cooperate” gives them the better outcome, so they will both cooperate. It’s a sweet story of cooperation and trust between lovers that never agree and never meet.
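The reasoning above can be sketched in a few lines of Python. This is only an illustration of the toy problem, not a model of acausal trade in general; the function names (`paperclips`, `best_mirrored_action`) are made up for this sketch, and the "both agents necessarily pick the same action" restriction is hard-coded as an assumption.

```python
# Payoffs from the setup above: cooperating makes 10 of the item the *other*
# agent values; defecting makes 1 of the item the agent itself values.
COOPERATE, DEFECT = "C", "D"


def paperclips(clippy_action, stapley_action):
    """Total paperclips produced, i.e. Clippy's utility p."""
    made_by_clippy = 1 if clippy_action == DEFECT else 0
    made_by_stapley = 10 if stapley_action == COOPERATE else 0
    return made_by_clippy + made_by_stapley


def best_mirrored_action():
    """Best action for Clippy, ASSUMING both agents end up choosing the same action."""
    return max((COOPERATE, DEFECT), key=lambda a: paperclips(a, a))


if __name__ == "__main__":
    print(paperclips(COOPERATE, COOPERATE))  # 10
    print(paperclips(DEFECT, DEFECT))        # 1
    print(best_mirrored_action())            # C
```

Without the mirroring assumption, Clippy would get 11 from defecting against a cooperating Stapley, so the cooperation result rests entirely on the two choices being linked.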

Breaking cooperation

How can we demolish this lovely agreement? As I often do, I will assume that there is some event X that will turn Clippy on, with P(X) ≈ 1 (hence P(¬X) << 1). Similarly there is an event Y that turns Stapley on. Since X and Y are almost certain, they should not affect the results above. If the events don’t happen, the AIs will never get turned on at all.

Now I am going to modify utility p, replacing it with

p’ = p − E(p|¬X).

This is p with a single term subtracted from it: the expected value of p given that Clippy has not been turned on. This term feels like a constant, but isn’t exactly, as we shall see. Do the same modification to utility s, using Y:

s’ = s − E(s|¬Y).

Now contrast “cooperate-cooperate” and “defect-defect”. If Clippy and Stapley are both cooperators, then p=s=10. However, if the (incredibly unlikely) ¬X were to happen, then Clippy would not exist, but Stapley would still cooperate (as Stapley has no way of knowing about Clippy’s non-existence), and create ten paperclips. So E(p|¬X) = E(p|X) ≈ 10, and p’ ≈ 0. Similarly s’ ≈ 0.

If both agents are defectors, though, then p=s=1. Since each agent creates its own valuable object, E(p|¬X) = 0 (Clippy cannot create a paperclip if Clippy does not exist) and similarly E(s|¬Y)=0.

So p’=s’=1, and both agents will choose to defect.
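As a rough check of the arithmetic, here is a minimal Python sketch of Clippy's modified utility under the two mirrored policies. It assumes, as the post does, that Stapley acts the same way whether or not X happened, and it idealises the "≈" values as exact. The names `modified_utility` and `preferred_policy`, and the use of `None` for "Clippy was never turned on", are conventions invented for this sketch.

```python
COOPERATE, DEFECT = "C", "D"


def paperclips(clippy_action, stapley_action):
    """Clippy's utility p. clippy_action is None if Clippy was never turned on."""
    own = 1 if clippy_action == DEFECT else 0
    other = 10 if stapley_action == COOPERATE else 0
    return own + other


def modified_utility(policy):
    """p' = p - E(p | not X), with both agents following the same policy."""
    p_given_x = paperclips(policy, policy)
    # Under not-X Clippy does nothing, but Stapley cannot tell and acts as usual.
    p_given_not_x = paperclips(None, policy)
    return p_given_x - p_given_not_x


def preferred_policy():
    """Policy that maximises the modified utility p'."""
    return max((COOPERATE, DEFECT), key=modified_utility)


if __name__ == "__main__":
    print(modified_utility(COOPERATE))  # 0
    print(modified_utility(DEFECT))     # 1
    print(preferred_policy())           # D
```

The subtraction cancels exactly the paperclips that Stapley would have produced anyway, which is why only Clippy's own direct production survives in p’.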

If this is a good analogue for acausal decision making, it seems we can break acausal trade, if needed.


Acausal Trade

Luke_A_Somers 12 Mar 2015 14:14 UTC
19 points
0
TL;DR: Acausal trade breaks if you change utility functions from ‘how much of X’ to ‘how much of a positive impact on X I have’
- Stuart_Armstrong 12 Mar 2015 14:20 UTC
  2 points
  0
  Parent
  Yep. This seems to be a formalisation of that idea, avoiding the subtleties in defining “I”.
  - Transfuturist 13 Mar 2015 8:26 UTC
    4 points
    0
    Parent
    The subtleties in defining “I” are pushed into the subtleties of defining events X and Y with respect to Clippy and Stapley respectively. I’m not sure if that counts as avoiding it at all.
    
    And there are other issues with utility functions that depend on an agent’s impact on utilon-contributing elements. Such as, say, replacing all other agents that provide utilon-contributing elements with subagents of the barriered agent, thus making its own impact equal to the impact of all utilon-contributing agents.
    
    This idea needs work, in other words. Not that you ever said otherwise, I just don’t think the formula provided is sufficient for preventing acausal trade without incentivizing undesirable strategies. See this comment as well for my concerns on disincentivizing utility conditional upon nonexistence.
    - Stuart_Armstrong 13 Mar 2015 12:48 UTC
      2 points
      0
      Parent
      
      The subtleties in defining “I” are pushed into the subtleties of defining events X and Y with respect to Clippy and Stapley respectively.
      
      Defining events seems much easier than defining identity.
      
      Such as, say, replacing all other agents that provide utilon-contributing elements with subagents of the barriered agent, thus making its own impact equal to the impact of all utilon-contributing agents.
      
      I believe this setup wouldn’t have this problem. That’s the beauty of using X rather than “non-existence” or something similar, it’s “non-created” (essentially), so it has no problems with events happening after its death that it can have an impact on.
      - Transfuturist 13 Mar 2015 18:53 UTC
        0 points
        0
        Parent
        
        Defining events seems much easier than defining identity.
        
        But events X and Y are specifically regarding the activation of Clippy and Stapley, so a definition of identity would need to be included in order to prove the barrier to acausal trade that p’ and s’ are claimed to have. Unless the event you speak of is something like “the button labeled ‘release AI’ is pressed,” but there is a greater-than-epsilon probability that the button will itself fail. Not sure if that provides any significant penalty to the utility function.
        Stuart_Armstrong 16 Mar 2015 11:25 UTC
        0 points
        0
        Parent
        
        Unless the event you speak of is something like “the button labeled ‘release AI’ is pressed,”
        
        Pretty much that, yes. More like “the button press fails to turn on the AI” (an exceedingly unlikely event, so it doesn’t affect utility calculations much, but can still be conditioned on).
danieldewey 13 Mar 2015 0:33 UTC
4 points
0
Is this sort of a way to get an agent with a DT that admits acausal trade (as we think the correct decision theory would) to act more like a CDT agent? I wonder how different the behaviors of the agent you specify are from those of a CDT agent—in what kinds of situations would they come apart? When does “I only value what happens given that I exist” (roughly) differ from “I only value what I directly cause” (roughly)?
- Transfuturist 13 Mar 2015 8:37 UTC
  3 points
  0
  Parent
  I am concerned about modeling nonexistence as zero or infinitely negative utility. That sort of thing leads to disincentivizing the utility function in circumstances where death is likely. Harry in HPMOR, for example, doesn’t want his parents to be tortured regardless of whether he’s dead, such that he is willing to take on an increased risk of death to ensure that such will not happen, and I think the same invariance should hold true for FAI. That is not to say that it should be susceptible to blackmail; Harry ensured his parents’ safety with a decidedly detrimental effect on his opponents.
- Stuart_Armstrong 13 Mar 2015 12:22 UTC
  2 points
  0
  Parent
  
  When does “I only value what happens given that I exist” (roughly) differ from “I only value what I directly cause” (roughly)?
  
  Acausal trade with agents who can check whether you exist or not.
  - Transfuturist 13 Mar 2015 18:55 UTC
    0 points
    0
    Parent
    Can those agents check whether your utility function is p vs p’? Because otherwise the point seems moot.
    - Stuart_Armstrong 19 Mar 2015 13:48 UTC
      0 points
      0
      Parent
      They can have a probability estimate over it. Just as in all acausal trade. Which I don’t fully understand.
- Stuart_Armstrong 13 Mar 2015 12:07 UTC
  2 points
  0
  Parent
  CDT is not stable, and we’re not sure where that decision theory could end up.
  
  It seems this approach could be plugged into even a stable decision theory.
  
  Or, more interestingly, we might be able to turn on certain acausal trades and turn off others.
CronoDAS 11 Mar 2015 18:29 UTC
4 points
0
Typo in post title: “Acaucal trade barriers”
- Vladimir_Nesov 11 Mar 2015 18:35 UTC
  4 points
  0
  Parent
  Fixed.
Petter 12 Mar 2015 22:31 UTC
1 point
0
So, first you have the utility functions that pay both agents 10 if they cooperate and 1 if they don’t.

Then you change the utility functions to pay the agents 0 if they cooperate and 1 if they don’t. Naturally they will then stop cooperating.

I don’t get it. If you are the one specifying the utility functions, then obviously you can make them cooperate or defect, right?
- Stuart_Armstrong 13 Mar 2015 12:20 UTC
  4 points
  0
  Parent
  The change in utility function isn’t removing 10 by hand; it works by removing any utility they gain from acausal trade (whatever it is) while preserving utility gained through direct actions, thus incentivising them to focus only on direct actions (roughly).
  - Petter 15 Mar 2015 16:50 UTC
    −1 points
    0
    Parent
    Then the entire result of the modification is tautologically true, right?
    - Stuart_Armstrong 19 Mar 2015 13:44 UTC
      6 points
      0
      Parent
      All of maths is tautologically true, so I’m not sure what you’re arguing.
JoshuaZ 12 Mar 2015 14:36 UTC
1 point
0
I think there is a more fundamental problem with this sort of argument: staples and paperclips aren’t going to involve the same resources, so a completely symmetric situation isn’t going to happen. Worse, as the resource difference gets larger, one of the two will have more resources free to work on self-modification.
- Stuart_Armstrong 12 Mar 2015 14:49 UTC
  4 points
  0
  Parent
  I assume symmetry to get acausal trade as I could model it, then broke acausal trade while preserving the symmetry. This seems to imply that the method will break acausal trade in general.
  - JoshuaZ 12 Mar 2015 14:51 UTC
    1 point
    0
    Parent
    Ah, that makes sense.
mako yass 8 Dec 2016 1:55 UTC
0 points
0
It’s not clear to me why you define p’ and s’ and what they’re supposed to represent. I worry that you’re making a unit error or leaving out a probability weighting. (was it supposed to be p’ = E(p) - E(p|¬X)P(¬X) ?? but why would that be relevant either???)