Jacob Pfau comments on Instrumental Convergence For Realistic Agent Objectives

Jacob Pfau 25 Jan 2022 9:14 UTC
LW: 3 AF: 2
Am I correct to assume that the discussions of StarCraft and Minecraft concern single-player variants of those games?

It seems to me that in a competitive, 2-player, minimize-resource-competition StarCraft, you would want to go kill your opponent so that they could no longer interfere with your resource loss? More generally, I think competitions to minimize resources might still usually involve some sort of power-seeking. I remember reading somewhere that 'losing chess' involves normal-looking (power-seeking?) early-game moves.
- TurnTrout 25 Jan 2022 18:18 UTC
  LW: 4 AF: 4
  I’m implicitly assuming a fixed opponent policy, yes.
  I'm not overly familiar with SC2, but you don't have to kill your opponent to get to 0 resources, do you? From my experience with other RTS games, I imagine you can just quickly build units and deplete your resources, and then your opponent can't make you accrue more resources. Is that wrong?
  - Jacob Pfau 26 Jan 2022 10:32 UTC
    3 points
    Yes, I agree that in the simplest case (SC2 with default starting resources) you just build one or two units and you're done. However, I don't see why this case should be taken as a general explanation of the negative-alpha-weights setting. It seems to me more like a case of an excessively simple game?
    
    Consider the set of games starting with various quantities of resources and negative alpha weights. As starting resources increase, you will be incentivised to attack your opponent to interfere with their resource depletion. Indeed, if the reward is based on end-of-game resource minimisation, you end up in an unbounded resource-maximisation competition to guarantee control over your opponent, and then you spend your resources safely after crippling them? In the single-player setting, you will be incentivised to build up your infrastructure so as to spend your resources more quickly.
    
    It seems to me the multi-player case involves power-seeking. If so, it seems like negative alpha weights don't generically imply anything about the existence of power-seeking incentives?
    
    (I'm actually not clear on whether the single-player case should be seen as power-seeking or not. Maybe it depends on your choice of discount rate, gamma? You are building up infrastructure, i.e. unit-producing buildings, which seems intuitively power-seeking. But the number of long-term possibilities available to you following spending resources on infrastructure is reduced (assuming gamma = 1). On the other hand, the number of short-term possibilities may be higher given infrastructure, so you may have increased power assuming gamma < 1?)
    - TurnTrout 26 Jan 2022 18:18 UTC
      LW: 3 AF: 3
      I agree that in certain conceivable games which are not baseline SC2, there will be different power-seeking incentives for negative alpha weights. My commentary wasn’t intended as a generic takeaway about negative feature weights in particular.
      But in the game which actually is SC2, where you don't start with a huge number of resources, negative alpha weights don't incentivize power-seeking. You do need to think about the actual game being considered before you can conclude that negative alpha weights imply such-and-such a behavior.
      > But the number of long-term possibilities available to you following spending resources on infrastructure is reduced
      I think that either assuming $\gamma \ll 1$ or considering suboptimal power-seeking resolves the situation. The reason that building infrastructure intuitively seems like power-seeking is that we are not optimal, logically omniscient agents; all possible future trajectories are not laid out immediately before our minds. But the suboptimal power-seeking metric (Appendix C in Optimal Policies Tend To Seek Power) does match intuition here AFAICT: cleverly building infrastructure has the effect of navigating the agent to situations with more cognitively exploitable opportunities.
- Pattern 21 Jun 2022 22:20 UTC
  2 points
  > It seems to me that in a competitive, 2-player, minimize-resource-competition StarCraft, you would want to go kill your opponent so that they could no longer interfere with your resource loss?
  I would say that in general it's more about what your opponent is doing. If you are trying to lose resources and the other player is trying to gain them, you're going to get along fine. (This would likely be very stable and common if players can kill units and scavenge them for parts.) If both of you are trying to lose them...
  Trying to minimize resources is a weird objective for StarCraft. So is trying to gain resources. Normally resources are a means to an end: destroying the other player first. Now, if both sides start out with a lot of resources and the goal is to hit zero first... how do you interfere with resource loss? If you destroy the other player, don't their resources go to zero? By far the easiest variant to construct is 'losing StarCraft'. And I'm not sure how you'd force a win.
  This starts to get into 'is this true for Minecraft', and... it doesn't seem like there's conflict of the 'what if they destroy me, so I should destroy them first' kind, so much as 'hey, stop stealing my stuff!'. Also, death isn't permanent, so... There aren't a lot of non-lethal options. If a world is finite (and there's enough time), then eventually, yeah, there could be conflict.
  > More generally, I think competitions to minimize resources might still usually involve some sort of power-seeking.
  In the real world, maybe I'd be concerned about self-nuking. Starting a fight and the like, to ensure destruction, could also work very well.
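The γ-dependence raised in Jacob's parenthetical about infrastructure, and in TurnTrout's "$\gamma \ll 1$" reply, can be made concrete with a small sketch. It assumes the POWER definition from Optimal Policies Tend To Seek Power as I understand it, $\text{POWER}(s, \gamma) = \frac{1-\gamma}{\gamma}\,\mathbb{E}_{R}[V^*_R(s, \gamma) - R(s)]$, with state rewards drawn IID from Uniform[0, 1] in a deterministic graph MDP. The function `estimate_power` and the toy graph `TOY` are hypothetical illustrations, not code from the paper or the thread: a "bank" state keeps resources and reaches four options only after a one-step delay, while an "infra" state has spent its resources and has two options immediately.

```python
import random


def estimate_power(successors, state, gamma, n_samples=2000, tol=1e-6, seed=0):
    """Monte Carlo estimate of POWER(state, gamma) in a deterministic graph MDP.

    Assumptions (a sketch, not the paper's code):
    - successors[s] lists the states reachable from s in one step, and every
      state has at least one successor (absorbing states use a self-loop);
    - rewards are on states and drawn IID from Uniform[0, 1];
    - POWER(s, gamma) = (1 - gamma) / gamma * E_R[V*_R(s, gamma) - R(s)].
    """
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie strictly between 0 and 1")
    rng = random.Random(seed)
    n = len(successors)
    total = 0.0
    for _ in range(n_samples):
        reward = [rng.random() for _ in range(n)]
        # Value iteration for V*(s) = R(s) + gamma * max over successors s' of V*(s').
        value = list(reward)
        while True:
            new_value = [
                reward[s] + gamma * max(value[t] for t in successors[s])
                for s in range(n)
            ]
            delta = max(abs(a - b) for a, b in zip(new_value, value))
            value = new_value
            if delta < tol:
                break
        # Normalised optimal value, excluding the reward of the current state.
        total += (1 - gamma) / gamma * (value[state] - reward[state])
    return total / n_samples


# Hypothetical toy graph; the state labels are illustrative, not from the thread.
TOY = [
    [2],           # 0 "bank": resources kept; one step to state 2
    [7, 8],        # 1 "infra": resources spent; two options right away
    [3, 4, 5, 6],  # 2: four options, reachable only after a delay
    [3], [4], [5], [6], [7], [8],  # absorbing states (self-loops)
]

if __name__ == "__main__":
    for gamma in (0.1, 0.8):
        bank = estimate_power(TOY, 0, gamma)
        infra = estimate_power(TOY, 1, gamma)
        print(f"gamma={gamma}: POWER(bank)={bank:.3f}, POWER(infra)={infra:.3f}")
```

In this toy the expectations can also be worked out by hand, using E[max of k IID Uniform[0, 1] draws] = k/(k+1): POWER(bank) = 0.5(1 − γ) + 0.8γ = 0.5 + 0.3γ and POWER(infra) = 2/3. So the "infra" state has more POWER for γ < 5/9 and less for γ > 5/9, which is one way to cash out the short-term vs long-term intuition under these assumptions. It says nothing about actual SC2.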