TurnTrout comments on Instrumental Convergence For Realistic Agent Objectives

TurnTrout 26 Jan 2022 18:18 UTC
LW: 3 AF: 3
I agree that in certain conceivable games which are not baseline SC2, there will be different power-seeking incentives for negative alpha weights. My commentary wasn’t intended as a generic takeaway about negative feature weights in particular.
But in the game which actually is SC2, where you don’t start with a huge number of resources, negative alpha weights don’t incentivize power-seeking. You do need to think about the actual game being considered before you can conclude that negative alpha weights imply such-and-such a behavior.
“But the number of long-term possibilities available to you following spending resources on infrastructure is reduced”
I think that either $\gamma \ll 1$ or considering suboptimal power-seeking resolves the situation. The reason that building infrastructure intuitively seems like power-seeking is that we are not optimal, logically omniscient agents; the possible future trajectories are not all laid out immediately before our minds. But the suboptimal power-seeking metric (Appendix C in Optimal Policies Tend To Seek Power) does match intuition here AFAICT, where cleverly building infrastructure has the effect of navigating the agent to situations with more cognitively exploitable opportunities.
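To make the role of $\gamma$ concrete, here is a minimal Monte Carlo sketch of the optimal-policy POWER from the paper, $\frac{1-\gamma}{\gamma}\,\mathbb{E}_R[V^*_R(s,\gamma) - R(s)]$, on a toy deterministic graph. Everything specific here is an assumption made for illustration: the graph, the state names, and the i.i.d. uniform reward distribution are hypothetical and are not a model of SC2, and this is the optimal-agent quantity rather than the suboptimal metric from Appendix C. `near_rich` has many immediate successors that all funnel into one absorbing state; `far_rich` has a single successor that later branches into many absorbing states.

```python
import random

# Hypothetical toy graph (not SC2): each state maps to its successor states.
GRAPH = {
    "near_rich": ["a1", "a2", "a3", "a4"],   # many options now...
    "a1": ["sink"], "a2": ["sink"], "a3": ["sink"], "a4": ["sink"],
    "sink": ["sink"],                        # ...one long-term outcome
    "far_rich": ["b"],                       # one option now...
    "b": ["c1", "c2", "c3", "c4"],           # ...many long-term outcomes
    "c1": ["c1"], "c2": ["c2"], "c3": ["c3"], "c4": ["c4"],
}

def optimal_values(reward, gamma, sweeps=100):
    """Value iteration for state-based reward: V(s) = R(s) + gamma * max V(s')."""
    v = {s: 0.0 for s in GRAPH}
    for _ in range(sweeps):
        v = {s: reward[s] + gamma * max(v[t] for t in GRAPH[s]) for s in GRAPH}
    return v

def estimate_power(state, gamma, samples=300, seed=0):
    """Monte Carlo estimate of (1 - gamma) / gamma * E[V*(s) - R(s)].

    Assumption: rewards are drawn i.i.d. uniform on [0, 1] for every state.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        reward = {s: rng.random() for s in GRAPH}
        total += optimal_values(reward, gamma)[state] - reward[state]
    return (1 - gamma) / gamma * total / samples

if __name__ == "__main__":
    for gamma in (0.1, 0.9):
        near = estimate_power("near_rich", gamma)
        far = estimate_power("far_rich", gamma)
        print(f"gamma={gamma}: near_rich={near:.2f}, far_rich={far:.2f}")
```

Under these assumptions the ranking flips with the discount rate: at small $\gamma$ the state with more immediate options gets the higher estimate, while at $\gamma$ near 1 the state with more long-term options does. That is only meant to show that which state counts as "more powerful" depends on $\gamma$; it does not settle how the actual SC2 dynamics behave.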