Would it be sufficient, for disproof, to show one system that does steer far-away parts of the world into a relatively-small chunk of their state space, but does not internally contain a world-model or do planning?
That could be sufficient in principle, though I would not be surprised if I look at a counterexample and realize that the problem description was missing something rather than that the claim is basically false. For instance, it wouldn’t be too surprising if there’s some class of supposed-counterexamples which only work under very specific conditions (i.e. they’re not robust), and can be ruled out by some “X isn’t very likely to happen spontaneously in a large system” argument.
The bottom line is that a disproof should argue that nothing basically like the claim is true. Finding a physical counterexample would at least show that the counterexample isn’t sensitive to the details of the mathematical framework/formulation.
Do you have a particular counterexample in mind? A thermometer, perhaps?
Well, OK, just to brainstorm some naive things that don’t really rule the conjecture out:
1. A nuclear bomb steers a lot of far-away objects into a high-entropy configuration, and does so very robustly, but a high-entropy configuration is perhaps not a “small part of the state space”.
2. A biological pathogen, let loose in a large human population, might steer all the humans towards the configuration “coughing”, but the pathogen is not itself a consequentialist. You might say that the pathogen had to have been built by a consequentialist, though.
3. Generalizing the above: suppose I discover some powerful zero-day exploit for the Linux kernel. I automate the exploit, setting my computer up to wait 24 hours and then take over lots of computers on the internet. Viewed from the outside, it might look as if my computer is the thing “doing” the takeover, but my computer itself doesn’t have a world model or a planning routine.
4. Consider some animal species spreading out from an initial location and making changes to the environments they colonize. If you think of all the generations of animals that underwent natural selection before spreading out as the “system that controls some remote parts of the system”, and the individual organisms as a kind of message or missile, then this seems like a pretty robust, though slow, form of remote control. Maybe you would say that natural selection has a world model and a planning process, though.
I mean, “is a large part of the state space” is basically what “high entropy” means!
For case 3, I think the right way to rule out this counterexample is the probabilistic criterion discussed by John: the vast majority of initial states for your computer don’t include a zero-day exploit and a script to automatically deploy it. The only way to make this likely is to include you programming your computer in the picture, and of course you do have a world model (without which you could not have programmed your computer).
But the vast majority of initial states for a lump of carbon/oxygen/hydrogen/nitrogen atoms do not include a person programming a computer with the intention of taking over the internet. Shouldn’t you apply the same logic there that you apply to the case of a computer?
In fact, a single zero-day exploit is certainly much simpler than a full human, so a priori it’s more likely for a computer with a zero-day exploit to form from the void than for a computer plus a competent human intent on taking over the internet to form from the void.
These are interesting examples!

In the first example there’s an element of brute force. Nuclear bombs only robustly achieve their end states because ~nothing is robust to that kind of energy, in the same way that, e.g., humans can easily overcome small numbers of ants. So maybe the theorem needs to specify that the actions that achieve the end goal need to be specific to the starting situation? That would disqualify nukes, because they just do the same thing no matter their environment.
In the third example, the computer doesn’t robustly steer the world: it only steers the world until someone patches that exploit, whereas an agent with a world model and planning ability would still be able to steer the world by, e.g., discovering other exploits.
I think the same objection holds for the second example: to the extent that the pathogen doesn’t evolve, it is unable to robustly steer the world, because immune systems exist and will adapt (by building up immunity, or by humans inventing vaccines, etc.). To the extent that the pathogen does evolve, it starts to look like the fourth example.
I think the fourth example is the one that I’m most confused about. Natural selection kind of has a world model, in the sense that the organisms have DNA which is adapted to the world. Natural selection also kind of has a planning process, it’s just a super myopic one on the time-scale of evolution (involving individuals making mating choices). But it definitely feels like “natural selection has a world model and planning process” is a sentence that comes with caveats, which makes me suspect that these may not be the right concepts.
Yeah I think you’ve said it well here.
Another similar example: consider a computer that trains robots and deploys new ones each day. Suppose for the sake of this example that the individual robots definitely do not do planning or have a world model, but can still execute some simple policy such as “go to this place, collect this resource, construct this simple structure”, etc. The computer that trains and deploys the robots does so by taking all the robots that were deployed on the previous day, selecting the ones that performed best according to a certain objective, such as collecting a certain resource, and deploying more robots like them. This is a basic evolutionary algorithm.
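As a very rough illustration (not part of the original example), here is a minimal sketch of that selection loop; the policy encoding, population size, mutation scale, and toy scoring function are all made-up placeholders for the physical deployment described above:

```python
import random

# Toy stand-in for the robot-evolution setup described above. A "robot" is
# just a fixed parameter vector (its policy); robots do no planning and have
# no world model. All numbers here are illustrative assumptions.

POP_SIZE = 50          # robots deployed per day
PARAM_DIM = 8          # size of each policy's parameter vector
MUTATION_SCALE = 0.1   # noise added when copying a good policy

def random_policy():
    return [random.uniform(-1, 1) for _ in range(PARAM_DIM)]

def mutate(policy):
    return [p + random.gauss(0, MUTATION_SCALE) for p in policy]

def deploy_and_score(policy):
    """Stand-in for a day of deployment: in reality the score (e.g. resource
    collected) would come from the physical world, not from a formula."""
    return -sum((p - 0.5) ** 2 for p in policy)

def evolve(days=100):
    population = [random_policy() for _ in range(POP_SIZE)]
    for _ in range(days):
        ranked = sorted(population, key=deploy_and_score, reverse=True)
        survivors = ranked[:POP_SIZE // 5]  # keep yesterday's best 20%
        # Tomorrow's robots are noisy copies of the best performers.
        population = [mutate(random.choice(survivors)) for _ in range(POP_SIZE)]
    return max(population, key=deploy_and_score)
```

Nothing in this loop looks like an explicit world model or plan search; whatever “knowledge” of the world the system has lives only in which policies survived.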
Like in the case of evolution, it’s a bit difficult to say where the “world model” and “planning process” are in this example. If they are anywhere, they are kind of distributed through the computer/robot/world system.
OK, now consider a modification to the above example. The previous example is going to optimize very slowly. Suppose we make the optimization go faster in the following way: we collect video data from each of the robots, and the central computer uses the data collected on the previous day to train the robots for the next day using reinforcement learning rather than evolutionary search. To do this, it trains, via supervised learning on the raw video data, a predictor that maps robot policies to predicted outcomes, and then, using reinforcement learning, searches for robot policies that are predicted to perform well. Now we have a very clear world model and planning process: the world model is the trained prediction function, and the planning process is the search over robot policies with respect to that prediction function. But the way we got here was as a performance optimization of a process that had a very unclear world model and planning process.
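Again as a rough sketch only (substituting a simple regression fit and random search for the supervised learning on video and the reinforcement-learning search described above), the split into a world model and a planning process might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
PARAM_DIM = 8  # same illustrative policy encoding as in the previous sketch

# World model: a predictor from policy parameters to outcome, fit to
# yesterday's deployment logs (here by ridge regression on fake data; in the
# scenario above it would be trained on raw video instead).
def fit_predictor(policies, outcomes, reg=1e-2):
    X, y = np.asarray(policies), np.asarray(outcomes)
    w = np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ y)
    return lambda policy: float(np.asarray(policy) @ w)

# Planning process: search over candidate policies with respect to the
# predictor, keeping whichever is predicted to perform best (random search
# here, standing in for the RL-based search in the text).
def plan(predictor, n_candidates=1000):
    candidates = rng.uniform(-1, 1, size=(n_candidates, PARAM_DIM))
    return max(candidates, key=predictor)

# Usage: yesterday's logs -> world model -> tomorrow's deployed policy.
logged_policies = rng.uniform(-1, 1, size=(200, PARAM_DIM))
logged_outcomes = logged_policies @ rng.uniform(-1, 1, PARAM_DIM)  # fake data
world_model = fit_predictor(logged_policies, logged_outcomes)
tomorrows_policy = plan(world_model)
```

Here the world model (the fitted predictor) and the planning process (the search in plan) are cleanly separated components, which is exactly the contrast with the previous, purely evolutionary sketch.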
It seems to me that human AI engineers have settled on a certain architecture for optimizing design processes. That architecture, roughly speaking, is to form an explicit world model and do explicit search over it. But I suspect this is just one architecture by which one can organize information in order to take an action. It seems like a very clean architecture to me, but I’m not sure that all natural processes that organize information in order to take an action will do so using this architecture.
> A nuclear bomb steers a lot of far-away objects into a high-entropy configuration, and does so very robustly, but that perhaps is not a “small part of the state space”
This example reminds me of a thing I have been thinking about, namely that it seems like optimization can only occur in cases where the optimization produces/is granted enough “energy” to control the level below. In this example, the model works in a quite literal way, as a nuclear bomb floods an area with energy, and I think this example generalizes to e.g. markets with Dutch disease.
Flooding the lower level with “energy” is presumably not the only way this can occur; a lack of incentives/credit assignment in the upper level produces the same result, simply because no incentives means the upper level does not allocate “energy” to the area.
Yeah I think you’re right. I have the sense that the pure algorithmic account of optimization—that optimization is about algorithms that do search over plans using models derived from evidence to evaluate each plan’s merit—doesn’t quite account for what an optimizer really is in the physical world.
The thing is that I can implement some very general-purpose modelling + plan-search algorithm on my computer (for example, Monte Carlo versions of AIXI), hook it up to real sensors and actuators, and it will not do much of anything interesting at all. It’s too slow and unreflective to really work.
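To make “modelling + plan-search” concrete, here is a deliberately naive sketch (nothing like a real Monte Carlo AIXI implementation; the action space, horizon, and world-model interface are all made up) of the kind of agent loop being described, whose brute-force search is exactly what makes it hopeless against the real world:

```python
import itertools

ACTIONS = range(4)  # assumed tiny discrete action space
HORIZON = 10        # assumed planning horizon

def act(history, world_model):
    """Pick one action by brute-force plan search against a world model.

    `world_model(history, plan)` is assumed to return the predicted return of
    executing `plan` after `history`. Even in this toy setting there are
    4**10 ~ 10^6 candidate plans per step; anything AIXI-like faces an
    astronomically larger space, which is why wiring such an agent to real
    sensors and actuators achieves essentially nothing.
    """
    best_plan = max(
        itertools.product(ACTIONS, repeat=HORIZON),
        key=lambda plan: world_model(history, plan),
    )
    return best_plan[0]
```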
Therefore, running a consequentialist computation is definitely not a sufficient condition for remote control as per John’s conjecture, but perhaps it is a necessary condition: that’s what the OP is asking for a proof or disproof of.