Jan_Kulveit comments on Multi-agent predictive minds and AI alignment

Jan_Kulveit 17 Dec 2018 22:03 UTC
12 points
0
The thing I’m trying to argue is complex and yes, it is something in the middle between the two options.
1. Predictive processing (in the “perception” direction) makes some brave predictions, which can be tested and match data/experience. My credence in predictive processing in a narrow sense: 0.95
2. Because of the theoretical beauty, I think we should take active inference seriously as an architectural principle. Vague introspective evidence for active inference comes from an ability to do inner simulations. Possibly boldest claim I can make from the principle alone is that people will have a bias to take actions which will “prove their models are right” even at the cost of the actions being actually harmful for them in some important sense. How it may match everyday experience: for example, here. My credence in active inference as a basic design mechanism: 0.6
3. So far, the description was broadly Bayesian/optimal/”unbounded”. Unbounded predictive processing / active inference agent is a fearsome monster in a similar way as a fully rational VNM agent. The other key ingredient is bounded rationality. Most biases are consequence of computational/signal processing boundedness, both in PP/AI models and non PP/AI models. My credence in boundedness being a key ingredient: 0.99
4. What is missing from the picture so far is some sort of “goals” or “motivation” (or in other view, a way how evolution can insert into the brain some signal). How Karl Friston deals with this, e.g.
We start with the premise that adaptive agents or pheno-types must occupy a limited repertoire of physical states. For a phenotype to exist, it must possess defining characteristics or traits; both in terms of its morphology and exchange with the environment. These traits essentially limit the agent to a bounded region in the space of all states it could be in. Once outside these bounds, it ceases to possess that trait (cf., a fish out of water).
is something which I find unsatisfactory. My credence in this being complete explanation: 0.1
5. My hypothesis is ca. this:
- evolution inserts some “goal-directed” sub-parts into the PP/AI machinery
- these sub-parts do not somehow “directly interface the world”, but are “burried” within the hierarchy of the generative layers; so they not care about people or objects or whatever, but about some abstract variables
- they are quite “agenty”, optimizing some utility function
- from the point of view of such sub-agent, other sub-agents inside of the same mind are possibly competitors; at least some sub-agents likely have access to enough computing power to not only “care about what they are intended to care about”, but do a basic modelling of other sub-agents; internal game theoretical mess ensues
6. This hypothesis bridges the framework of PP/AI and the world of theories viewing the mind as a multi agent system. Multi-agent theories of mind have some introspective support in various styles of psychotherapy, IFS, meditative experience, some rationality techniques. And also seem to be explain behavior where humans seem to “defect against themselves”. Credence: 0.8
(I guess a predictive processing purist would probably describe 5. & 6. as just a case of competing predictive models, not adding anything conceptually new.)
Now I would actually want to draw a graph how strongly 1...6. motivate different possible problems with alignment, and how these problems motivate various research questions. For example the question about understanding hierarchical modelling is interesting even if there is no multi-agency, scaling of sub-agents can be motivated even without active inference, etc.
- abramdemski 22 Dec 2018 4:09 UTC
  9 points
  0
  Parent
  Vague introspective evidence for active inference comes from an ability to do inner simulations.
  I would take this as introspective evidence in favor of something model-based, but it could look more like model-based RL rather than active inference. (I am not specifically advocating for model-based RL as the right model of human thinking.)
  Possibly boldest claim I can make from the principle alone is that people will have a bias to take actions which will “prove their models are right” even at the cost of the actions being actually harmful for them in some important sense.
  I believe this claim based on social dynamics—among social creatures, it seems evolutionarity useful to try to prove your models right. An adaptation for doing this may influence your behavior even when you have no reason to believe anyone is looking or knows about the model you are confirming.
  So, an experiment which would differentiate between socio-evolutionary causes and active inference would be to look for the effect in non-social animals. An experiment which comes to mind is that you somehow create a situation where an animal is trying to achieve some goal, but you give false feedback so that the animal momentarily thinks it is less successful than it is. Then, you suddenly replace the false feedback with real feedback. Does the animal try and correct to the previously believed (false) situation, in order to minimize predictive error? Rather than continuing to optimize in a way consistent with the task reward?
  There are a lot of confounders. For example, one version of the experiment would involve trying to put your paw as high in the air as possible, and (somehow) initially getting false feedback about how well you are doing. When you suddenly start getting good feedback, do you re-position the paw to restore the previous level of feedback (minimizing predictive error) before trying to get it higher again? A problem with the experiment is that you might re-position your paw just because the real feedback changes the cost-benefit ratio, so a rational agent would try less hard at the task if it found out it was doing better than it thought.
  A second example: pushing an object to a target location on the floor. If (somehow) you initially get bad feedback about where you are on the floor, and suddenly the feedback gets corrected, do you go to the location you thought you were at before continuing to make progress toward the goal? A confounder here is that you may have learned a procedure for getting the object to the desired location, and you are more confident in the results of following the procedure than you are otherwise. So, you prefer to push the object to the target location along the familiar route rather than in the efficient route from the new location, but this is a consequence of expected utility maximization under uncertainty about the task rather than any special desire to increase familiarity.
  Note that I don’t think of this as a prediction made by active inference, since active inference broadly speaking may precisely replicate max-expected-utility, or do other things. However, it seems like a prediction made by your favored version of active inference.
  Because of the theoretical beauty, I think we should take active inference seriously as an architectural principle.
  I think we may be able to make some progress on the question of its theoretical beauty. I share a desire for unified principles of epistemic and instrumental reasoning. However, I have an intuition that active inference is just not the right way to go about it. The unification is too simplistic, and has too many degrees of freedom. It should have some initial points for its simplicity, but it should lose those points when the simplest versions don’t seem right (eg, when you conclude that the picture is missing goals/motivation).
  So far, the description was broadly Bayesian/optimal/”unbounded”. Unbounded predictive processing / active inference agent is a fearsome monster in a similar way as a fully rational VNM agent. The other key ingredient is bounded rationality. Most biases are consequence of computational/signal processing boundedness, both in PP/AI models and non PP/AI models.
  FWIW, I want to mention logical induction as a theory of bounded rationality. It isn’t really bounded enough to be the picture of what’s going on in humans, but it is certainly major progress on the question of what should happen to probability theory when you have bounded processing power.
  I mention this not because it is directly relevant, but because I think people don’t necessarily realize logical induction is in the “bounded rationality” arena (even though “logical uncertainty” is definitionally very very close to “bounded rationality”, the type of person who tends to talk about logical uncertainty is usually pretty different from the type of person who talks about bounded rationality, I think).
  ---
  Another thing I want to mention—although not every version of active inference predicts that organisms actively seek out the familiar and avoid the unfamiliar, it does seem like one of the central intended predictions, and a prediction I would guess most advocates of active inference would argue matches reality. One of my reasons for not liking the theory much is because I don’t think it is likely to capture curiosity well. Humans engage in both familiarity-seeking and novelty-seeking behavior, and both for a variety of reasons (both terminal-goal-ish and instrumental-goal-ish), but I think we are closer to novelty-seeking than active inference would predict.
  In Delusion, Survival, and Intelligent Agents (Ring & Orseau), behavior of a knowledge-seeking agent and a predictive-accuracy seeking agent are compared. Note that the knowledge-seeking agent and predictive-accuracy seeking agent have exactly opposite utility functions: the knowledge-seeking agent likes to be surprised, whereas the accuracy-seeking agent dislikes surprises. The knowledge-seeking agent behaves in (what I see as) a much more human way than the accuracy-seeking agent. The accuracy-seeking agent will try to gain information to a limited extent, but will ultimately try to remove all sources of novel stimuli to the extent possible. The knowledge-seeking agent will try to do new things forever.
  I would also expect evolution to produce something more like the knowledge-seeking agent than the accuracy-seeking agent. In RL, curiosity is a major aid to learning. The basic idea is to augment agents with an intrinsic motive to gain information, in order to ultimately achieve better task performance. There are a wide variety of formulas for curiosity, but as far as I know they are all closer to valuing surprise than avoiding surprise, and this seems like what they should be. So, to the extent that evolution did something similar to designing a highly effective RL agent, it seems more likely that organisms seek novelty as opposed to avoid it.
  So, I think the idea that organisms seek familiar experiences over unfamiliar is actually the opposite of what we should expect overall. It is true that for an organism which has learned a decent amount about its environment, we expect to see it steering toward states that are familiar to it. But this is just a consequence of the fact that it has optimized its policy quite a bit; so, it steers toward rewarding states, and it will have seen rewarding states frequently in the past for the same reason. However, in order to get organisms to this place as reliably as possible, it is more likely that evolution would have installed a decision procedure which steers disproportionately toward novelty (all else being equal) than one which steers disproportionately away from novelty (all else being equal).