Subagents, akrasia, and coherence in humans

In my previous posts, I have been building up a model of mind as a collection of subagents with different goals, and no straightforward hierarchy. This then raises the question of how that collection of subagents can exhibit coherent behavior: after all, many ways of aggregating the preferences of a number of agents fail to create consistent preference orderings.

We can roughly describe coherence as the property that if you become aware of a better strategy for achieving your goals than the one you are currently executing, then you will switch to that better strategy. If an agent is not coherent in this way, then bad things are likely to happen to it.

Now, we all know that humans sometimes exhibit incoherent behavior. But on the whole, people still do okay: the median person in a developed country manages to survive until their body starts giving up on them, and typically also manages to have and raise some number of initially-helpless children until those children are old enough to take care of themselves.

For a subagent theory of mind, we would like some explanation of when exactly the subagents manage to be collectively coherent (that is, to change their behavior to a better one), and in which situations they fail to do so. The conclusion of this post will be:

We are capable of changing our behaviors on occasions when the mind-system as a whole puts sufficiently high probability on the new behavior being better, when the new behavior is not being blocked by a particular highly weighted subagent (such as an IFS-style protector) that puts high probability on it being bad, and when we have enough slack in our lives for any new behaviors to be evaluated in the first place. Akrasia is subagent disagreement about what to do.

Correcting your behavior as a default

There are many situations in which we exhibit incoherent behavior simply because we’re not aware of it. For instance, suppose that I do my daily chores in a particular order, when doing them in some other order would save time. If you point this out to me, I’m likely to just say “oh”, and then adopt the better system.

Similarly, several of the experiments which get people to exhibit incoherent behavior rely on showing different groups of people different formulations of the same question, and then demonstrating that the different framings elicit different answers. This doesn’t work quite as well if you show the different formulations to the same people, because then many of them will realize that giving differing answers would be inconsistent.

But there are also situations in which someone realizes that they are behaving in a nonsensical way, yet continues behaving that way regardless. Since people can usually change suboptimal behaviors once they notice them, we need an explanation for why they sometimes can’t.

Towers of protectors as a method for coherence

In my post about Internal Family Systems, I discussed a model of mind composed of several different kinds of subagents. One of them, the default planning subagent, is a module that simply tries to find the best thing to do and then execute it. Protector subagents, on the other hand, exist to prevent the system from getting into situations which have been catastrophic before. If they think that the default planning subagent is doing something which seems dangerous, they will override it and do something else instead. (Previous versions of the IFS post called the default planning subagent “a reinforcement learning subagent”, but this was potentially misleading since several other subagents are reinforcement learning ones too, so I’ve changed the name.)

Thus, your behavior can still be coherent even if you feel that you are failing to act coherently. You simply don’t realize that a protector is carrying out a routine intended to avoid dangerous outcomes—and this might actually be a very successful way of keeping you out of danger. Some subagents in your mind think that doing X would be a superior strategy, but the protector thinks that it would be a horrible idea—so from the point of view of the system as a whole, X is not actually a better strategy, and not switching to it is the coherent choice.

On the other hand, it may also be the case that the protector’s behavior, while keeping you out of situations which the protector considers unacceptable, is causing other outcomes which are also unacceptable. The default planning subagent may realize this—but as already established, any protector can overrule it, so this doesn’t help.

Evolution’s answer here seems to be spaghetti towers. The default planning subagent might eventually figure out the better strategy, which avoids both the thing that the protector is trying to block and the new bad outcome. But it could be dangerous to wait that long, especially since the default planning agent doesn’t have direct access to the protector’s goals. So for the same reasons why a separate protector subagent was created to avoid the first catastrophe, the mind will create or recruit a protector to avoid the second catastrophe—the one that the first protector keeps causing.

With permission, I’ll borrow the illustrations from eukaryote’s spaghetti tower post to show how this works.

Example: Eric grows up in an environment where he learns that disagreeing with other people is unsafe, and that he should always agree to do things that other people ask of him. So Eric develops a protector subagent running a pleasing, submissive behavior.

Unfortunately, while this tactic worked in Eric’s childhood home, once he becomes an adult he starts saying “yes” to too many things, leaving no time for his own needs. But saying “no” to anything still feels unsafe, so he can’t just stop saying “yes”. Instead he develops a second protector, which tries to keep him out of situations where people would ask him to do anything in the first place. This way, he doesn’t need to say “no”, and also won’t get overwhelmed by all the things that he has promised to do. The two protectors together form a composite strategy.

While this helps, it still doesn’t entirely solve the issue. After all, there are plenty of reasons why Eric might end up in situations where someone asks something of him. He still ends up agreeing to do lots of things, to the point of neglecting his own needs. Eventually, his brain creates yet another protector subagent. This one causes exhaustion and depression, so that he now has a socially acceptable reason for being unable to do all the things that he has promised to do. He continues saying “yes” to things, but also keeps apologizing for being unable to do things that he (honestly) intended to do as promised, and eventually people realize that you probably shouldn’t ask him to do anything that’s really important to get done.

And while this kind of process of stacking protector upon protector is not perfect, for most people it mostly works out okay. Almost everyone ends up with their own unique set of minor neuroses and situations where they don’t quite behave rationally, but as they learn to understand themselves better, their default planning subagent gets better at working around those issues. This may also let the various protectors relax a bit, since the outcomes they fear are being successfully avoided, leaving them less need to intervene.

Gradually, as the negative consequences of different behaviors become apparent, behavior gets adjusted—either by the default planning subagent or by spawning more protectors—and remains coherent overall.

But sometimes, especially for people in highly stressful environments where almost any mistake may get them punished, or when they end up in an environment that their old tower of protectors is no longer well-suited for (distributional shift), things don’t go as well. Their minds may end up looking like a hopelessly tangled web, with almost no flexibility left. Something happens in their environment, which sets off one protector, which sets off another, which sets off another—leaving no room for flexibility or rational planning, and forcing them to act in ways which are almost bound to make matters worse.

This kind of outcome is obviously bad. So besides building spaghetti towers, the mind has evolved a second strategy for keeping its behavior coherent while piling up protectors: the ability to re-process memories of past painful events.

As I discussed in my original IFS post, the mind has methods for bringing up the original memories which caused a protector to emerge, in order to re-analyze them. If ending up in some situation is actually no longer catastrophic (for instance, you are no longer in your childhood home where you get punished simply for not wanting to do something), then the protectors which were focused on avoiding that outcome can relax and take a less extreme role.

To serve this purpose, there seems to be a built-in tension. Exiles (the IFS term for subagents containing memories of past trauma) “want” to be healed, and will do things like occasionally sending painful memories or feelings into consciousness so as to become the center of attention, especially if something about the current situation resembles the past trauma. This also acts as what my IFS post called a fear model—something that warns of situations which resemble the past trauma enough to be considered dangerous in their own right. At the same time, protectors “want” to keep the exiles hidden and inactive, doing anything they can to keep them so. Various schools of therapy—IFS among them—seek to tap into this existing tension so as to reveal the trauma, trace it back to its original source, and heal it.

Coherence and conditioned responses

Besides the presence of protectors, another reason why we might fail to change our behavior is strongly conditioned habit. Most human behavior involves automatic habits: behavioral routines which are triggered by some sort of cue in the environment, and which lead to or have once led to a reward. (Previous discussion; see also.)

The problem is that people might end up with habits that they wouldn’t want to have. For instance, I might develop a habit of checking social media on my phone when I’m bored, creating a loop of boredom (cue) → looking at social media (behavior) → seeing something interesting on social media (reward).

Reflecting on this behavior, I notice that back when I didn’t do it, my mind was more free to wander when I was bored, generating motivation and ideas. I think that my old behavior was more valuable than my new one. But even so, my new behavior still delivers enough momentary satisfaction to keep reinforcing the habit.

Subjectively, this feels like an increasing compulsion to check my phone, which I try to resist since I know that long-term it would be a better idea to not be checking my phone all the time. But as the compulsion keeps growing stronger and stronger, eventually I give up and look at the phone anyway.

The exact neuroscience of what is happening at such a moment remains only partially understood (Simpson & Balsam 2016). However, we know that whenever different subsystems in the brain produce conflicting motor commands, that conflict needs to be resolved, with only one at a time being granted access to the “final common motor path”. This is thought to happen in the basal ganglia, a part of the brain closely involved in action selection and connected to the global neuronal workspace.

One model (e.g. Redgrave 2007, McHaffie 2005) is that the basal ganglia receives inputs from many different brain systems; each of those systems can send the basal ganglia “bids” supporting or opposing a specific course of action. A bid submitted by one subsystem may, through looped connections going back from the basal ganglia, inhibit other subsystems, until one of the proposed actions becomes sufficiently dominant to be taken.

The above image from Redgrave 2007 gives a conceptual picture of the model, with two example subsystems shown. Suppose that you are eating at a restaurant in Jurassic Park when two velociraptors charge in through the window. Previously, your hunger system was submitting successful bids for the “let’s keep eating” action, which caused inhibitory impulses to be sent to the threat system. This inhibition prevented the threat system from making bids for silly things like jumping up from the table and running away in a panic. However, as your brain registers the new situation, the threat system becomes much more strongly activated, sending a strong bid for the “let’s run away” action. As the basal ganglia receives that bid, an inhibitory impulse is routed from the basal ganglia to the subsystem which was previously submitting bids for the “let’s keep eating” action. This makes the threat system’s bids even stronger relative to the (now inhibited) eating system’s bids.

Soon the basal ganglia, which was previously inhibiting the threat subsystem’s access to the motor system while allowing the eating system access, withdraws that inhibition and starts inhibiting the eating system’s access instead. The result is that you jump up from your chair and begin to run away. Unfortunately, this is hopeless, since the velociraptors are faster than you. A few moments later, a velociraptor’s basal ganglia gives its “eating” subsystem access to its motor system, letting it happily munch down its latest meal.
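To make the bid-and-inhibition dynamics concrete, here is a toy sketch in Python. It is only a minimal illustration of the winner-take-all loop described above, not an implementation of the Redgrave model; the action names, the numbers, and the multiplicative inhibition rule are all invented for illustration.

```python
# Toy sketch of basal-ganglia-style action selection via mutually
# inhibiting bids. Action names, numbers, and the inhibition rule
# are invented for illustration.

def select_action(bids, inhibition=0.5, steps=10):
    """Let the currently strongest subsystem suppress its competitors
    until one action dominates. `bids` maps action -> bid strength."""
    bids = dict(bids)
    for _ in range(steps):
        winner = max(bids, key=bids.get)
        # The winning bid feeds back through the basal ganglia,
        # inhibiting competing subsystems and thus amplifying its
        # own relative advantage.
        for action in bids:
            if action != winner:
                bids[action] *= inhibition
    return winner

# Before the velociraptors arrive: hunger dominates, threat is inhibited.
print(select_action({"keep_eating": 0.8, "run_away": 0.1}))  # keep_eating

# The raptors charge in: the threat system's activation jumps, its bid
# wins, and now the eating system gets inhibited instead.
print(select_action({"keep_eating": 0.8, "run_away": 2.5}))  # run_away
```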

But let’s leave the velociraptors behind and go back to our original example with the phone. Suppose that you have been trying to replace the habit of looking at your phone when bored with the habit of smiling, directing your attention to pleasant sensations in your body, and then letting your mind wander.

Until the new habit establishes itself, the two habits will compete for control. Frequently, the old habit will be stronger, and you will just automatically check your phone without even remembering that you were supposed to do something different. For this reason, behavioral change programs may first spend several weeks just practicing noticing the situations in which you engage in the old habit. When you do notice what you are about to do, more goal-directed subsystems may send bids towards the “smile and look for nice sensations” action. If this happens and you pay attention to your experience, you may notice that in the long run this actually feels more pleasant than looking at the phone, reinforcing the new habit until it becomes dominant.

To put this in terms of the subagent model, we might drastically simplify things by saying that the neural pattern corresponding to the old habit is a subagent reacting to a specific sensation (boredom) in the consciousness workspace: its reaction is to generate an intention to look at the phone. At first, you might train the subagent responsible for monitoring the contents of your consciousness to output moments of introspective awareness highlighting when that intention appears. That introspective awareness helps alert a goal-directed subagent, which can then try to trigger the new habit instead. Gradually, a neural circuit corresponding to the new habit gets trained up, and starts sending its own bids when it detects boredom. Over time, reinforcement learning in the basal ganglia gives that subagent’s bids more and more weight relative to the old habit’s, until it no longer needs the goal-directed subagent’s support in order to win.
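As a toy sketch of this dynamic (the habit names, reward values, “noticing” probability, and the update rule are all my own invented simplifications, not anything from the literature):

```python
# Toy sketch of reinforcement gradually shifting bid weights from an old
# habit to a new one. All names and numbers are invented simplifications
# of the process described above.

import random

weights = {"check_phone": 2.0, "smile_and_savor": 0.5}  # habit strengths
rewards = {"check_phone": 0.4, "smile_and_savor": 0.7}  # assumed payoffs
LEARNING_RATE = 0.1
NOTICING = 0.8  # chance that introspective awareness catches the intention

for step in range(300):
    bids = dict(weights)
    if random.random() < NOTICING:
        # A goal-directed subagent notices the habitual intention and
        # adds its own bid in support of the new habit.
        bids["smile_and_savor"] += 2.0
    action = max(bids, key=bids.get)
    # Whichever habit won gets its weight nudged toward the reward it
    # actually delivered.
    weights[action] += LEARNING_RATE * (rewards[action] - weights[action])

# By now the new habit usually outbids the old one even without the
# goal-directed subagent's support.
print(weights)
```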

This model also helps explain the role of things like vivid emotional motivation, a sense of hope, or psyching yourself up when trying to change a habit. Imagining an outcome that you wish the new habit to lead to may activate additional subsystems which care about that kind of outcome, causing them to submit additional bids in favor of the new habit. The extent to which you succeed at this depends on the extent to which your mind-system considers it plausible that the new habit will actually lead to that outcome. For instance, if you imagine your exercise habit making you strong and healthy, then subagents which care about strength and health might activate to the extent that you believe this to be a likely outcome, sending bids in favor of the exercise action.

On this view, one way for the mind to maintain coherence and readjust its behaviors is its ability to re-evaluate old habits in light of which subsystems get activated when reflecting on the possible consequences of new habits. An old habit having been strongly reinforced reflects that a great deal of evidence has accumulated in favor of it being beneficial, but the behavior in question can still be overridden if enough influential subsystems weigh in with their evaluation that a new behavior would be more beneficial in expectation.

Some subsystems having concerns (e.g. immediate survival) which are ranked more highly than others (e.g. creative exploration) means that the decision-making process ends up carrying out an implicit expected utility calculation. The strengths of the bids submitted by different subsystems do not just reflect the probability that those subsystems put on an action being the most beneficial. There are also different mechanisms giving the bids from different subsystems varying amounts of weight, depending on how important the concerns represented by that subsystem happen to be in the situation at hand. This ends up doing something like weighting the probabilities by utility, with the utility calculations having been shaped by evolution and culture so as to maximize genetic fitness on average. Protectors, of course, are subsystems whose bids are weighted particularly strongly, since the system puts high utility on avoiding the kinds of outcomes they are trying to prevent.
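As a crude formalization (my notation, not taken from any of the cited papers): if subsystem $i$ puts probability $p_i$ on its proposed action being the most beneficial one, and the current situation assigns that subsystem’s concerns a utility-like weight $w_i$, then its effective bid is

$$b_i = w_i \, p_i,$$

and the action whose supporting bids dominate in the basal ganglia, roughly $a^* = \arg\max_i b_i$, is the one selected. This weighting of probabilities by situational importance is what makes the competition behave like an implicit expected utility calculation.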

The original question which motivated this section was: why are we sometimes incapable of adopting a new habit or abandoning an old one, despite knowing that to be a good idea? And the answer is: because we don’t know that such a change would be a good idea. Rather, some subsystems think that it would be a good idea, but other subsystems remain unconvinced. Thus the system’s overall judgment is that the old behavior should be maintained.

Interlude: Minsky on mutually bidding subagents

I was trying to concentrate on a certain problem but was getting bored and sleepy. Then I imagined that one of my competitors, Professor Challenger, was about to solve the same problem. An angry wish to frustrate Challenger then kept me working on the problem for a while. The strange thing was, this problem was not of the sort that ever interested Challenger.
What makes us use such roundabout techniques to influence ourselves? Why be so indirect, inventing misrepresentations, fantasies, and outright lies? Why can’t we simply tell ourselves to do the things we want to do? [...]
Apparently, what happened was that my agency for Work exploited Anger to stop Sleep. But why should Work use such a devious trick?
To see why we have to be so indirect, consider some alternatives. If Work could simply turn off Sleep, we’d quickly wear our bodies out. If Work could simply switch Anger on, we’d be fighting all the time. Directness is too dangerous. We’d die.
Extinction would be swift for a species that could simply switch off hunger or pain. Instead, there must be checks and balances. We’d never get through one full day if any agency could seize and hold control over all the rest. This must be why our agencies, in order to exploit each other’s skills, have to discover such roundabout pathways. All direct connections must have been removed in the course of our evolution.
This must be one reason why we use fantasies: to provide the missing paths. You may not be able to make yourself angry simply by deciding to be angry, but you can still imagine objects or situations that make you angry. In the scenario about Professor Challenger, my agency Work exploited a particular memory to arouse my Anger’s tendency to counter Sleep. This is typical of the tricks we use for self-control.
Most of our self-control methods proceed unconsciously, but we sometimes resort to conscious schemes in which we offer rewards to ourselves: “If I can get this project done, I’ll have more time for other things.” However, it is not such a simple thing to be able to bribe yourself. To do it successfully, you have to discover which mental incentives will actually work on yourself. This means that you—or rather, your agencies—have to learn something about one another’s dispositions. In this respect the schemes we use to influence ourselves don’t seem to differ much from those we use to exploit other people—and, similarly, they often fail. When we try to induce ourselves to work by offering ourselves rewards, we don’t always keep our bargains; we then proceed to raise the price or even deceive ourselves, much as one person may try to conceal an unattractive bargain from another person.
Human self-control is no simple skill, but an ever-growing world of expertise that reaches into everything we do. Why is it that, in the end, so few of our self-incentive tricks work well? Because, as we have seen, directness is too dangerous. If self-control were easy to obtain, we’d end up accomplishing nothing at all.

-- Marvin Minsky, The Society of Mind

Akrasia is subagent disagreement

You might feel that the above discussion still doesn’t entirely resolve the original question. After all, sometimes we do manage to change even strongly conditioned habits pretty quickly. Why is change sometimes hard and sometimes easy?

Redgrave et al. (2010) discuss two modes of behavioral control: goal-directed versus habitual. Goal-directed control is a relatively slow mode of decision-making, where “action selection is determined primarily by the relative utility of predicted outcomes”, whereas habitual control involves more directly conditioned stimulus-response behavior. Which kind of subsystem is in control at any given time is complicated, and depends on a variety of factors (the following quote has been edited to remove footnote references; see the original for those):

Experimentally, several factors have been shown to determine whether the agent (animal or human) operates in goal-directed or habitual mode. The first is over-training: here, initial control is largely goal-directed, but with consistent and repeated training there is a gradual shift to stimulus–response, habitual control. Once habits are established, habitual responding tends to dominate, especially in stressful situations in which quick reactions are required. The second related factor is task predictability: in the example of driving, talking on a mobile phone is fine so long as everything proceeds predictably. However, if something unexpected occurs, such as someone stepping out into the road, there is an immediate switch from habitual to goal-directed control. Making this switch takes time and this is one of the reasons why several countries have banned the use of mobile phones while driving. The third factor is the type of reinforcement schedule: here, fixed-ratio schedules promote goal-directed control as the outcome is contingent on responding (for example, a food pellet is delivered after every n responses). By contrast, interval schedules (for example, schedules in which the first response following a specified period is rewarded) facilitate habitual responding because contingencies between action and outcome are variable. Finally, stress, often in the form of urgency, has a powerful influence over which mode of control is used. The fast, low computational requirements of stimulus–response processing ensure that habitual control predominates when circumstances demand rapid reactions (for example, pulling the wrong way in an emergency when driving on the opposite side of the road). Chronic stress also favours stimulus–response, habitual control. For example, rats exposed to chronic stress become, in terms of their behavioural responses, insensitive to changes in outcome value and resistant to changes in action–outcome contingency. [...]
Although these factors can be seen as promoting one form of instrumental control over the other, real-world tasks often have multiple components that must be performed simultaneously or in rapid sequences. Taking again the example of driving, a driver is required to continue steering while changing gear or braking. During the first few driving lessons, when steering is not yet under automatic stimulus–response control, things can go horribly awry when the new driver attempts to change gears. By contrast, an experienced (that is, ‘over-trained’) driver can steer, brake and change gear automatically, while holding a conversation, with only fleeting contributions from the goal-directed control system. This suggests that many skills can be deconstructed into sequenced combinations of both goal-directed and habitual control working in concert. [...]
Nevertheless, a fundamental problem remains: at any point in time, which mode should be allowed to control which component of a task? Daw et al. have used a computational approach to address this problem. Their analysis was based on the recognition that goal-directed responding is flexible but slow and carries comparatively high computational costs as opposed to the fast but inflexible habitual mode. They proposed a model in which the relative uncertainty of predictions made by each control system is tracked. In any situation, the control system with the most accurate predictions comes to direct behavioural output.

Note those last sentences: besides the subsystems making their own predictions, there might also be a meta-learning system keeping track of which subsystems tend to make the most accurate predictions in each kind of situation, giving extra weight to the bids of whichever subsystem has tended to perform best there. We’ll come back to that in future posts.
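Here is a minimal sketch of that kind of arbitration as I understand it from the quote: track each controller’s recent prediction errors, and hand control to whichever controller has been predicting most accurately. The exponential-average tracking rule and all numbers are my own simplifications, not the actual Daw et al. model.

```python
# Toy sketch of uncertainty-based arbitration between a habitual and a
# goal-directed controller, in the spirit of the Daw et al. proposal
# quoted above. The tracking rule and all numbers are invented.

class Controller:
    def __init__(self, name):
        self.name = name
        self.uncertainty = 1.0  # running estimate of recent prediction error

    def observe(self, prediction_error, rate=0.2):
        # Exponentially weighted average of recent absolute prediction errors.
        self.uncertainty += rate * (abs(prediction_error) - self.uncertainty)

def arbitrate(controllers):
    # The controller whose recent predictions have been most accurate
    # gets to direct behavioral output.
    return min(controllers, key=lambda c: c.uncertainty)

habitual = Controller("habitual")
goal_directed = Controller("goal-directed")

# In a familiar, predictable situation the habitual system's cheap
# stimulus-response predictions do well, so it keeps control.
for _ in range(20):
    habitual.observe(prediction_error=0.1)
    goal_directed.observe(prediction_error=0.3)
print(arbitrate([habitual, goal_directed]).name)  # habitual

# After a distributional shift, the habitual system's predictions start
# failing, and control passes back to the slower goal-directed system.
for _ in range(20):
    habitual.observe(prediction_error=0.9)
    goal_directed.observe(prediction_error=0.3)
print(arbitrate([habitual, goal_directed]).name)  # goal-directed
```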

This seems compatible with my experience, in that I feel it’s possible for me to change even entrenched habits relatively quickly—assuming that the new habit really is unambiguously better. In that case, while I might forget and lapse into the old habit a few times, there’s still a rapid feedback loop which quickly indicates that the goal-directed system is simply right about the new habit being better.

Alternatively, the behavior in question might be sufficiently complex, and I sufficiently inexperienced at it, that the goal-directed (default planning) subagent has always mostly remained in control of it. In that case change is again easy, since there is no strong habitual pattern to override.

In contrast, in cases where it’s hard to establish a new behavior, there tends to be some kind of genuine uncertainty:

  • The benefits of the old behavior have been validated in the form of direct experience (e.g. unhealthy food that tastes good has in fact tasted good each time), whereas the benefits of the new behavior come from a less trusted information source which is harder to validate (e.g. I’ve read scientific studies about the long-term health risks of this food).

  • Immediate vs. long-term rewards: the more remote the rewards, the larger the risk that they will for some reason never materialize.

  • High vs. low variance: sometimes when I’m bored, looking at my phone produces genuinely better results than letting my thoughts wander. E.g. I might see an interesting article or discussion which gives me novel ideas or insights that I would not otherwise have had. Looking at my phone usually produces worse results than not looking at it—but sometimes it produces much better ones.

  • Situational variables affecting the value of the behaviors: looking at my phone can be a way to escape uncomfortable thoughts or sensations, for which purpose it’s often excellent. But this also tends to reinforce the habit of looking at the phone when I’m otherwise in the same situation, yet have no uncomfortable sensations to escape.

When there is significant uncertainty, the brain seems to fall back to those responses which have worked the best in the past—which seems like a reasonable approach, given that intelligence involves hitting tiny targets in a huge search space, so most novel responses are likely to be wrong.

As the above excerpt noted, the tendency to fall back to old habits is exacerbated during times of stress. The authors attribute it to the need to act quickly in stressful situations, which seems correct—but I would also emphasize the fact that negative emotions in general tend to be signs of something being wrong. E.g. Eldar et al. (2016) note that positive or negative moods tend to be related to whether things are going better or worse than expected, and suggest that mood is a computational representation of momentum, acting as a sort of global update to our reward expectations.

For instance, if an animal finds more fruit than it had been expecting, that may indicate that spring is coming. Shifting to a good mood and being “irrationally optimistic” about finding fruit even in places where the animal hasn’t seen fruit in a while may then actually be a rational pre-emptive update to its expectations. In a similar way, things going less well than expected may be a sign of some more general problem, calling for fewer exploratory behaviors and less risk-taking, and for falling back on behaviors which are more certain to work out.
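As a toy sketch of the momentum idea (the constants and update rules are invented; this is only meant to illustrate the qualitative claim, not the actual Eldar et al. model):

```python
# Toy sketch of mood as a "momentum" term on reward expectations, after
# the Eldar et al. (2016) idea described above. All constants invented.

def forage(rewards, mood_rate=0.3, learn_rate=0.1):
    expectation, mood = 0.0, 0.0
    for reward in rewards:
        # Was this outcome better or worse than the (mood-adjusted) forecast?
        surprise = reward - (expectation + mood)
        mood += mood_rate * surprise          # fast global "momentum" update
        expectation += learn_rate * surprise  # slower baseline learning
    return expectation, mood

# A streak of unexpectedly good foraging ("more fruit than expected")
# leaves a positive mood: the animal now expects more fruit everywhere,
# even in places it hasn't checked recently.
print(forage([1.0] * 10))
```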

So to repeat the summary that I had in the beginning: we are capable of changing our behaviors on occasions when the mind-system as a whole puts sufficiently high probability on the new behavior being better, when the new behavior is not being blocked by a particular highly weighted subagent (such as an IFS protector whose bids get a lot of weight) that puts high probability on it being bad, and when we have enough slack in our lives for any new behaviors to be evaluated in the first place. Akrasia is subagent disagreement about what to do.