System 2 as working-memory augmented System 1 reasoning

Kaj_Sotala, Sep 25, 2019, 8:39 AM

Dual Process Theory (System 1 & System 2), Subagents

The terms System 1 and System 2 were originally coined by the psychologist Keith Stanovich and then popularized by Daniel Kahneman in his book Thinking, Fast and Slow. Stanovich noted that a number of fields within psychology had been developing various kinds of theories distinguishing between fast/intuitive thinking on the one hand and slow/deliberative thinking on the other. Often these fields were not aware of each other. The S1/S2 model was offered as a general version of these specific theories, highlighting features of the two modes of thought that tended to appear in all the theories.

Since then, academics have continued to discuss the models. Among other developments, Stanovich and other authors have discontinued the use of the System 1/System 2 terminology as misleading, choosing to instead talk about Type 1 and Type 2 processing. In this post, I will build on some of that discussion to argue that Type 2 processing is a particular way of chaining together the outputs of various subagents using working memory. Some of the processes involved in this chaining are themselves implemented by particular kinds of subagents.

This post has three purposes:

Summarize some of the discussion about the dual process model that has taken place in recent years; in particular, the move to abandon the System 1/System 2 terminology.
Connect the framework of thought that I have been developing in my multi-agent minds sequence with dual-process models.
Push back on some popular interpretations of S1/S2 theory which I have been seeing on LW and other places, such as ones in which the two systems are viewed as entirely distinct, ones in which S1 is viewed as biased and S2 as logical, and ones in which it makes sense to identify more as one system or the other.

Let’s start with looking at some criticism of the S1/S2 model endorsed by the person who coined the terms.

What type 1/type 2 processing is not

The terms “System 1 and System 2” suggest just that: two distinct, clearly defined systems with their own distinctive properties and modes of operation. However, there’s no single “System 1”: rather, a wide variety of different processes and systems are lumped together under this term. It is also unclear whether there is any single System 2. As a result, a number of researchers, including Stanovich himself, have switched to talking about “Type 1” and “Type 2” processing instead (Evans, 2012; Evans & Stanovich, 2013; Pennycook, De Neys, Evans, Stanovich, & Thompson, 2018).

What exactly defines Type 1 and Type 2 processing?

A variety of attributes have commonly been ascribed to either Type 1 or Type 2 processing. However, one criticism is that there is no empirical or theoretical support for the claim that any such attribute occurs with only one type of processing. For instance, Melnikoff & Bargh (2018) note that one set of characteristics which has been attributed to Type 1 processing is “efficient, unintentional, uncontrollable, and unconscious”, whereas Type 2 processing has been said to be “inefficient, intentional, controllable and conscious”.

(Before you read on, you might want to take a moment to consider the extent to which this characterization matches your intuition of Type 1 and Type 2 processing. If it does match to some degree, you can try to think of examples which are well-characterized by these types, as well as examples which are not.)

They note that this correlation has never been empirically examined, and that there are also various processes in which attributes from both sets co-occur. For example:

Unconscious (T1) and Intentional (T2). A skilled typist can write sentences without needing to consciously monitor their typing, “but will never start plucking away at their keys without intending to type something in the first place.” Many other skills also remain intentional activities even as one gets enough practice to be able to carry them out without conscious control: driving and playing piano are some examples. Also, speaking involves plenty of unconscious processes, as we normally have very little awareness of the various language-production rules that go into our speech. Yet we generally only speak when we intend to.
Unconscious (T1) and Inefficient (T2). Unconscious learning can be less efficient than conscious learning. For example, some tasks can be learned quickly using a verbal rule which describes the solution, or slowly using implicit learning so that we figure out how to do the task but cannot give an explicit rule for it.
Uncontrollable (T1) and Intentional (T2). Consider the bat-and-ball problem: “A bat and a ball cost $1.10 in total. The bat costs $1 more than the ball. How much does the ball cost?” Unless they have heard the problem before, people nearly always generate an initial (incorrect) answer of 10 cents. This initial response is uncontrollable: no experimental manipulation has been found that would cause people to produce any other initial answer, such as 8 cents or 13 cents. At the same time, the process which causes this initial answer to be produced is intentional: “it is not initiated directly by an external stimulus (the question itself), but by an internal goal (to answer the question, a goal activated by the experimental task instructions). In other words, reading or hearing the bat-and-ball problem does not elicit the 10 cents output unless one intends to solve the problem.”
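
For reference, the arithmetic behind the correct answer to the bat-and-ball problem can be written out as follows. The symbol b, standing for the price of the ball in dollars, is introduced here only for illustration:

```latex
\begin{align*}
b + (b + 1.00) &= 1.10 \\
2b &= 0.10 \\
b &= 0.05
\end{align*}
```

The ball therefore costs 5 cents and the bat $1.05. The intuitive answer of 10 cents would put the bat at $1.10 and the total at $1.20.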

Regarding the last example, Melnikoff & Bargh note:

Ironically, this mixture of intentionality and uncontrollability characterizes many of the biases documented in Tversky and Kahneman’s classic research program, which is frequently used to justify the classic dual-process typology. Take, for example, the availability heuristic, which involves estimating frequency by the ease with which information comes to mind. In the classic demonstration, individuals estimate that more words begin with the letter K than have K in the third position (despite the fact that the reverse is true) because examples of the former more easily come to mind [107]. This bias is difficult to control – we can hardly resist concluding that more letters start with K than have K in the third position – but again, all of the available evidence suggests that it only occurs in the presence of an intention to make a judgment. The process of generating examples of the two kinds of words is not activated directly by an external stimulus, but by an internal intention to estimate the relative frequencies of the words. Likewise for many judgments and decisions.

They also give examples of what they consider uncontrollable (T1) but inefficient (T2), unintentional (T1) but inefficient (T2), as well as unintentional (T1) but controllable (T2). Further, they discuss each of the four attributes themselves and point out that they all contain various subdimensions. For example, people whose decisions are influenced by unconscious primes are conscious of their decision but not of the influence from the prime, meaning that the process has both conscious and unconscious aspects.

Type 1/Type 2 processing as working memory use

Rather than following the “list of necessary attributes” definition, Evans & Stanovich (2013) distinguish between defining features and typical correlates. In previous papers, Evans has generally defined Type 2 processing in terms of requiring working memory resources and being able to think hypothetically. Stanovich, on the other hand, has taken as the defining feature what he calls cognitive decoupling, which his work shows is highly correlated with fluid intelligence.

Cognitive decoupling can be defined as the ability to create copies of our mental representations of things, so that the copies can be used in simulations without affecting the original representations. For example, if I see an apple in a tree, my mind has a representation of the apple. If I then imagine various strategies of getting the apple (such as throwing a stone at the tree to knock the apple down), I can mentally simulate what would happen to the apple as a result of my actions. But even as I imagine the apple falling down from the tree, I never end up thinking that I can get the real apple down simply by an act of imagination. This is because the mental object representing the real apple is decoupled from the apple in my hypothetical scenario. I can manipulate the apple in the hypothetical without those manipulations being passed on to the mental object representing the original apple.
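
As a loose programming analogy (not a claim about how the brain implements this), decoupling resembles copying a data structure before modifying it. The sketch below uses a hypothetical dictionary as the "representation" of the apple; all names are made up for illustration.

```python
import copy

# Hypothetical primary representation of the real apple (illustrative only).
real_apple = {"object": "apple", "location": "tree", "attached": True}


def simulate_stone_throw(representation):
    """Imagine the apple being knocked down.

    Mutates and returns whatever representation it is handed, so the
    caller decides whether that is the original or a decoupled copy.
    """
    representation["attached"] = False
    representation["location"] = "ground"
    return representation


# Decoupling: copy the representation first, then simulate on the copy.
imagined_apple = simulate_stone_throw(copy.deepcopy(real_apple))

print(imagined_apple["location"])  # ground
print(real_apple["location"])      # tree
```

Without the copy, the simulation would overwrite the representation of the real apple, which is the analogue of believing that imagining the apple falling had actually brought it down.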

In their joint paper, Evans & Stanovich propose to combine their models and define Type 2 processes as those which use working memory resources (closely connected with fluid intelligence) in order to carry out hypothetical reasoning and cognitive decoupling. In contrast, Type 1 reasoning is anything which does not do that. Various features of thought, such as being automatic or controlled, may tend to correlate more with one or the other type, but these are only correlates, not necessary features.

Type 2 processing as composed of Type 1 components

In previous posts of my multi-agent minds sequence, I have been building up a model of mind that is composed of interacting components. How does it fit together with the proposed Type 1/Type 2 model?

Kahneman in Thinking, Fast and Slow mentions that giving the answer to 2 + 2 = ? is a System (Type) 1 task, whereas calculating 17 * 24 is a System (Type) 2 task. This might be starting to sound familiar. In my post on subagents and neural Turing machines, I discussed Stanislas Dehaene’s model where you do complex arithmetic by breaking up a calculation into subcomponents which can be done automatically, and then routing the intermediate results through working memory. You could consider this to also involve cognitive decoupling: for instance, if part of how you calculate 17 * 24 is by first noting that you can calculate 10 * 24, you need to keep the original representation of 17 * 24 intact in order to figure out what other steps you need to take.

To me, the calculation of 10 * 24 = 240 happens mostly automatically; like 2 + 2 = 4, it feels like a Type 1 operation rather than a Type 2 one. But what this implies, then, is that we carry out Type 2 arithmetic by chaining together Type 1 operations through Type 2 working memory.

I do not think that this is just a special case relating to arithmetic. Rather, it seems like an implication of the Evans & Stanovich definition which they do not mention explicitly, but which is nonetheless relatively straightforward to draw: that Type 2 reasoning is largely built up of Type 1 components.

Under this interpretation, there are some components which are specifically dedicated to Type 2 processes: things like working memory storages and systems for manipulating their contents. But those components cannot do anything alone. The original input to be stored in working memory originates from Type 1 processes (and the act of copying it to working memory decouples it from the original process which produced it), and working memory alone could not do anything without those Type 1 inputs.

Likewise, there may be something like a component which is Type 2 in nature, in that it holds rules for how the contents of working memory should be transformed in different situations—but many of those transformations happen by firing various Type 1 processes which then operate on the contents of the memory. Thus, the rules are about choosing which Type 1 process to trigger, and could again do little without those processes. (My post on neural Turing machines explicitly discussed such rules.)
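
To make the picture concrete, here is a minimal sketch of the 17 * 24 example as a tiny production system. It is a toy under stated assumptions, not a model from the cited literature: the function names, the dictionary standing in for working memory, and the particular decomposition into tens and ones are all hypothetical. The "Type 1" functions stand in for whatever steps a given person can carry out automatically.

```python
# Type 1 processes: single automatic steps (hypothetical stand-ins).
def split_tens_and_ones(n):
    return (n // 10) * 10, n % 10      # 17 -> (10, 7)


def multiply_automatic(a, b):
    return a * b                       # stands in for an automatic product


def add_automatic(a, b):
    return a + b


# Type 2 rules: each inspects working memory and, if its condition holds,
# fires a Type 1 process and writes the result back to working memory.
def rule_decompose(wm):
    if "parts" not in wm:
        wm["parts"] = split_tens_and_ones(wm["problem"][0])
        return True
    return False


def rule_partial_products(wm):
    if "parts" in wm and "partials" not in wm:
        other_factor = wm["problem"][1]
        wm["partials"] = [multiply_automatic(p, other_factor) for p in wm["parts"]]
        return True
    return False


def rule_sum(wm):
    if "partials" in wm and "answer" not in wm:
        wm["answer"] = add_automatic(*wm["partials"])
        return True
    return False


RULES = [rule_decompose, rule_partial_products, rule_sum]


def type2_multiply(a, b):
    # The original problem stays intact in working memory throughout.
    working_memory = {"problem": (a, b)}
    fired = []
    while "answer" not in working_memory:
        for rule in RULES:
            if rule(working_memory):
                fired.append(rule.__name__)
                break
        else:
            raise RuntimeError("no rule matched the contents of working memory")
    return working_memory["answer"], fired


answer, fired = type2_multiply(17, 24)
print(answer)  # 408
print(fired)   # ['rule_decompose', 'rule_partial_products', 'rule_sum']
```

Note that the rules do no arithmetic themselves: they only decide which automatic process to trigger next, and they would be inert without those processes.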

Looking through Kahneman’s examples

At this point, you might reasonably suspect that arithmetic reasoning is an example that I cherry-picked to support my argument. To avoid this impression, I’ll take the first ten examples of System 2 operations that Kahneman lists in the first chapter of Thinking, Fast and Slow and suggest how they could be broken down into Type 1 and Type 2 components.

Kahneman defines System 2 in a slightly different way than we have defined Type 2 operations—he talks about System 2 operations requiring attention—but as attention and working memory are closely related, this still remains compatible with our model. Most of these examples involve somehow focusing attention, and manipulating attention can be understood as manipulating the contents of working memory to ensure that a particular mental object remains in working memory. Modifying the contents of working memory was an important type of production rule discussed in my earlier post.

Starting with the first example in Kahneman’s list:

Brace for the starter gun in a race.

One tries to keep their body in such a position that it will be ready to run when the gun sounds; recognizing the feel of the correct position is a Type 1 operation. Type 2 rules are operating to focus attention on the output of the system which outputs proprioceptive data, allowing Type 1 processes to notice mismatches with the required body position and correct them. Additionally, Type 2 rules are focusing attention on the sound of the gun, so as to more quickly identify the sound when the gun fires (a Type 1 operation), causing the person to start running (also a Type 1 operation).

Focus attention on the clowns in the circus.

This involves Type 2 rules which focus attention on a particular sensory output, as well as keeping one’s eyes physically oriented towards the clowns. This requires detecting when one’s attention/eyes are on something other than the clowns and then applying an internal (in the case of attention) or external (in the case of eye position) correction. As Kahneman offers “orient to the source of a sudden sound”, “detect hostility in a voice”, “read words on large billboards”, and “understand simple sentences” as Type 1 operations, we can probably say that recognizing something as a clown or not-clown and moving one’s gaze accordingly are Type 1 operations.

Focus on the voice of a particular person in a crowded and noisy room.

As above, Type 2 rules check whether attention is on the voice of that person (a comparison implemented using a Type 1 process), and then adjust focus accordingly.

Look for a woman with white hair.

Similar to the clown example.

Search memory to identify a surprising sound.

It’s unclear to me exactly what is going on here. But introspectively, this seems to involve something like keeping the sound in attention so as to feed it to memory processes, and then applying the rule of “whenever the memory system returns results, compare them against the sound and adjust the search based on how relevant they seem”. The comparison feels like it is done by something like a Type 1 process.

Maintain a faster walking speed than is natural to you.

Monitor the appropriateness of your behavior in a social situation.

Walking: Similar to the “brace for the starter gun” example, Type 2 rules keep calling for a comparison of your current walking speed with the desired one (a Type 1 operation), passing any corrections resulting from that comparison to the Type 1 system controlling your walking speed.

Social behavior: maintain attention on a conscious representation of what you are doing, checking it against various Type 1 processes which contain rules about appropriate and inappropriate behavior. Adjust or block accordingly.
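
The walking example above has the shape of a simple feedback loop, which can be sketched as follows. This is only an illustration of the described structure: the function names, the speeds, and the `gain` and `tolerance` parameters are hypothetical and not drawn from Kahneman or from Evans & Stanovich.

```python
def compare_speed(current, target):
    """Type 1 comparison: signed mismatch between desired and current speed."""
    return target - current


def adjust_gait(current, correction, gain=0.5):
    """Type 1 motor adjustment: move part of the way towards the target."""
    return current + gain * correction


def maintain_speed(current, target, steps=10, tolerance=0.01):
    """Type 2 rule: keep calling for the comparison and pass on corrections."""
    history = [current]
    for _ in range(steps):
        mismatch = compare_speed(current, target)
        if abs(mismatch) <= tolerance:
            break
        current = adjust_gait(current, mismatch)
        history.append(current)
    return history


# Speeds in metres per second; the numbers are arbitrary.
speeds = maintain_speed(current=1.2, target=1.6)
print(speeds)
```

The loop itself only schedules the comparison and routes its output; both the comparison and the adjustment are done by the automatic processes.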

Count the occurrences of the letter a in a page of text.

Focus attention on the letters of a text; when a Type 1 comparison detects the letter “a”, increment a working memory counter by one.
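
As a minimal sketch of this description, with a hypothetical dictionary standing in for working memory and a simple equality check standing in for the automatic letter recognition:

```python
def is_letter_a(char):
    """Type 1 comparison: automatic recognition of the letter (lowercase only here)."""
    return char == "a"


def count_a(text):
    working_memory = {"count": 0}
    for char in text:                      # attention moves letter by letter
        if is_letter_a(char):              # Type 1 detection fires
            working_memory["count"] += 1   # Type 2 rule updates the counter
    return working_memory["count"]


print(count_a("a banana and a pear"))  # 7
```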

Tell someone your phone number.

After a retrieval of the phone number from memory has been initiated, Type 2 rules use Type 1 processes to monitor that it is said in full.

Park in a narrow space (for most people except garage attendants).

Keeping attention focused on what you are doing to allow a series of evaluations, mental simulations, and cached (Type 1) procedural operations determining how to act in response to a particular situation in the parking process.

A general pattern in these examples is that Type 2 processing can maintain attention on something as well as hold the intention to invoke comparisons to use as the basis for behavioral adjustments. As comparisons involve Type 1 processes, Type 2 processing is fundamentally reliant on Type 1 processing to be able to do anything.

Consciousness and dual process theory

Alert readers might have noticed that focusing one’s attention on something involves keeping it in consciousness, whereas the previous Evans & Stanovich definition noted that consciousness is not a defining part of the Type 1/Type 2 classification. Is this a contradiction? Probably not, since as remarked previously, different aspects of the same process may be conscious and unconscious at the same time.

For example, if one intends to say something, one may be conscious of the intention while the actual speech production happens unconsciously; once they say it and they hear their own words, an evaluation process can run unconsciously but output its results into consciousness. With “conscious” being so multidimensional, it doesn’t seem like a good defining characteristic to use, even if some aspects of it did very strongly correlate with Type 2 processing.

Evans (2012) writes in a manner which seems to me compatible with the notion of there being many different kinds of Type 2 processing, with different processing resources being combined according to different rules as the situation warrants:

The evidence suggests that there is not even a single type 2 system for reasoning, as different reasoning tasks recruit a wide variety of brain regions, according to the exact demands of the task [...].

I think of type 2 systems as ad hoc committees that are put together to deal with a particular problem and then disbanded when the task is completed. Reasoning with abstract and belief-laden syllogisms, for example, recruits different resources, as the neural imaging data indicate: Only the latter involve semantic processing regions of the brain. It is also a fallacy to think of “System 2” as a conscious mind that is choosing its own applications. The ad hoc committee must be put together by some rapid and preconscious process—any feeling that “we” are willing and choosing the course of our thoughts and actions is an illusion [...]. I therefore also take issue with dual-process theorists [...] who assign to System 2 not only the capacity for rule-based reasoning but also an overall executive role that allows it to decide whether to intervene upon or overrule a System 1 intuition. In fact, recent evidence suggests that while people’s brains detect conflict in dual-process paradigms, the conscious person does not.

If you read my neural Turing machines post, you may recall that I noted that the rules which choose what becomes conscious operate below the level of conscious awareness. We may have the subjective experience of being able to choose what thoughts we think, but this is a post-hoc interpretation rather than a fact about the process.

Type 1/Type 2 and bias

People sometimes refer to Type 1 reasoning as biased, and to Type 2 reasoning as unbiased. But as this discussion should suggest, there is nothing that makes one of the two types intrinsically more or less biased than the other. The bias-correction power of Type 2 processing emerges from the fact that if Type 1 operations are known to be erroneous and a rule-based procedure for correcting them exists, a Type 2 operation can be learned which implements that rule.

For example, someone familiar with the substitution principle may know that their initial answer to a question like “how popular will the president be six months from now?” comes from a Type 1 process which actually answered the question of “how popular is the president right now?”.

They may then have a Type 2 rule saying something like “when you notice that the question you were asked is subject to substitution effects, replace the initial answer with one derived from a particular procedure”. But this still requires a) a Type 1 process recognizing the situation as one where the rule should be applied; b) knowing a procedure which provides a better answer; and c) the cue-procedure rule having been installed previously, itself a process requiring a number of Type 1 evaluations (about e.g. how rewarding it would be to have such a rule in place).

There is nothing to say that somebody couldn’t learn an outright wrong Type 2 rule, such as “whenever you think of 2+2 = 4, substitute your initial answer of ‘4’ with a ‘5’”.

Often, it is also unclear what the better Type 2 rule should even be. For instance, another common substitution effect is that when someone is asked “How happy are you with your life these days?”, they actually answer the question of “What is my mood right now?”. But what is the objectively correct procedure for evaluating your current happiness with life?

On the topic of Type 1/Type 2 and bias, I give the final word to Evans (2012):

One of the most important fallacies to have arisen in dual-process research is the belief that the normativity of an answer [...] is diagnostic of the type of processing. Given the history of the dual-process theory of reasoning, one can easily see how this came about. In earlier writing, heuristic or type 1 processes were always the “bad guys,” responsible for cognitive biases [...]. In belief bias research, authors often talked about the conflict between “logic” and “belief,” which are actually dual sources, rather than dual processes. Evans and Over [...] defined “rationality2” as a form of well-justified and explicit rule-based reasoning that could only be achieved by type 2 processes. Stanovich [...] in his earlier reviews of his psychometric research program emphasized the association between high cognitive ability, type 2 processing and normative responding. Similarly, Kahneman and Frederick [...] associate the heuristics of Tversky and Kahneman with System 1 and successful reasoning to achieve normatively correct solutions to the intervention of System 2.

The problem is that a normative system is an externally imposed, philosophical criterion that can have no direct role in the psychological definition of a type 2 process. [...] if type 2 processes are those that manipulate explicit representations through working memory, why should such reasoning necessarily be normatively correct? People may apply the wrong rules or make errors in their application. And why should type 1 processes that operate automatically and without reflection necessarily be wrong? In fact, there is much evidence that expert decision making can often be well served by intuitive rather than reflective thinking [...] and that sometimes explicit efforts to reason can result in worse performance [...].

Reasoning research somewhat loads the dice in favor of type 2 processing by focusing on abstract, novel problems presented to participants without relevant expertise. If a sports fan with much experience of following games is asked to predict results, he or she may be able to do so quite well without need for reflective reasoning. However, a participant in a reasoning experiment is generally asked to do novel things, like assuming some dubious propositions to be true and deciding whether a conclusion necessarily follows from them. In these circumstances, explicit type 2 reasoning is usually necessary for correct solution, but certainly not sufficient. Arguably, however, when prior experience provides appropriate pragmatic cues, even an intractable problem like the Wason selection task becomes easy to solve [...], as this can be done with type 1 processes [...]. It is when normative performance requires the deliberate suppression of unhelpful pragmatic cues that higher ability participants perform better under strict deductive reasoning instructions [...].

Hence, [the fallacy that type 1 processes are responsible for cognitive biases and type 2 processes for normatively correct reasoning] is with us for some fairly precise historical reasons. In the traditional paradigms, researchers presented participants with hard, novel problems for which they lacked experience (students of logic being traditionally excluded), and also with cues that prompted type 1 processes to compete or conflict with these correct answers. So in these paradigms, it does seem that type 2 processing is at least necessary to solve the problems, and that type 1 processes are often responsible for cognitive biases. But this perspective is far too narrow, as has recently been recognized. In recent writing, I have attributed responsibility for a range of cognitive biases roughly equally between type 1 and type 2 processing [...]. Stanovich [...] similarly identifies a number of reasons for error other than a failure to intervene with type 2 reasoning; for example, people may reason in a quick and sloppy (but type 2) manner or lack the necessary “mindware” for successful reasoning.

Summary and connection to the multiagent models of mind sequence

In this post, I have summarized some recent-ish academic discussion on dual-process models of thought, or what used to be called System 1 and System 2. I noted that the popular conception of them as two entirely distinct systems with very different properties is mistaken. While there is a defining difference between them (namely, the use of working memory resources to support hypothetical thinking and cognitive decoupling), they seem rather to refer to two types of thought, either of which may use very different kinds of systems.

It is worth noting at this point that there are many different dual-process models in different parts of psychology. The Evans & Stanovich model which I have been discussing here is intended as a generalized model of them, but as they themselves (2013) write:

… we defend our view that the Type 1 and 2 distinction is supported by a wide range of converging evidence. However, we emphasize that not all dual-process theories are the same, and we will not act as universal apologists on each one’s behalf. Even within our specialized domain of reasoning and decision making, there are important distinctions between accounts. S. A. Sloman [...], for example, proposed an architecture that has a parallel-competitive form. That is, Sloman’s theories and others of similar structure [...] assume that Type 1 and 2 processing proceed in parallel, each having their say with conflict resolved if necessary. In contrast, our own theories [...] are default-interventionist in structure [...]. Default-interventionist theories assume that fast Type 1 processing generates intuitive default responses on which subsequent reflective Type 2 processing may or may not intervene.

In previous posts of the multi-agent models of mind sequence, I have been building up a model of the mind as composed of a variety of subsystems (which might in some contexts be called subagents).

In my discussion of Consciousness and the Brain, I summarized some of its conclusions as saying that:

The brain has multiple subagents doing different things; many of the subagents do unconscious processing of information. When a mental object becomes conscious, many subagents will synchronize their processing around analyzing and manipulating that mental object.
The collective of subagents can only have their joint attention focused on one mental object at a time.
The brain can be compared to a production system, with a large number of subagents carrying out various tasks when they see the kinds of mental objects that they care about. For example, when doing mental arithmetic, the subagents apply the right sequence of mental operations for achieving the main goal.

In Building up to an Internal Family Systems model, I used this foundation to discuss the IFS model of how various subagents manipulate consciousness in order to achieve various kinds of behavior. In Subagents, neural Turing machines, thought selection, and blindspots, I talked about the mechanistic underpinnings of this model and how processes like thought selection and firing of production rules might actually be implemented.

What had been lacking so far was a connection between these models and the Type 1/Type 2 typology. However, if we take something like the Evans & Stanovich model of Type 1/Type 2 processing to be true, then it turns out that our discussion has been connected with their model all along. Already in “Consciousness and the Brain”, I mentioned the “neural Turing machine” passing on results from one subsystem to another through working memory. That, it turns out, is the defining characteristic of Type 2 processing—with Type 1 processing simply being any process which does not do that.

Under this model, then, Type 2 processing is a particular way of chaining together the outputs of various Type 1 subagents using working memory. Some of the processes involved in this chaining are themselves implemented by particular kinds of subagents.

References

Evans, J. S. B. T. (2012). Dual process theories of deductive reasoning: facts and fallacies. The Oxford Handbook of Thinking and Reasoning, 115–133.

Evans, J. S. B. T., & Stanovich, K. E. (2013). Dual-Process Theories of Higher Cognition: Advancing the Debate. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 8(3), 223–241.

Melnikoff, D. E., & Bargh, J. A. (2018). The Mythical Number Two. Trends in Cognitive Sciences, 22(4), 280–293.

Pennycook, G., De Neys, W., Evans, J. S. B. T., Stanovich, K. E., & Thompson, V. A. (2018). The Mythical Dual-Process Typology. Trends in Cognitive Sciences, 22(8), 667–668.
