A self-experiment in training “noticing confusion”

I previously discussed the potential relevance of therapeutic and instructional models of metacognitive training to LW-style rationality skills. As an attempted concrete realization of what this connection could look like, I ran a self-experiment in which I counted instances of noticing confusion. Below I elaborate on the motivation and design of the experiment, then discuss some quantitative results and qualitative reflections.

Background

Self-monitoring as a treatment vehicle in cognitive-behavioral and related therapies can take many forms. In one (to my secondhand understanding), the patient is coached in noticing a physical or mental behavior by identifying examples of the behavior and heuristics for when to watch for it, and by examining the feeling of the behavior itself. This is accompanied by practice of coping strategies. The patient is instructed to count the occurrences of that behavior on zir own. This is ideally done with a “wrist counter,” which is always available, can be incremented with the press of a single button, and gives both tactile and visual feedback on being pressed.

The patient might, for example, count instances of acting on zir own initiative, or of having positive thoughts about zirself. In this case, tying the thought to the specific physical action of pressing the button, as well as watching a “score” go up, helps reinforce both the act of noticing the thought and the content of the thought.

The patient could also count negative thoughts, engaging in bad habits, inappropriate “should” statements or other “cognitive distortions.” At first, so I’m told, the count will go up, as you get better at noticing; then (optimistically) back down over a few weeks, as your symptoms diminish. In this case, it’s important not to focus on the fact you’re doing something “bad.” Instead, try to reward noticing and dispelling the bad thing, or at least to reward noticing that you’re focusing on the bad thing rather than rewarding the noticing. (If all else fails, reward noticing that you’re focusing on failing to reward noticing that you’re focusing on the bad thing rather than rewarding the noticing. That should definitely do it, right?)

This seems doubly useful: not only are you practicing and rewarding the noticing skill, but in tying it to a physical action, you necessarily bring the noticed behavior to your conscious attention, so that you can deal with it deliberately. If you noticed yourself dismissing a compliment, you’d take that opportunity to point out to yourself that the dismissal is mostly evidence of your mental state, and only weakly of the compliment’s validity; you’d try to take the compliment at face value.

Design

I chose to apply this technique to my own variant of noticing confusion. (I also considered noticing a mental flinch, noticing motivated reasoning, flagging beliefs for review, activating curiosity, welcoming bad news, being specific, and noticing others’ nonspecificity/asking for examples. For now, I decided to go with the skill that would be the most personally useful and come up the most often.) I’m using a counter widget on my phone’s home screen. It’s two button presses away at any time, and it shows me a nice big number. I also see it whenever I use my phone, which is good for scaffolding but bad for transfer: on one hand, I get reminders to pay attention to my mental processes, so I’m more likely to be able to practice the noticing skill; on the other, I might be inhibiting my learning to apply the skill without reminders. Since I could just keep using the counter if it helped, I didn’t worry too much about this.

The details of the fuzzy introspective rules for whether I get to count something as noticing confusion probably don’t matter so much, but the basic idea is this: If I notice an unresolved tension or conflict between things I believe, then I count it. I don’t count the related and also-crucial noticing that I simply don’t understand something—I have to identify a conflict. (I see “notice when I don’t understand something” as Level 0 of this skill. It’s also particularly easy to practice: just read something on an unfamiliar subject, and draw a question mark next to any specific thing you don’t understand. Ideally, revisit those marks later. Get in the habit of doing this for everything you read.) I don’t count confusions in retrospect—if I’ve already resolved a confusion by the time I bring it to conscious awareness and can press the button, then I don’t count it. That was a personally controversial call, but there’s another sense in which “noticing and resolving confusion” is simply a mode of thought that operates semi- or sub-consciously. I didn’t want to get bogged down in counting those, and this seemed like a simple rule to split the cases.

Thus, some non-examples (still worth noticing in their own right, and often leading to pinpointing a confusion):

  • I don’t understand that.

  • That doesn’t seem right.

  • That’s surprising. Wait, is it? I’m not really sure what I expected, now that I think about it.

And some examples of thoughts that I would count (by the way, these mental processes, like most, are mostly nonverbal for me, so don’t take this literally; I’m noticing a feeling like tension in the connections between concepts):

  • X conflicts with my understanding of Y because Z.

  • Why does that apply in case A but not case B?

  • I expected the graph to look like J, but it looks like L.

  • I don’t think the software usually gives me that message. [Hint: IT DOESN’T. DO NOT PROCEED.]

Before I began, I guessed that I encountered this kind of confusion several times a day, mostly in seminars, papers, textbooks, debugging, simulated data, and experimental data. I suspected that I already consciously notice many of them, but not all, and that increasing the catch rate would markedly improve how much understanding I got out of the above activities and perhaps prevent some expensive mistakes.

I attempted to keep my confusion-inducing workload constant by working the same number of hours every day. I also distributed my reading of textbooks/papers and my talk attendance to give roughly constant combined time each day, although I’m not sure that those activities had a particularly different density of confusion from my ordinary work. I typically took a couple of days a week off from cognitively demanding work, and this pattern is visible in the data, at least at first.

The night before starting the experiment, I ran myself through a couple-hour training exercise on a meaty-looking paper, expressly to pay attention to conflicts in my growing understanding of the result as well as tensions between the content of the paper and my background knowledge, following recommendations of instructional research on metacognition. This was already pretty satisfying and left me feeling good about my self-experiment. The challenge would be to see whether I could improve at spotting and pinning down my nagging doubts, and whether I could take this watchfulness beyond the more-studied domain of self-monitoring while reading. Both of these things seemed to happen.

Results

The quantitative results are promising, but not especially informative. There’s only so much I can say with a month’s worth of data points in such a non-rigorous self-experiment. As it turned out, my guess of “several times a day” was pretty good—for a good day, full of demanding work, which was what came to mind when guessing. In truth, there’s a lot more variation between days, which didn’t disappear as I got better at pressing the button: there’s a standard deviation of 2.85 counts for week 1, 2.81 counts for week 5, and 2.81 counts for all days.

Here’s what the data looks like, with a moving weekly average (thus accounting for the weekend effect) and moving weekly 1σ bounds (e.g. ± 2.85/​√7 for the first week):

By week 3, the weekly count has gone up by a standard deviation, and it stays there or higher for weeks 4 and 5. Again, I don’t want to lean too hard on these numbers—I wasn’t rigorously consistent about the amount and nature of my daily work or the rules for counting. Weeks 1 and 2 might have been bad weeks, so that the increase doesn’t represent a real improvement; there’s also room for my desire to have a better-looking LW post to have increased the counts. And there’s a little ambiguity about what I’m measuring: perhaps the increase in counting comes only from remembering to press the button, and there are plenty of other times when I notice confusions and consciously address them without identifying them as button-pressing candidates. My guess is that this isn’t the case—the increase seemed to come in the form of things I barely didn’t miss.
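For anyone who wants to reproduce this kind of smoothing on their own counts, here is a minimal sketch of a trailing 7-day average with ± σ/√7 bounds. The function name and the example counts are made up for illustration (they are not the data from this experiment), and using the sample standard deviation within each window is an assumption about how the bounds were computed.

```python
from math import sqrt
from statistics import mean, stdev


def moving_weekly_stats(daily_counts, window=7):
    """Return (mean, lower, upper) for each full trailing window of days."""
    rows = []
    for end in range(window, len(daily_counts) + 1):
        week = daily_counts[end - window:end]
        avg = mean(week)
        # Standard error of the weekly mean: sample SD / sqrt(window size).
        sem = stdev(week) / sqrt(window)
        rows.append((avg, avg - sem, avg + sem))
    return rows


# Hypothetical daily counts, for illustration only.
example_counts = [3, 5, 0, 1, 6, 4, 2, 5, 7, 1, 2, 6, 8, 4]

for avg, lower, upper in moving_weekly_stats(example_counts):
    print(f"{avg:.2f}  [{lower:.2f}, {upper:.2f}]")
```

Because every window spans a full week, each average contains the same mix of workdays and days off, which is what removes the weekend effect.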

If I naïvely say that Week 1 establishes a true distribution for averaged weekly counts, then being more than 1σ above the mean for three weeks would have a probability of about p = (0.16)³ ≈ 0.0041 if that true count distribution remained constant. I’m not going to do any more sophisticated analysis than that, since I don’t think the data really supports it. See this detailed comment by VincentYu. There’s also a barely-significant relationship with the previous night’s sleep duration (p = 0.043, +1 count per hour of sleep). If I adjust for this, the appearance of improvement still holds:

So sleep perhaps accounts for a small amount of random variation, and not the overall shift.
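The two back-of-the-envelope calculations above can be sketched in a few lines. This is only an illustration under stated assumptions: the weekly averages are treated as normally distributed and the three weeks as independent, and the sleep adjustment shown here (subtracting a least-squares linear trend in sleep hours) is one simple way to do it, not necessarily the method used for the adjusted plot. The function names are hypothetical.

```python
from math import erf, sqrt


def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))


def adjust_for_sleep(counts, sleep_hours):
    """Remove a least-squares linear sleep effect from daily counts.

    Returns (slope, adjusted_counts), where slope is in counts per hour of
    sleep and each adjusted count is what the day would look like at the
    average sleep duration.
    """
    n = len(counts)
    mean_c = sum(counts) / n
    mean_s = sum(sleep_hours) / n
    sxx = sum((s - mean_s) ** 2 for s in sleep_hours)
    sxy = sum((s - mean_s) * (c - mean_c) for s, c in zip(sleep_hours, counts))
    slope = sxy / sxx
    adjusted = [c - slope * (s - mean_s) for c, s in zip(counts, sleep_hours)]
    return slope, adjusted


# Chance of one weekly average landing more than 1 sigma above the mean,
# then of that happening three weeks running (independence assumed).
p_one_week = upper_tail(1.0)
p_three_weeks = p_one_week ** 3
print(f"one week: {p_one_week:.4f}, three weeks: {p_three_weeks:.4f}")
```

With the unrounded tail probability (about 0.159 rather than 0.16), the three-week figure comes out at roughly 0.0040, so the rounding makes no practical difference.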

Finally, some qualitative reflections:

  • I feel like I gained more solid understanding of things and solved a lot of problems faster as a direct consequence of focusing on my feelings of confusion. Given that the counts went up, I suspect that things were understood and problems solved that wouldn’t have been at all had I not been doing this.

  • I occasionally found myself mentally searching for potential contradictions when I encountered new information. This is called either “cheating” or “mission accomplished.”

  • You might have noticed that I didn’t say anything about what to do after bringing confusion to conscious attention. It turns out that curiosity hijacks my brain once I pinpoint an apparent contradiction, far more so than when I simply notice that I don’t understand something. I do what I can to encourage that process.

  • I’m underconfident in the significance of my confusions. When I have a vague sense that something’s wrong, I’m often tempted to dismiss it as a weird fact about my brain, an uninteresting exception to a weak generalization, or something that would be resolved if I just did the math. But never in the course of this experiment did I count something that turned out to be unimportant.

  • At first, I didn’t seem to exercise this skill on days where I wasn’t doing cognitively demanding work, or when most of my work was not in an academic context (typically weekends). Over time, I began doing so more, although still less than on demanding academic days. This shows up in the disappearance of weekend dips in the data with time, and I think it’s a good sign concerning transfer.

  • A few weeks in, I began spontaneously recalling past instances of confusion, apparently on the strength of their connections to the feeling of being confused. Some of these I’d never resolved—I remembered a professor telling me years ago that the filamentary organization of galaxies had never been observed. That had sounded obviously wrong to me, but I’d just shrugged and moved on. The contradiction lay dormant in my mind until a week ago, when I took a minute to figure out that she was almost definitely talking about direct observation of intergalactic filaments. Depending on what counts (intergalactic? intracluster? visible/​dark matter?), that didn’t happen until 2012 or (provisionally) very recently. (That’s entirely irrelevant to my current work, but I thought it was interesting.)

  • I had one lucid dream during the past 5 weeks, and it explicitly began with noticing confusion in this sense. But that’s not very meaningful, since I ordinarily expect around one lucid dream in 8 weeks. It’s just as plausible to me that I began lucid-dreaming and then my brain made the connection to this experiment.

Conclusion

The quantitative results are promising, but for me, the qualitative lessons are more important—particularly my underconfidence and the possibility of using contradiction to fuel curiosity. I’ll keep counting confusions like this for a while, but I’m not going to worry much about experimental validity. Similarly, it doesn’t matter a whole lot to me whether the apparent gains rely on using the counter, since it costs me basically nothing to continue using it. I suppose that one could look into that by taking a break from counting and resuming it after a few months, but that’s honestly not my priority.

This is a really easy thing to try, and I’d like to encourage others to build on the simple attempt I’ve presented here.