What’s Your Cognitive Algorithm?

Epistemic Status: I’m neither a neuroscientist nor an ML researcher, but am trying to figure out “what kinds of human thought are actually possible to replicate on silicon right now?”.

Here’s my best guess of how human cognition works. Please tear it apart!

When I looked at GPT-2 last year, I thought: “Huh, when I look at my own thought process… I could summarize most of what I’m doing as: ‘predict the next thing I’d say using crude concept association, and then say it.’”

Meanwhile, Jeff Hawkins says “Every part of the neocortex is running the same algorithm”, and it’s looking like maybe brains aren’t doing that complicated a set of things.

Am I just GPT-2?

This was an obvious question to ask, but I haven’t seen anyone write up the question in detail.

I asked around. One mathematician friend said “I agree most people are doing GPT-style thinking where they regurgitate and recombine concepts from their neighbors. But, you can’t get civilization from just that. Some people need to have model-based thinking.”

Another mathematician friend agreed and added: “Young math students who try to prove theorems often do it GPT-style – they ramble their way through a bunch of math buzzwords and try to assemble them without understanding the structure of how they fit together. But, actual math proofs require clear understanding. You can’t just ‘predict the next word’”

I agree there is something additional going on here that gets you to formal math proofs and building skyscrapers. But… I don’t think it’s all that much more.

This post has three parts:

  • Lay out the cognitive algorithm I personally seem to be following

  • Outline how I think that algorithm developed

  • Work through some examples of how I do “advanced thinking” (i.e. the sort of thinking that might literally advance the sum of human knowledge), and doublecheck if there are any surprising elements

My algorithm, as I understand it

Even when I’m developing novel concepts, or thinking through effortful procedures, most of my thinking follows the same basic algorithm (there’s a rough code sketch of this loop after the list below):

A. Find the Next “Good Enough” Thought

  1. My subconscious finds some nearby concepts that are associated with my previous thought-chunk

  2. If a “good enough” thought appears, think that thought, and then repeat. (“good enough” means “I feel optimistic about the next thought in the chain leading somewhere useful”)

  3. If a “not obviously good enough” thought appears, check the next few associated concepts and see if they’re good enough.

  4. If none of the nearby concepts seem good enough, either give up and switch topics, or conduct an effortful search. This usually involves feeling stuck for a while, spending willpower or getting a headache. Eventually I either:

    • a) find a good enough concept for my next thought, and proceed

    • b) find a better search algorithm. (Still basically “find a good enough concept”, except it’s not actually going to help me directly. Instead, I’ll think something like “make a list of possible hypotheses”, or “search on google”, or “ask my friend who knows about X”, and then begin doing that.)

B. Check for Badness

  • While doing all this, there’s a followup process that periodically checks “was a recent thought-chunk somehow bad?”.

    • Is this sloppy thinking that a respected rationalist would give me a disapproving look for?

    • Is this thoughtcrime that my tribe would punish me for thinking?

    • Does it “smell bad” somehow, such that if I built a whole series of concepts off of this next thought-chunk, the result would be a flimsy construction? (i.e. bad code smell)

  • If the thought seems maybe-bad, start associating towards concepts that help crystalize whether it’s bad, or fix the badness
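Here’s a minimal Python sketch of that loop, purely for illustration – every function name and heuristic in it (associate, good_enough, seems_bad, effortful_search) is invented, and the point is just the control flow of parts A and B, not a claim about how brains implement it:

```python
import random

def associate(thought):
    """Stand-in for the subconscious: return a few concepts loosely
    associated with the previous thought-chunk (invented for illustration)."""
    return [f"{thought} -> association {i}" for i in range(3)]

def good_enough(thought):
    """'I feel optimistic this leads somewhere useful' (made-up heuristic)."""
    return random.random() > 0.5

def seems_bad(thought):
    """Part B: sloppy? thoughtcrime? bad code smell? (made-up heuristic)."""
    return random.random() > 0.9

def effortful_search(thought):
    """A4: the 'feel stuck, spend willpower' branch -- either find a usable
    concept or fall back on a better search strategy ('make a list',
    'search on google', 'ask a friend'). Returns a next thought or None."""
    candidates = associate("search strategies for: " + thought)
    return candidates[0] if candidates else None

def think(start, steps=10):
    thought, chain = start, [start]
    for _ in range(steps):
        # A: look for the next "good enough" thought among nearby associations
        nxt = next((c for c in associate(thought) if good_enough(c)), None)
        if nxt is None:
            nxt = effortful_search(thought)
            if nxt is None:
                break                        # give up and switch topics
        # B: check for badness; if maybe-bad, associate toward examining it
        if seems_bad(nxt):
            nxt = "examine badness of: " + nxt
        thought = nxt
        chain.append(thought)
    return chain

print(think("bright red stove"))
```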

There are a few other things going on – I’m storing concepts in working memory, and sometimes in mood, which shape which other concepts are easily accessible. I sometimes use concepts that initiate chains, where I’ll think “oh, I’m supposed to do algebra here. What’s the first step of algebra?” and then the first step associates to the second step. But these parts seem like things I wouldn’t be too surprised to see GPT-2, or some equivalent system, develop on its own.

Almost all of that condenses down to “find nearby associated concepts” and “direct my attention to more distant associated concepts.”

(My understanding of this is based on the Tuning Your Cognitive Algorithms exercise, where you solve problems mindfully, paying lots of attention to what your brain seems to be doing on the sub-second timescale)

How far removed from that is GPT-2?

First, I’m not making any claims about the exact structure of the learning algorithm. My understanding is that there are a few different neural network architectures that are better suited for different kinds of processing (e.g. convolutional nets for image processing).

Some people have responded to my “what if all thought boils down to simple associations?” questioning with “but, model based learning!”. I agree that model based learning is a thing, but it’s not obvious to me that GPT-2 doesn’t have it, at least to some degree.

Second, a key thing GPT-2 is missing is the “check for badness” aspect. After predicting a word, AFAIK there’s nothing that later punishes GPT-2 for thinking sloppily, or rewards it for doing something particularly great, which means it can’t learn things like “You’re supposed to generate multiple hypotheses before getting attached to the first one” and then deliberately apply them.

It probably also takes longer to learn things. (I don’t actually know for sure how either GPT-2 or other leading language generators are rewarded. Has anyone done anything like “Train a neural net on Reddit, where it’s somehow separately rewarded for predicting the next word, and also for predicting how much karma a cluster of words will get, and somehow propagating that back into the language generation?”)
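I still don’t know whether anyone has done this, but the setup the parenthetical describes would presumably look like a multi-task objective: the usual next-word prediction loss plus an auxiliary karma-prediction loss, both backpropagating into a shared backbone. A minimal sketch follows – every module, size, and number here is invented for illustration, and this is not how GPT-2 is actually trained:

```python
import torch
import torch.nn as nn

class KarmaAwareLM(nn.Module):
    """Hypothetical sketch: a language model with an extra head that
    predicts the karma a span of text would get."""
    def __init__(self, vocab_size=50257, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # GRU used as a small stand-in for a transformer backbone
        self.backbone = nn.GRU(d_model, d_model, batch_first=True)
        self.next_word_head = nn.Linear(d_model, vocab_size)
        self.karma_head = nn.Linear(d_model, 1)

    def forward(self, tokens):
        h, _ = self.backbone(self.embed(tokens))
        return self.next_word_head(h), self.karma_head(h[:, -1])

def loss_fn(model, tokens, karma, alpha=0.1):
    logits, karma_pred = model(tokens[:, :-1])
    lm_loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))
    karma_loss = nn.functional.mse_loss(karma_pred.squeeze(-1), karma)
    # both losses flow into the shared backbone, so the "how well would
    # this be received?" signal shapes generation too
    return lm_loss + alpha * karma_loss

model = KarmaAwareLM()
tokens = torch.randint(0, 50257, (2, 16))   # fake batch of Reddit comments
karma = torch.tensor([12.0, -3.0])          # fake karma scores
print(loss_fn(model, tokens, karma))
```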

From Toddlers to Software Architects

How might the algorithm I described above develop in humans?

Step 1: Toddlers and Stoves

Toddlers do little long-term planning. If they see a bright red stove, they might think “shiny object!” and their little GPT processes think “what are some things that might come next?” and one of them is “touch the stove” and one of them is “look at it intently” and a third is “shrug and wander off”. They pick “touch the stove” and then OUCH.

After a few iterations, they reach a point where, when they hypothesize “maybe the next action should be ‘touch the stove’”, they get a little flash of “but, two steps later, it will hurt, and that will be bad.”

One way to conceive of this is “GPT-style, but you predict two words ahead instead of one.”

But I don’t think that’s right. I think it’s more like: “GPT-style, but thinking certain thoughts brings up associations, and some associations just directly change the likely next actions.” i.e. you think “touch the stove!”, then you think “ow!”, and “ow!” is treated as an incorrect ending to the narrative-sentence you’re constructing. So you don’t take the “touch stove” action.

Eventually this gets cached into the System 1 GPT system such that “touch the stove” has a low predictive weight as a “thing you might do”, and it doesn’t even come up any more.

Step 2: Toddlers and Mom Yelling

The first time Billy the Toddler came upon a hot stove, he reached out to touch it, and beforehand, Mom yelled “Billy don’t touch that!”

And, possibly, Billy touched it anyway. And then he learned “ow!” and also learned that “Mom yells” is something that correlates with “ow!”, which propagates back into his model of what sort of actions are good next-actions-to-take.

Or, previously, perhaps Billy had done some less immediately painful thing – perhaps walking into the street. Mom yells at him. He ignores her. A nearby car slows down, and doesn’t hit him, so he doesn’t learn “street ==> cars ==> bad”. But, his Mom then runs and grabs him and pulls him away from the street, which is kinda painful. So he does gain the “Mom yelling ==> bad” association (as well as the “Walk into street” ==> “Mom Yelling” association).

Eventually Mom Yelling is treated as a failure state in the “predict which action I should take next” function.

Step 3: Internalized Mom Yelling

A couple years later, Billy periodically comes across novel situations – perhaps a wild animal in his backyard. This might remind him of similar situations where Mom Yelled in the past. By now, Billy doesn’t need to hear Mom yell at him directly, he’s able to think “Cool situation! Take Action” ==> “Hmm, Mom may yell at me later” ==> “Failure state” ==> “Okay, back up a step, take a different action.”

And eventually this voice gets internalized as some kind of conscience/morality/guide, which doesn’t even need to be physically present or temporally proximate to be relevant.

You could model this as “GPT-style thinking, but predicting multiple steps down the line instead of just one or two.” But I don’t think that matches my internal experience. The bad thing I need to avoid is often many steps down the line – more steps than I could feasibly be modeling. (There’s a toy sketch of this contrast after the lists below.)

I think the direct-association...

  • Previous chunk: “notice dangerous situation”

  • Next chunk: “association with mom yelling” (evaluates to “low predicted reward”)

...is simpler to execute than:

  • Previous chunk: “notice dangerous situation”

  • Next chunk 1: “go play in dangerous situation”

  • Next chunks 2–10: do a bunch of steps in the dangerous situation

  • Chunk N: Mom eventually finds out and yells (evaluates to “low predicted reward”)
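To make that contrast concrete, here’s a toy sketch (all tables and numbers are invented): a one-step associative lookup versus rolling the scenario forward chunk by chunk until the penalty finally shows up.

```python
# Direct association: the danger cue itself carries a cached low value.
cached_value = {
    "notice dangerous situation": -5.0,   # "mom yelling" association baked in
    "notice boring situation": 0.0,
}

def evaluate_by_association(situation):
    return cached_value.get(situation, 0.0)   # one lookup

# Multi-step rollout: simulate chunk after chunk until the consequence arrives.
rollout = {
    "notice dangerous situation": "go play in dangerous situation",
    "go play in dangerous situation": "do risky step 1",
    "do risky step 1": "do risky step 2",
    "do risky step 2": "mom finds out and yells",
}

def evaluate_by_rollout(situation, max_steps=10):
    state, steps = situation, 0
    while state in rollout and steps < max_steps:
        state = rollout[state]
        steps += 1
        if state == "mom finds out and yells":
            return -5.0                       # penalty only visible N steps out
    return 0.0

print(evaluate_by_association("notice dangerous situation"))  # -5.0, one step
print(evaluate_by_rollout("notice dangerous situation"))      # -5.0, after 4 simulated chunks
```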

Step 4: Internalized Punishment and Reward for Types of Cognition

Eventually Billy gains some internalized concept of “X is bad”, which can be directly associated with various inputs.

Social Shame

For me, “Doing X Would Be Bad” is often socially shaped. For example, I often go to write some crappy code by randomly cobbling some things together. And then I get a visceral image of my coworkers complaining at me, saying “Ray! Use your brain! What would good code look like here?” and then I say “sigh… fine”, and then I boot up the associations about what good code looks like and how to construct it.

Or, I’m debugging, and randomly changing things until it works or adding console.log statements in the hope they’ll reveal something obvious. And then my shoulder-angel-coworker pops up and says “Man, Ray, that is not how to debug code. Think!” and then I pull up my “what does actually debugging for real look like?” associations, and see what next-actions they pull up for me to consider, and then I do one of those.

(In this case, next-action-chunks include things like “make a list of what I know to be true about the code” and “check what the inputs to this function are and where they came from”, which at my current skill level feel like atomic actions.)

Internalized Taste

A different way some people work (including myself in some domains) is less “social” and more “aesthetic.” Good coders develop “bad code smell”, and (I’d guess in the case of debugging) “bad habits smell”, where if they find themselves debugging by randomly changing things, they think “obviously this is stupid and inefficient”, and then seek out the associated next-actions that are more helpful.

Step 5: Strengthened “Good” Habits

Eventually, you streamline the steps of “notice a problem” ⇒ “consider a bad strategy to solve the problem” ⇒ “feel bad” ⇒ “find a good thing to do” ⇒ “do that”, and go directly to “have a problem” ⇒ “use a good solution to the problem”.

And this gets distilled into increasingly streamlined chunks, until master-level artisans do lots of sophisticated techniques that get bucketed into a single action.
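A toy way to picture that distillation: a rehearsed multi-step chain eventually gets cached as a single macro-action, so later it costs one lookup instead of walking every intermediate step. (Purely illustrative – the threshold and strings below are made up.)

```python
from collections import defaultdict

full_chain = [
    "notice a problem",
    "consider a bad strategy to solve the problem",
    "feel bad",
    "find a good thing to do",
    "do that",
]

times_rehearsed = defaultdict(int)
macro_cache = {}

def respond(problem):
    if problem in macro_cache:
        return macro_cache[problem]          # distilled: a single chunk
    times_rehearsed[problem] += 1
    if times_rehearsed[problem] > 3:         # after enough repetitions, compress
        macro_cache[problem] = ["use a good solution to the problem"]
    return list(full_chain)                  # otherwise walk every step

for _ in range(5):
    print(len(respond("messy merge conflict")), "steps")
```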

But, really, what about deep planning and models and creativity?

As mentioned earlier, I do some complicated thought via chains-of-association. For example (sketched in code after this list):

  • Notice that I’ve been focusing on only one hypothesis

  • Remember that I should be looking for alternate hypotheses

  • Think “hmm, how do I get more hypotheses?”

  • Start listing out half-remembered hypotheses that vaguely feel connected to the situation.

  • Realize that listing random half-remembered hypotheses isn’t a very good strategy

  • Remember that a better strategy might be to make a list of all the facts I know about the phenomenon I’m investigating (without exactly remembering why I believe this)

  • Make that list of facts (using either my working memory, or augmented memory like a notebook)

  • For each relevant fact, ask myself “what must be true given this fact?” (“ah”, I now think. “This is why it’s useful to list out true facts. I can then conduct a search for other dependent facts, which builds up a useful web of associations”)

  • Use the list of facts and consequences to generate a new, more focused set of hypotheses
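To make the shape of that chain explicit, here’s a rough sketch of the fact-listing strategy. The helpers derive_consequences and propose_hypotheses are hypothetical placeholders for the human steps, not real inference code:

```python
def derive_consequences(fact):
    """'What must be true, given this fact?' (placeholder)."""
    return [f"something implied by: {fact}"]

def propose_hypotheses(web):
    """Generate a more focused set of hypotheses from the web of
    facts and consequences (placeholder)."""
    return [f"hypothesis consistent with all {len(web)} known items"]

def investigate(known_facts):
    web = list(known_facts)                    # the notebook / working-memory list
    for fact in known_facts:
        web.extend(derive_consequences(fact))  # build up the web of associations
    return propose_hypotheses(web)

print(investigate(["the bug only appears in production",
                   "the logs show no errors"]))
```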

This does involve a mixture of System 1 and System 2 thinking (where System 2 involves slower, more laborious use of working memory and considering different options). But it’s still mostly composed of a bunch of atomic concepts.

Sarah Constantin’s Distinctions in Types of Thought explores the possibility that deep neural nets basically have the “effortless System 1” type thinking, without being good at the slower, deliberate System 2 style thinking. I wouldn’t be that surprised if GPT-2 was “only” a System 1. But I also wouldn’t be that surprised if it naturally developed a System 2 when scaled up, and given more training. I also wouldn’t be that surprised if it turned out not to need a System 2.

What’s happening in System 2 thought?

The genre of this post is now going to abruptly switch to “Raemon narrates in realtime his thought process, as he tries to doublecheck what actually is going on in his brain.”

Okay, I want to demonstrate System 2 thought. What are some examples of System 2 thought?

(Spins gears for a few minutes, unproductively)

Suddenly I remember “Ah, the bat/baseball problem is a classic System 2 problem.” If I’m asked: “A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?”, what actually is going on in my head?

First, I think “Obviously the answer is ’10c’”

Then I think “Wait, no, I know this problem – I think the answer is 5c” (determined via memory). But I’m writing a blogpost right now where I’m trying to articulate what System 2 thought is like, and it would be helpful to have a real example. What if I rearrange the numbers in the problem and force myself to solve it again?

New Problem: “A bat and a ball cost $2.25 in total. The bat costs $2.00 more than the ball. How much does the ball cost?”

Great. Now the obvious answer is 25c, and that’s probably wrong. Okay, how do I actually solve this? Okay, boot up my laborious arithmetic brain.

My first impulse is to subtract $2.00… wait that’s literally the first thing my brain did.

Okay, what kind of math is this?

It’s algebra I think?

X + Y = 2.25

X + (X + 2) = 2.25

2X + 2 = 2.25

X + 1 = 1.125

X = .125

Is that right? I think so. I don’t actually care since the goal here was to doublecheck what my internal thought process is like, not get a right answer.
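(For what it’s worth, a two-line check confirms the algebra – the ball is 12.5c:)

```python
ball = 0.125               # the X solved for above, in dollars
bat = ball + 2.00
print(bat + ball)          # 2.25 -> matches the stated total
print(bat - ball)          # 2.0  -> the bat costs exactly $2.00 more
```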

I notice that once I remembered to Do Actual Math, I mostly used quick associations rather than anything effortful, which feels more GPT-2 style. I think in elementary school those steps would have each been harder.

The more interesting part here was not the “how to solve the bat/baseball problem” part, but the “how to find an actually good example of System 2 thinking” part. That felt quite effortful. I didn’t have any immediate associations, so I was conducting a search, and moreover the search process wasn’t very effective. (I think much of advanced cognition, and the Tuning Your Cognitive Algorithms process, is about figuring out what techniques enable you to search more effectively when you don’t have an obvious search algorithm)

What Other Kinds of Thought Are There?

I notice this whole section is downstream of a feeling: I’ve noticed that I haven’t actually tried to comprehensively answer “are all of my thought processes explainable via predict-the-next-thing-then-do-it?”. I have a nagging sense of “if I post this, there’s a good chance someone will either poke a hole in it, or poke a hole in my generating thought process. Mama Rationalist is going to yell at me.”

An obvious thing to do here is list out the types of advanced thinking I do, and then check how each one actually works by actually doing it.

I just did “elementary school math.” What are some others?

1. Creatively combine existing concepts

I think this is how most novel ideas form. I think GPT-2 does very basic versions of this, but I haven’t seen it do anything especially impressive.

One concrete algorithm I run is: “put two non-associated words next to each other, and see how they compile.” An example of an upcoming blogpost generated this way is “Integrity Debt”, which was born by literally taking the phrase “Technical Debt”, swapping in “Integrity”, and then checking “what does this concept mean, and is it useful?”
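That “swap one word in and see how it compiles” move is concrete enough to sketch. Here’s a toy version – the word lists are arbitrary stand-ins, and the real filter is the human judgment at the end:

```python
import random

# Toy version of "put two non-associated words next to each other and
# see how they compile."
qualities = ["Technical", "Integrity", "Attention", "Trust", "Slack"]
nouns = ["Debt", "Budget", "Compound Interest", "Bankruptcy", "Runway"]

def babble_candidates(n=5):
    return [f"{random.choice(qualities)} {random.choice(nouns)}" for _ in range(n)]

for phrase in babble_candidates():
    # the step that matters is the human one: "what does this concept
    # mean, and is it useful?" -- here we just print the raw candidates
    print(phrase)
```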

More often, I do a less intentional, fuzzier version of this where multiple concepts or sensory experiences get meshed together in my mind. Abram recounts a similar experience in Track Back Meditation:

At one point a couple of years ago, I noticed that I was using a particular visual analogy to think about something, which didn’t seem like a very good analogy for what I was thinking about. I don’t recall the example, but, let’s say I was using a mental image of trees when thinking about matrix operations. I got annoyed at the useless imaginary trees, and wondered why I was imagining them. Then, I noticed that I was physically looking at a tree! This was fairly surprising to me. Some of the surprise was that I took a random object in my visual field to use for thinking about something unrelated, but more of the surprise was that I didn’t immediately know this to be the case, even when I wondered why I was imagining trees.

After I noticed this once, I started to notice it again and again: objects from my visual field end up in my imagination, and I often try to use them as visual analogies whether they’re appropriate or not. It quickly became a familiar, rather than surprising, event. More interestingly, though, after a while it started to seem like a conscious event, rather than an automatic and uncontrollable one: I’ve become aware of the whole process from start to finish, and can intervene at any point if I wish.

2. Figure out what to do, for a problem where none of my existing associations are relevant enough to help solve it.

This is just actually pretty hard. Even Newton had to get hit on the head with the apple. Archimedes had to sit in the bathtub. Most human thought just isn’t very original and incrementally advances using known associations.

I think most of the “good thought strategy” here involves figuring out efficient ways of exposing yourself to new concepts that might help. (This includes scholarship, getting a wider variety of life experience, and actually remembering to take relaxing baths from time to time)

I think there is a teeny fraction of this that looks like actually babbling entirely novel things at random (sometimes in a semi-directed fashion), and then seeing if they point in a useful direction. This is hard because life is high dimensional. Anyone who actually succeeds at this probably had to first develop a sense of taste that is capable of processing lots of details and getting at least a rough sense of whether a sniff of an idea is promising.

3. Do advanced math that is on the edge of my current skills, where every step is novel.

I think this mostly involves repeatedly querying “what are the actual steps here?”, and then applying some effortful directed search to remember the steps.

(I notice my shoulder Mathematician Friend just said “THAT’S NOT WHAT REAL MATH IS. REAL MATH IS BEAUTIFUL AND INVOLVES UNDERSTANDING CONCEPTS DEEPLY OR SOMETHING SOMETHING SOMETHING IDK MY SHOULDER MATHEMATICIAN ISN’T VERY HIGH RESOLUTION”)

Speaking of which...

4. Mulling over concepts until I understand them deeply, have an epiphany, and experience a ‘click’.

There’s a particular sensation where I finally map out all the edges of a fuzzy concept, and then have a delightful moment when I realize I fully understand it. Prior to this, I have a distinctively uncomfortable feeling, like I’m in a murky swamp and I don’t know how to get out.

I think this is the process by which a complicated, multi-chunk concept gets distilled into a single chunk, allowing it to take up less working memory.

5. Procedural Knowledge (i.e. Riding Bicycles)

I think this is actually mostly the same as #4, but the concepts are often differently shaped – physical awareness, emotional attunement, etc. It doesn’t come with the same particular “click” qualia that intellectual concepts have for me, but there is often a “suddenly this is easy and feels effortless” feeling.

How Do You Think?

So those are all the ways that I think, that I can easily remember. How do you do your thinking? Are there any pieces of it that I missed here? (I’m wondering how much variety there is in how people think, or experience thought, as well as whether I missed some of my own thinking-types)

Implications for AI

My actual motivation here was to get a sense of “Are there any major roadblocks for human level intelligence, or do we basically have all the pieces?” (While I wrote this post, a couple other posts came out that seemed to be exploring the same question)

My current sense is that all the most advanced thinking I do is made of simple parts, and it seems like most of those parts roughly correspond to facets of modern ML research. This wouldn’t mean AGI is right around the corner – my understanding is that when you zoom into the details there are a fair number of finicky bits. Many of the most interesting bits don’t use the same architecture, and integrating them into some kind of cohesive whole is a huge, multi-step project.

But, it seems like most of the remaining work is more like “keep plugging away at known problems, improve our ability to efficiently process large amounts of information, and incrementally improve our ability to learn in general ways from smaller bits of information”.

This post doesn’t really attempt to make the “AGI is near” case, because I’m not actually an ML researcher nor even a competent hobbyist. I think seriously investigating that is another post, written by someone who’s been paying much closer attention than me.

For the immediate future, I’m interested in various LW contributors answering the question “How do you think you think, and why do you think you think that way?”