What Do We Know About Consciousness, Anyway?

Epistemic Status: Speculations on top of well established theories.


As the conventional memetic wisdom goes, “after decades and decades of research, science still has no idea how the human mind works”. As anyone passingly familiar with the relevant fields of science knows, this is of course a load of bull.

However, even in works written specifically about the human mind as a whole, it’s common to treat consciousness itself with no small degree of reverence. “Yeah sure, my work explains how people perceive the world or how we deceive ourselves or what drove people to evolve to be so smart or whatever else, but that’s just the small stuff. The big questions of the nature of consciousness are not for my humble self to answer and generally it’s a mysterious miracle shrouded in the veil of enigma, probably best left to philosophers”. The reasons for this are not hard to see—explaining the nature of consciousness is a status claim nobody can possibly make with impunity. Also, everyone has consciousness so everyone thinks they know how it works and will argue with you, and the philosophers would be unhappy, to say the least, if the hard sciences took yet another loaf of bread from them.

[Here and throughout the post I’m using “consciousness” to mean “self-awareness in the sense of being aware of one’s own awareness”, the ability to introspect one’s own mind, which separates humans from all or almost all other animals. If this still sounds ambiguous and ill-defined, hopefully the specific meaning will become clearer upon reading parts 3 and 4; for now, bear with me.]

But with the amount of progress made in neuroscience, psychology, AI and other relevant fields of study, one would expect the hard sciences to have at least something relevant to say on the nature of consciousness. And in fact I will argue in this post that certain theories fit together very nicely and provide a quite satisfying explanation of what exactly the hell self-awareness is and why it exists.

The theories in question are the manifold hypothesis (see part 1), predictive processing (part 2) and Hanson’s “Elephant in the Brain” theory (part 3), and I’ve never seen them brought together before in this fashion (or in any other for that matter). Which is of course not to say nobody’s come up with this idea before, just that I haven’t heard of it. I have seen a somewhat similar explanation for consciousness, not involving any of these theories, mentioned once briefly in a rationalist Slack channel, but that’s pretty much it.

So after spending some time myself poking holes in this idea and discussing it with a couple of smart people, I’ve decided to make it into a post on LW [my first post here by the way] to get broader help in figuring out whether any of this makes sense and/or if somebody has thought of it before.

Now, to get started, let’s start with something simple and widely acknowledged: people have thoughts. And those thoughts generally model the real world in some way. But how? If you go from “thought” down the abstraction levels you’ll eventually get to “pattern of neurons firing”, but what lies in between?

1. Manifolds

The manifold hypothesis, which has been tested a number of times and is generally considered to be true, states that objects in the extremely high-dimensional data of the real world are contained within [relatively] lower-dimensional manifolds. I assume that most people here are familiar with it to some degree; if not, here’s a short explanation. And it seems to me that this idea perfectly matches our common-sense concept of an “object in abstract”, of a “general notion” of an object, as opposed to any specific instance.

Let’s take the classic ML example of a cat. You can think of a cat in general, without it being any specific cat. If I show you a closed box and tell you there’s a cat in it, it vastly narrows down your choice among all the possible collections of atoms that could fit in that box. If I add just a few more variables (age, weight, fur coloration and type, eye color) you’ll be able to picture it quite accurately—because just knowing that it’s a cat already reduced your search space to a small subspace of much lower dimensionality, compared with the space of all possible things which can fit in the box. And conversely, for the vast majority of real-world objects, given a reasonable amount of data you’ll be able to tell confidently whether it is, or isn’t, a cat.
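To make the dimensionality argument a bit more concrete, here is a minimal Python sketch. It is a toy under loud assumptions, not how anyone actually tests the manifold hypothesis: “cats” are generated from just three made-up latent variables and embedded linearly into a 40-dimensional space (a flat subspace being the simplest possible manifold), while “anything that fits in the box” is unconstrained noise. The names `embed` and `numerical_rank` are hypothetical helpers invented for this illustration.

```python
import random

random.seed(0)
AMBIENT_DIM = 40   # stand-in for "everything that could fit in the box"
LATENT_DIM = 3     # stand-in for e.g. age, weight, fur coloration


def embed(latent, basis):
    """Map a low-dimensional latent point into the high-dimensional space."""
    return [sum(z * b[i] for z, b in zip(latent, basis))
            for i in range(len(basis[0]))]


def numerical_rank(rows, tol=1e-9):
    """Count independent directions in the data via Gaussian elimination."""
    m = [row[:] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        if rank == len(m):
            break
        # Partial pivoting: use the largest remaining entry in this column.
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no new direction along this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Assumption: every "cat" is a linear function of three latent variables.
basis = [[random.gauss(0, 1) for _ in range(AMBIENT_DIM)]
         for _ in range(LATENT_DIM)]
cats = [embed([random.gauss(0, 1) for _ in range(LATENT_DIM)], basis)
        for _ in range(30)]
# Unconstrained box contents: every coordinate is free to vary.
anything = [[random.gauss(0, 1) for _ in range(AMBIENT_DIM)]
            for _ in range(30)]

cat_rank = numerical_rank(cats)
anything_rank = numerical_rank(anything)
print("independent directions among cats:", cat_rank)
print("independent directions among arbitrary box contents:", anything_rank)
```

The thirty “cats” span only as many independent directions as there are latent variables, while the thirty unconstrained samples each add a new direction. Real manifolds are curved and fuzzy, so a rank count like this would not work on actual images; the point is only that knowing the category collapses the number of things left to specify.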

Of course, these manifolds are incredibly fuzzy at the boundaries—just like the tests have shown for the manifold hypothesis. There are plenty of things out there which you can argue are or aren’t cats—like those Savannah cats, or a dead cat, or a cat’s embryo. And that’s in addition to the fact that the verbal handle “cat” we’re using to refer to the manifold is ambiguous—do you mean to include, e.g. an image or figure of a cat? But generally the world does come in chunks it can be carved into.

So what presumably happens in the brain (and a sufficiently powerful NN) is that the high-dimensional input is reduced, turned, twisted and scaled to map it into the lower dimensional space where the manifolds are easy to delineate. At which point you stop seeing an amorphous blob of paint and think—aha, that’s a cat! How does the brain learn how to do it? Where does it get labels for cats and dogs and everything else? Well, it seems this question has also been answered.

2. Predictive Processing

For anyone curious how the brain works—in any sense of the phrase—I truly can’t recommend “Surfing Uncertainty” by Andy Clark highly enough. At least you should read Scott Alexander’s brilliant review of the book, and then decide for yourself. The book describes the predictive processing theory, which is, to quote Scott, “a theory of how the brain works – a real unifying framework theory like Darwin’s or Einstein’s – and it’s beautiful and it makes complete sense.”

Since I totally agree with this judgement, everything below is written under the assumption that the theory is broadly true, at least as far as information processing is concerned. (When it comes to motivation, the predictive processing account is much less convincing, but I think that’s fine, and as per the author’s own acknowledgement that part is not necessary for the theory to work.) The rest of the post is not guaranteed to make sense if you haven’t read at least the review on SSC. But for those who haven’t read it and don’t want to go through the whole thing now, here are a few paragraphs to give some idea—I’m quoting Scott, so you can take a break and read the writing of someone who’s actually good at it:

The key insight: the brain is a multi-layer prediction machine. All neural processing consists of two streams: a bottom-up stream of sense data, and a top-down stream of predictions. These streams interface at each level of processing, comparing themselves to each other and adjusting themselves as necessary.

The bottom-up stream starts out as all that incomprehensible light and darkness and noise that we need to process. It gradually moves up all the cognitive layers that we already knew existed – the edge-detectors that resolve it into edges, the object-detectors that shape the edges into solid objects, et cetera.

The top-down stream starts with everything you know about the world, all your best heuristics, all your priors, everything that’s ever happened to you before – everything from “solid objects can’t pass through one another” to “e=mc^2” to “that guy in the blue uniform is probably a policeman”. It uses its knowledge of concepts to make predictions – not in the form of verbal statements, but in the form of expected sense data. It makes some guesses about what you’re going to see, hear, and feel next, and asks “Like this?” These predictions gradually move down all the cognitive layers to generate lower-level predictions. If that uniformed guy was a policeman, how would that affect the various objects in the scene? Given the answer to that question, how would it affect the distribution of edges in the scene? Given the answer to that question, how would it affect the raw-sense data received?

As these two streams move through the brain side-by-side, they continually interface with each other. Each level receives the predictions from the level above it and the sense data from the level below it. Then each level uses Bayes’ Theorem to integrate these two sources of probabilistic evidence as best it can. This can end up a couple of different ways.

First, the sense data and predictions may more-or-less match. In this case, the layer stays quiet, indicating “all is well”, and the higher layers never even hear about it. The higher levels just keep predicting whatever they were predicting before.

Second, low-precision sense data might contradict high-precision predictions. The Bayesian math will conclude that the predictions are still probably right, but the sense data are wrong. The lower levels will “cook the books” – rewrite the sense data to make it look as predicted – and then continue to be quiet and signal that all is well. The higher levels continue to stick to their predictions.

Third, there might be some unresolvable conflict between high-precision sense-data and predictions. The Bayesian math will indicate that the predictions are probably wrong. The neurons involved will fire, indicating “surprisal” – a gratuitiously-technical neuroscience term for surprise. The higher the degree of mismatch, and the higher the supposed precision of the data that led to the mismatch, the more surprisal – and the louder the alarm sent to the higher levels.

When the higher levels receive the alarms from the lower levels, this is their equivalent of bottom-up sense-data. They ask themselves: “Did the even-higher-levels predict this would happen?” If so, they themselves stay quiet. If not, they might try to change their own models that map higher-level predictions to lower-level sense data. Or they might try to cook the books themselves to smooth over the discrepancy. If none of this works, they send alarms to the even-higher-levels.

All the levels really hate hearing alarms. Their goal is to minimize surprisal – to become so good at predicting the world (conditional on the predictions sent by higher levels) that nothing ever surprises them. Surprise prompts a frenzy of activity adjusting the parameters of models – or deploying new models – until the surprise stops.

All of this happens several times a second. The lower levels constantly shoot sense data at the upper levels, which constantly adjust their hypotheses and shoot them down at the lower levels. When surprise is registered, the relevant levels change their hypotheses or pass the buck upwards. After umpteen zillion cycles, everyone has the right hypotheses, nobody is surprised by anything, and the brain rests and moves on to the next task.
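The three outcomes described in the quote can be sketched numerically. This is a minimal illustration, assuming that both the top-down prediction and the bottom-up sense data are one-dimensional Gaussians described by a mean and a precision (inverse variance). Nothing in the review commits to this specific form, and the function name `integrate` and all the numbers are made up for the example.

```python
import math


def integrate(prediction, pred_precision, sense, sense_precision):
    """Fuse a Gaussian top-down prediction with a Gaussian bottom-up observation.

    Returns (posterior_mean, surprisal), where surprisal is the negative log
    probability density of the observation under the prediction.
    """
    total = pred_precision + sense_precision
    # Precision-weighted average: the more precise source dominates.
    posterior = (pred_precision * prediction + sense_precision * sense) / total
    # Spread of observations the layer expects: its own uncertainty
    # plus the noise it attributes to the sense data.
    variance = 1.0 / pred_precision + 1.0 / sense_precision
    surprisal = 0.5 * (math.log(2 * math.pi * variance)
                       + (sense - prediction) ** 2 / variance)
    return posterior, surprisal


# Case 1: sense data and prediction more or less match.
match_post, match_surprisal = integrate(10.0, 4.0, 10.1, 4.0)
# Case 2: low-precision sense data contradicts a high-precision prediction.
noisy_post, noisy_surprisal = integrate(10.0, 100.0, 14.0, 0.1)
# Case 3: high-precision sense data contradicts the prediction.
conflict_post, conflict_surprisal = integrate(10.0, 4.0, 14.0, 100.0)

for name, post, s in [("match", match_post, match_surprisal),
                      ("noisy data", noisy_post, noisy_surprisal),
                      ("conflict", conflict_post, conflict_surprisal)]:
    print(f"{name:>10}: posterior={post:.2f} surprisal={s:.2f}")
```

In the second case the posterior barely moves from the prediction, which is the toy analogue of “cooking the books”; in the third it jumps most of the way to the sense data and the surprisal is by far the largest of the three.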

You can see how this allows the models in human (and other animal) brains to be trained on real-world data, with the input from T+1 serving as the labels for T. The successes of GPT-2 and GPT-3, which happened after the book was written, are a good argument for predictive processing being true. You can see how this way, given sufficiently powerful hardware, the model can learn pretty high-level abstract concepts, like “trip” or “repetition” or “tool”. And by learn I mean not just being able to perform or perceive these things, but actually having notions of them similar to those in a human mind, being able to tell whether a certain thing “is a trip” or “isn’t a trip”, and so on.
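As a toy version of “the input from T+1 serves as the label for T”, here is a minimal sketch of a predictor that is never given any external labels. It assumes a two-weight linear model and a plain sine wave as the “world”, both chosen purely for illustration; `train_next_step_predictor` is a hypothetical name, and real brains and GPT-style models are of course vastly more complicated.

```python
import math


def train_next_step_predictor(signal, lr=0.1):
    """Learn w so that w[0]*x[t] + w[1]*x[t-1] predicts x[t+1].

    The only supervision is the signal itself: each prediction is checked
    against whatever input arrives one step later.
    """
    w = [0.0, 0.0]
    errors = []
    for t in range(1, len(signal) - 1):
        inputs = (signal[t], signal[t - 1])
        prediction = w[0] * inputs[0] + w[1] * inputs[1]
        error = signal[t + 1] - prediction  # the "label" is just the next input
        # Nudge each weight in the direction that shrinks the prediction error.
        w[0] += lr * error * inputs[0]
        w[1] += lr * error * inputs[1]
        errors.append(abs(error))
    return w, errors


# Assumption: the "world" is a sine wave sampled every 0.3 radians.
signal = [math.sin(0.3 * t) for t in range(20000)]
weights, errors = train_next_step_predictor(signal)

early_error = sum(errors[:100]) / 100
late_error = sum(errors[-100:]) / 100
print("mean prediction error, first 100 steps:", early_error)
print("mean prediction error, last 100 steps:", late_error)
print("learned weights:", weights)
```

For a sine wave sampled at this rate the exact next-step rule is x[t+1] = 2·cos(0.3)·x[t] − x[t−1], and the weights drift toward those two values using nothing but the mismatch between what was predicted and what arrived next.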

But what happens on the topmost levels of the model? Why am I able to see myself thinking, and see myself thinking about that, and so on? What the hell am I predicting at that moment and why? It makes sense and is reasonably clear how and why all these external concepts are learned. But why learn representations of the things internal to the model, like thoughts and emotions? Why does a human mind need to learn to represent and predict a human mind… ah. Right.

3. Social Mind

So, the required reading for this part is Robin Hanson’s “Elephant in the Brain”. Which is quite fortunate, since literally every single person on LW is at least somewhat familiar with its main ideas, so I’ll just move on without further ado.

Of course a human mind needs to predict the behavior of a human mind—that’s what it’s freaking made for! The only theory convincingly explaining how humans got so damn smart says they did so by competing with other humans. Reproductive success in the ancestral evolutionary environment depended strongly on the ability to successfully interact with other humans—deceive them, seduce, negotiate, convince, impress, coordinate and so on. For that purpose, wouldn’t it be pretty handy to have a model of a human mind? Just like to better hunt antelopes, it helps to have a model of an antelope in your head, so you expect it to run away from you at such and such speed, but don’t expect it to fly or shoot lightning out of its horns.

Obviously you can’t model another mind exactly, given that you have about as much computational power as that other mind, but you can have at least some model, pretty abstract and high-level. And what are models of minds made of, at the top level? Well, the top level is where we parsed all the noisy input of reality into neat low-dimensional manifolds. When modeling another mind you probably don’t have the computing power to care about exactly how they are parsed or represented, or what goes on at the lower levels, nor do you have a strong need to. What matters is that these manifolds are there and your fellow tribesman’s mind operates in them, and to predict him more accurately it helps if you can incorporate this fact in your model. So, these low-dimensional representations of real-world objects that people have in their heads—does that sound like what we’d call a “thought”? And the process of operating on them “thinking”?

[You don’t have to buy into the manifold thing though, just the general idea that people have some representation of real-world objects in their heads, and that a sufficiently powerful model trying to predict people will quickly come up with a representation for those representations—i.e. a notion of thought.]

And one quality of good powerful models is that they generalize. And the one human mind that any human mind always has access to is, of course, itself. At this point it’s hard to imagine how, once having come up with a notion of “thought” and “thinking”, it can fail to notice these same “thoughts” within itself. Add a bunch of other concepts such as “intentions”, “emotions” etc, and you basically have it—all the notions necessary to describe a mind. You also need a notion of self, of course, but a model operating in a world would likely develop this notion relatively early on, and the mirror test experiments hint that even many animals seem to have it.

Now as per “Surfing Uncertainty”, humans and other animals predict their own neural states all the time; that’s how brains work. The crucial difference in this case is that the predicted part is the same one that does the prediction. The topmost level of the brain predicts the level immediately below it as per regular predictive processing, but it can also predict its own states—because it was trained to predict the states of other very similar human minds. When you’re looking into your mind, you see a human, but that human is the same one that looks. Which, I don’t know about you, but for me is pretty much exactly how it feels to be self-aware! I obviously can’t spell out the underlying neurobiology and math in any detail, but I’m reasonably sure there’s at least nothing prohibiting this outright: self-referential systems and autocorrelated functions are all around, and in fact it’s pretty much inevitable that a human brain should contain something of this sort, since as a matter of trivial fact I can use my mind to think about my mind.

And since a thought about a thought is also a thought, the recursion can go as deep as your working memory allows. You can think about yourself seeing yourself seeing yourself et cetera (and get bored after about 3 layers per Yudkowsky’s Law of Ultrafinite Recursion).

[If it is not clear why I’m using “see” and “predict” interchangeably, see “Surfing Uncertainty”. In two words, if you see and recognize something it means your bottom-up sensory data is matched by the top-down predictions. Sensory data without matching predictions feels more like looking at this image—i.e. like WTF.]

It also helps—and provides an incentive to generalize to your own mind—that you and other humans in your tribe live in approximately the same world. So as soon as you’ve figured out that both you and they have “concepts” of real-world objects in your minds, you can reuse the specific concepts from your mind to model their minds. If you’re smart and introspective enough, you can even reuse your knowledge of the internal workings of your mind to model them. And it’s widely known that this is exactly what people do, and it works great—up until the point where it doesn’t, anyway.

Of course modeling your own mind feels somewhat different from modeling others. For one thing, now you have access to a much more detailed and lower-level picture than from just observing others externally. For another, now there’s no boundary between the predictor and the predicted; they are the same thing (more on this in the next section).

Note however that it’s not completely different; you’re still using the same vocabulary and the same mental concepts: we recognize thoughts, desires, goals, emotions, mental states etc. in both ourselves and others as fundamentally the same. You can see into yourself, roughly, about one level deeper than into someone you know very well, if that—your better understanding of yourself amounts to having answers to all the questions you’d ask about others (why did she say that? what does he actually think about this?), but the language you’re using is the same. Compare this with e.g. the difference between how an average driver thinks about a car and how a mechanic does, or an engineer who designed that car. The latter would be thinking on entirely different levels, using entirely different models and concepts to understand what goes on with the vehicle. In this sense, human introspection looks much more like a model which was fundamentally about the outside view of a human and just happened to get a peek at the inside, rather than a model which was about the inside view to begin with.

This all also squares neatly with Robin Hanson’s model of consciousness as a “PR office”. Once you do have this ability to see yourself as another human being, one of the most useful things that a social, status-concerned creature can do with it is to make sure their actions and words represent them in the best way among their fellow tribesmen.

But when it comes to dealing with predictions proving false, Hanson’s model of consciousness as purely a PR office (as I understand it) contradicts predictive processing and, frankly, common sense, and I tend to be on the side of the latter two. In the predictive processing framework, if a prediction doesn’t match bottom-up sensory data (“I’m the bravest man in the tribe” vs “I’m scared shitless at the sight of a lion”), the error is resolved either through action (“I don’t run despite the fear”) or an update to the model (“You know, actually maybe I’m not as brave as the smartest man in the tribe!”). The model is heavily regularized to err on the side of self-flattery, but not to the point where it becomes completely detached from reality, except maybe in the most pathological cases. (More on this in section 6.)

One testable prediction of this theory is that a human child needs other humans around them to grow up fully conscious and self-aware, because otherwise they won’t have enough data to train on to develop a proper model of a human mind. The limited data we have about feral children seems to tentatively support this suggestion.

This section is essentially the main reason I’m writing this. Everything above is mostly me restating things I’ve read somewhere else. Everything below is largely speculation which potentially makes no sense. But this idea—self-consciousness is a model trained to predict other such models and generalizing to itself—seems both extremely obvious (in retrospect) and, as mentioned before, with one small exception I can’t remember ever hearing or reading about it.

Ok, now to the speculations.

4. The Least Interesting Problem

The Least Interesting Problem Of Consciousness aka “the hard problem of consciousness” goes as follows, per David Chalmers, the author of the term:

It is undeniable that some organisms are subjects of experience. But the question of how it is that these systems are subjects of experience is perplexing. Why is it that when our cognitive systems engage in visual and auditory information-processing, we have visual or auditory experience: the quality of deep blue, the sensation of middle C? How can we explain why there is something it is like to entertain a mental image, or to experience an emotion? It is widely agreed that experience arises from a physical basis, but we have no good explanation of why and how it so arises. Why should physical processing give rise to a rich inner life at all? It seems objectively unreasonable that it should, and yet it does.

or, per Wikipedia:

the problem of explaining why and how we have qualia or phenomenal experiences. That is to say, it is the problem of why we have personal, first-person experiences, often described as experiences that feel “like something.” In comparison, we assume there are no such experiences for inanimate things like, for instance, a thermostat, toaster, computer, or a sophisticated form of artificial intelligence.

Now, when there’s a non-trivial number of specialists who question the very existence of the problem, you should expect it’s not very well defined. These and other definitions all seem to mostly shuffle the mysteriousness of consciousness between the words “subjective”, “experience” and “qualia”, and if you try to find a philosophical definition of those you’ll end up going in a sepulki loop pretty quickly.

So, let’s start from the basics and ask “What is it that I know and why do I think I know it?”. Well, I know that I’m conscious, and philosophical zombies aside I also know other people are. Why do I think I know it?

Let’s consider the alternative. Not some theoretical alternative, but a very practical one. Was there ever a period in your life when you, a human being, behaved, perceived things and reacted to them, but were not conscious of any of this? Yes for sure, it’s called early childhood! I don’t have a whole lot of experience with 1-year-olds, but from what I gather they tend to do things and react to things going on around them. And I’m pretty sure anyone reading this has been one at some point. Also most people can’t recall any memories prior to the age of two (the rare cases when people claim they do are generally considered to be false memories), and have limited recollection prior to the age of ten. So, can you imagine being a 1-year-old? Not really, I’d guess; at most you’d imagine being an adult (or maybe a kid, but a much older kid that you can remember being) in a toddler’s body. Because when you imagine yourself being a toddler, you imagine yourself being a toddler—in your imagination you can reflect on your mind being in a toddler’s body, but the “reflect on your mind” part isn’t something that a real toddler can do! An actual toddler-you could have been cheerful or angry, but there was nobody “in there” sitting and reflecting “aha, now we’re cheering”. And if you go on to the earliest memories you do have, and then on to the later memories, you can almost track that someone “appearing”, and you “knowing” things and “realizing” what you feel more and more.

So the reason I know I’m conscious is that there is “someone” in my head who can see inside myself and see all those mental processes going on and think about them. And the fact that this someone also happens to be me, and that the mental process of looking at your own mental processes is also susceptible to reflection, seems integral to the definition of consciousness. To begin to wonder “why am I self-aware” you need to be aware that you’re self-aware and so on.

Assuming the model from the previous section however, the question stops feeling mysterious and the answer (on the very high level of abstraction) becomes fairly straightforward. The predictive processing model explains how perception is mediated by prediction. The social mind theory says that the most important and most complex thing that our mind will be trying to predict is other minds. In the absence of any laws prohibiting generalization to itself, it becomes quite expected of the mind to generalize to itself and become able to recognize some top-level parts of itself (like thoughts). And the fact that a thought about a thought is also a thought and therefore can be recognized as such closes the reflection loop. I say to myself “Hey, I’m hungry!” but both the speaker and the listener are the same mind which is myself. And the one thinking about this perplexing paradox and trying to figure out which of the two is “really truly” me is also myself, and so on and so on.

Because in fact there’s only one model, in which the top level—the consciousness—tries to predict its own state based on its previous states, but also on the input from lower levels and the outside world. The top level is accessible for direct introspection—you know what you’re thinking about—but communications with any levels down below need to be done in their terms, e.g. to predict how you’d feel in a certain situation it’s not enough to just describe it verbally, you need to actually imagine yourself being there for emotional processing to kick in.

Before I get too carried away wallowing in confirmation bias and running down the happy death spiral, let’s try to address the elephant in the room. So far, we’ve been talking purely about reasoning about other minds, and how this process both helps and is helped by developing self-awareness. If this theory were true, if consciousness is indeed just a model trained to predict other similar models which generalized to itself—why would this make us so darn good at everything else?

5. Why So Smart?

Why, instead of being just exceptionally good at deceiving our conspecifics, are humans by far the smartest species on the planet in general?

Frankly, I don’t have a good answer to this one. Here’s a few guesses, but I won’t be surprised if none of them is true, and also more than one can be true.

It’s just raw brainpower

One possible answer is to suggest that consciousness is merely a side effect of a sufficiently powerful intelligence that needs to interact with other nearly identical agents. It is useful for this interaction, but not much else. I kind of like this answer because it’s biting the bullet in the true spirit of rationality: without the prior knowledge, from the model so far described I wouldn’t have inferred that self-awareness makes your intelligence (defined as the ability to change the outside world according to your goals) skyrocket. Therefore, I shouldn’t invent any explanations for why it actually does post-factum and instead just accept that it doesn’t.

And you can make a reasonably strong argument that this is at least a possibility. For one, I’m sure everyone here is aware how weak single-task AI models with no chance of consciousness were able to beat humans in quite a few areas which people 50 years ago would confidently have said require consciousness—this is strong evidence that consciousness is less crucial for being smart than we may have naively assumed. For another, humans have always been social. If self-awareness is only a byproduct of high intelligence in social settings, we wouldn’t be able to tell the difference from it being a prerequisite for high intelligence, since for our single example of self-awareness the social setting is a given.

On the con side, if this explanation is true it means Peter Watts was right about everything, vampires are real and we’re all very doomed.

Abstract thinking

The other obvious candidate here is the suggestion that the ability to reflect on your own thought process changes how it works.

How would this help? My guess is that it has something to do with being able to think about something in the absence of any relevant external stimulus. A chimp can think about how to get a banana when it sees the banana, or is hungry and remembers where bananas are, or some such. Whereas I can lie there and ponder something as remote as Mars colonization, like all day long! Maybe that’s because, when a chimpanzee’s quite advanced mind sees the banana, it tries to predict the world where it gets the banana and uses all its world-modeling power to come up with a coherent prediction of it. But it’s stable only for as long as there’s some stimulus related to the banana; once there’s no such stimulus the loop becomes unstable—the upper layers of the model are trying to predict the world where the chimp gets the banana, but the lower levels report back that, well, no banana around, so after a while it dies out.

Whereas with self-reflection, the model can “screen off” the lower levels’ error messages: well, duh, there’s no banana and there won’t be because we’re merely thinking about one! Without the concept of “thinking” the error between the prediction (“banana”) and observation (“no banana”) persists and must be resolved in some way, and in the absence of an actual banana the only way to resolve it would be to amend the top-down prediction—to drop the idea. But if you do have the concept of “thinking”, the top level can be coherently predicting just having a thought of a banana in your head, and observing exactly that. Which makes this state of mind stable, or at least much more stable, and allows one to spend their resting time thinking about ways to get the stuff they want, i.e. planning and plotting, or even just contemplating the world and finding patterns which can be potentially helpful in the future.


Another tempting candidate which follows from self-reflection is the ability to modify your own thought process. At the end of the day, one of the main reasons we’re worried about AGI is because it will likely have this ability completely unbounded.

However I’m skeptical that this is the main or even particularly important skill in what gives humans the intellectual edge over the other animals. Of course it depends on the definition of self-modification, but my impression is that this skill is far from ubiquitous even among humans, and to learn and apply it one already needs to be hella smarter than any other animal on the planet.

Language and culture

It’s not difficult to see how you can go from mind-modeling to language. Given that you and your tribesmen already have these “concepts” or “manifolds” in your heads, and you’re all generally aware of this fact, it only makes sense to come up with some handles to invoke a notion you need in the head of another. Guessing wildly here, I’d imagine that these processes more likely went kind of in parallel—language developed alongside the theory of mind, rather than coming strictly after it.

And of course language, and the culture that it enables, are what’s mostly responsible for why humans dominate the planet. For all our intellect, if each generation were figuring everything out anew, we wouldn’t even be particularly good hunter-gatherers.

Also it’s possible that once you do have these handles to notions, it makes it much easier to invoke them in your own head too. Something something context switching something. “Surfing Uncertainty” talks about it in the part “9. Being Human”. That may be another explanation of how consciousness helped humans become so good at reasoning.

6. Implications

If we assume that the general model above is remotely correct, what would that entail? Here are my speculations on some areas.


What good is any philosophy of consciousness if it doesn’t tell us anything about safe AIs, right?

One potential ramification of this model is that it kinda seems plausible to build a powerful super-human oracle AI, at least in relatively narrow domains, without it becoming self-aware. If self-awareness is the result of trying to predict other models like yourself, and we train our [decidedly non-general] AI to do high-quality cancer research, it will have neither the stimuli nor the training data to develop consciousness. (Note that the question of whether it is possible to create a narrow but super-human AI in a field like this remains untouched.)

Second, a sufficiently powerful AI operating in a world with other similar AIs, and trained to work with them and against them, will very likely become conscious.

Third, an AI trained to work with and against humans may or may not become conscious through this process, depending on the architecture and computational power and likely some other factors.

Ideal Self

One of the topics constantly coming up together with self-reflection is some idealized image of self which people fail to live up to, the eternal conflict between what you believe you should be and what you are. This makes perfect sense if we remember that 1) our model is a predictive one and it produces actions, not just explains them and 2) the model is nevertheless heavily regularized to err on the side of modeling oneself as “good”, in whichever way your local tribe happened to define goodness.

So, on one hand you train your model (well, more appropriate wording may be “you-model”) of the human mind on the observations of tribesmen doing their best to cast themselves in the most positive light possible. On the other hand, you have a better ability to see through your own bullshit than through their bullshit (yes, yes, far from perfect, but somewhat better generally). It means that your real behavior will systematically fall somewhat short of your predictions of what you, as a proper human, will do and think in each situation, but instead of just lowering the bar of what counts for a proper human, your model will keep on predicting that you’ll do better next time. Which, I’d dare say, corresponds very closely to the observations.
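Here is a crude numerical sketch of that dynamic, in Python. Everything in it is an assumption made for illustration: behavior and the tribe’s standard are squashed into single numbers, and `prior_weight` and `learning_rate` are invented knobs, not anything measured. The point is only that if the self-model is pulled back toward the tribe’s standard after every update, its prediction settles above the actual behavior instead of converging to it.

```python
def self_model_gaps(actual_behavior, tribe_standard,
                    prior_weight=0.8, learning_rate=0.5, rounds=10):
    """Toy self-model: learns from own behavior, but is regularized
    toward the standard the tribe presents. Returns the gap between
    predicted and actual behavior after each round.
    All parameters are hypothetical."""
    estimate = tribe_standard  # start out believing you are a "proper human"
    gaps = []
    for _ in range(rounds):
        # Observing yourself pulls the estimate toward what you really do.
        estimate += learning_rate * (actual_behavior - estimate)
        # Regularization pulls it back toward the tribe's standard.
        estimate = prior_weight * tribe_standard + (1 - prior_weight) * estimate
        gaps.append(estimate - actual_behavior)
    return gaps


gaps = self_model_gaps(actual_behavior=0.6, tribe_standard=1.0)
print([round(g, 3) for g in gaps])
```

With these made-up numbers the gap never closes: the model keeps predicting a better you than the one that shows up. Setting `prior_weight` to 0 removes the regularization, and the gap then shrinks toward zero, which is the “just lower the bar” outcome that doesn’t seem to happen in practice.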

S1 vs S2

[Disclaimer: I did not read the book and am familiar with the concept only through cultural osmosis.]

Essentially, to cast the System 1 and System 2 model described in “Thinking, Fast and Slow” in the terms of the predictive processing framework, you don’t even need the whole theory outlined in this post. You can just say that System 2 is the topmost layer, and System 1 is the second from the top. In our framework that would correspond to System 2 being the layer capable of reflecting on itself, and System 1 being the layer immediately below it, to which System 2 has direct access but which in itself is not reflective.

Free Will and the Meaning of Life

Well, the first one was already solved, and the latter one (why is it common for people to agonize about the meaning of life, and what is existential dread?) also isn’t particularly difficult. I’ll just point out the by-now-obvious framing of these questions given this theory.

Trying to find the purpose of every action is a wonderful heuristic for dealing with other intelligent agents, so humans have it. One well-known area where it fails is dealing with anything not driven by at least some form of intelligence. Another case where it fails, though, is when the agent can examine itself and its own goals and ask the meta question: what’s the purpose of me having these goals? There’s no good way to answer this question, because it’s based on the flawed assumption that everything must have a goal, which spawns an infinite recursion: what’s the purpose of me having this purpose having this purpose… Hence, agonizing about the meaning of life.

As for free will, it’s nothing new, it’s how the algorithm feels on the inside. We can just be slightly more specific and say this is a predictive processing algorithm. You observing yourself making a choice (and observing yourself observing yourself...) is an integral part of the choice-making process. But since self-awareness is originally about the outside view of the human mind, it feels like the choice is somewhat external to, and also controlled by, the observer. Whereas in fact it’s a two-way information flow. The upward stream consists of whatever data relevant to the decision you see/hear/remember, your feelings about the consequences of each side of the choice, and your current emotional and physiological state: all the things which (who would’ve thought!) affect your free-will choice. And the downward stream is your expectations of what you, as a proper human being, would do in this situation, i.e. the predictions of your model trained on humans and biased toward you being a very good one. At some point one meets the other, errors are resolved (the details of this process are described in “Surfing Uncertainty”, seriously you should read it) and you make a choice and everything adds up to normality.

Neither of these questions strictly requires knowing anything about brain architecture to figure out (free will has been solved for many years now), I just want to point out how naturally the answers follow from this theory.

Mary’s Room

As I mentioned before, the concept of qualia appears to be extremely ill-defined, so I avoided it throughout. But this specific question of “Whether Mary will gain new knowledge when she goes outside the room and experiences seeing in color?” has a pretty obvious answer.

The confusion stems from the two possible meanings of the word “knowledge”: first, it can mean a verbal statement one can recite from memory, and second, information stored in the neural connections in one’s brain. You can have the latter kind of knowledge without the former. For example, you most likely can’t verbally enumerate which individual muscles are contracted in which order and to which extent as you type the word “knowledge”, but that information is clearly stored in your brain somewhere, so you know it in the second sense.

Perception works in exactly the same way only in reverse, per the predictive processing framework. Crudely speaking, neurons in the first layers of the visual cortex know how to parse color and brightness alterations into lines and curves, layers after that know how to construct 3D shapes out of those lines, infer the location of those shapes relative to each other and to you in 3D space, attach labels to those shapes and match them to sounds and other sensations, and at the very top sits the consciousness which perceives e.g. a red car speeding by.

Mary, being the super-scientist, knows all that and much more and can recite it to you. But there’s no mechanism in the human brain to propagate this knowledge from the uppermost levels down to the visual cortex neurons which will be actually inferring things from colors. Mary’s neurons so far have only worked with alterations in brightness. And this also means that the neurons instantiating her consciousness don’t know in the second sense what it is like to receive signals about colors from the lower levels, i.e. the conscious part of Mary also has no idea of what it means to perceive e.g. the color red, even though Mary does know it in the first sense: she can recite it to you verbally.

In other words, when Mary tells you how human visual cortex works, upper levels of her brain predict their own state (and, through the regular predictive processing mechanisms, state of all the parts of the brain down below) corresponding to Mary talking about colors, or thinking words about colors. But in order to actually visualize seeing a color Mary’s brain needs to take a verbal notion of color—which Mary does have—and propagate it down through all the levels to end up in almost the same state as if she was actually seeing that color—an operation which Mary’s brain can’t execute. The neural connections required for this operation are simply not there. Surely Mary knows in the first sense—i.e. she can tell verbally—what those connections should be, but it doesn’t help to actually create them any more than knowing what BMI 20 is helps one to get thin.


Thank you for reading all of this!

To close off, here are some testable predictions of this theory, sorted roughly from relatively easier to test to more hypothetical.

  1. As a child develops, their understanding of self progresses at the same pace as their understanding of others.

  2. A child raised in isolation from human contact won’t develop proper self-awareness (not experimentally testable of course, but sadly examples of this do happen).

  3. Given enough computational power and a compatible architecture, the agent will develop consciousness if and only if it needs to interact with other agents of the same kind, or at least of similar level of intelligence.

  4. Consciousness isn’t a binary thing, there’s some form of continuous spectrum just like there’s a spectrum in the agent’s ability to model other agents.

And here’s one prediction that at least seems to have failed:

  • If self-awareness is just modeling other minds directed inwards, it would seem that people best at understanding others and navigating social life should also be the most introspective and self-aware ones. Why does the opposite seem to be the case? E.g. why does nearly all advice on social skills include tips on how to be less self-aware?

Maybe that’s because you have some limited capacity to model human brains, and if you direct lots of it inwards you have less available to direct outwards. Maybe it’s about a precision vs speed trade-off: people who can model others and themselves in the greatest detail struggle to do it at speed and with multiple people, which is necessary in social settings. Both of these explanations sound unsatisfactory and ad-hoc-ish to me though.

How likely do I think it is that this is all wrong and doesn’t make any sense? From the outside view, quite likely. From the inside view, by now it feels like a kinda natural and straightforward way to think about consciousness for me. So I don’t know. But more generally, I do think that we’re at the point, or at least very close to it, where someone much smarter and better educated can come up with a detailed, fully satisfying account of what consciousness is and how it works, which will leave no mysteriousness hidden under the rug.

ETA: right after publishing this I stumbled upon this post reviewing a book which talks about very similar ideas in a more detailed way and from another perspective. From the post it sounds very much compatible with what I’m saying here and potentially answers the question in part 5. I’m definitely going to read the book to get a better understanding.