I want to contrast two perspectives on human epistemology I’ve been thinking about for over a year.
There’s one school of thought about how to do reasoning about the future which is about naming a bunch of variables, putting probability distributions over them, multiplying them together, and doing Bayesian updates when you get new evidence. This lets you assign probabilities to lots of outcomes, including conjunctions of them. “What probability do I assign that the S&P goes down, and the Ukraine/Russia war continues, and I find a new romantic partner?” I’ll call this the “spreadsheets” model of epistemology.
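To make that concrete, here’s a minimal sketch of the spreadsheets model in Python – the variables and numbers are made up, and the independence assumption is the naive spreadsheet move, not a claim about the real world:

```python
# Made-up numbers: name variables, assign probabilities, multiply them
# (naively assuming independence), and do a Bayesian update on new evidence.

p_sp500_down = 0.40          # P(S&P goes down)
p_war_continues = 0.70       # P(Ukraine/Russia war continues)
p_new_partner = 0.20         # P(I find a new romantic partner)

# Joint probability of the conjunction, treating the variables as independent.
p_all_three = p_sp500_down * p_war_continues * p_new_partner
print(f"P(all three) = {p_all_three:.3f}")  # 0.056

# Bayesian update: revise P(S&P goes down) after observing some evidence E.
p_e_given_down = 0.6         # P(E | S&P down), also made up
p_e_given_not_down = 0.2     # P(E | S&P not down)
p_e = p_e_given_down * p_sp500_down + p_e_given_not_down * (1 - p_sp500_down)
p_down_given_e = p_e_given_down * p_sp500_down / p_e
print(f"P(S&P down | E) = {p_down_given_e:.3f}")  # 0.667
```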
There’s another perspective I’ve been ruminating on which is about visualizing detailed and concrete worlds – similar to how, if you hold a ball and ask me to visualize how it’ll fall when you drop it, I can see the world in full detail. This is more about loading a full hypothesis into your head, and into your anticipations. It’s more related to Privileging the Hypothesis / Believing In / Radical Probabilism[1]. I’ll call this the “cognitive visualization” model of epistemology.
These visualizations hook much more directly into my anticipations and motivations. When I am running after you to remind you that you forgot to take your Adderall today, it is not because I had a spreadsheet simulate lots of variables and in a lot of worlds the distribution said it was of high utility to you. I’m doing it because I have experienced you getting very upset and overwhelmed on many days when you forgot, and those experiences flashed through my mind as likely outcomes that I am acting hurriedly to divert the future from. When I imagine a great event that I want to run, I am also visualizing a certain scene, a certain freeze frame, a certain mood, that I desire and believe is attainable, and I am pushing and pulling on reality to line it up so that it’s a direct walk from here to there.
Now I’m going to say that these visualizations are working memory bottlenecked, and stylize that idea more than is accurate. Similar to the idea that there are only ~7 working memory slots in the brain[2], I feel that for many important parts of my life I can only keep a handful of detailed visualizations of the future easily accessible to my mind to orient with. This isn’t true in full generality – at any time of day, if you ask me to visualize what happens if you drop a ball, I have an immediate anticipation – but if you constantly ask me to visualize in great detail a world in which the S&P 500 goes up and the war continues, versus one where it goes down and the war stops, and lots of other permutations with other variables changed, then I start to get fatigued. And this is true for life broadly: I’m only loading up so many detailed visualizations of specific worlds.
Given this assumption – that one perhaps only has a handful of future-states one can load into one’s brain – the rules for how to update your beliefs and anticipations are radically changed from the spreadsheets model of epistemology. Adding a visualization to your mind means removing one of your precious few; this means you will be better equipped to deal with worlds like the one you’re adding, and less well-equipped to deal with worlds like the one you’ve removed. This covers both taking useful action and making accurate predictions; which ones you load into your mind are a function of accuracy and usefulness. It can help to add to your cognitive state many worlds that you wish to constantly fight to prevent, causing the pathways to those worlds to loom larger when making your predictions – yet their being in your mind is evidence that they will not happen, because you are optimizing against them. Alternatively, when it is very hard to achieve something, it is often good to load (in great detail) world-states that you wish to move towards, such that with every motion and action you have checked whether it’s hewing in the direction of that world, and made the adjustments to the world as required.
This model gives an explanation for why people who are very successful often say they cannot imagine failure. They have loaded into their brain the world they are moving toward, in great detail, and in every situation they are connecting it to the world they desire and making the adjustments to set up reality to move in the right way. It is sometimes actively unhelpful to constantly be comparing reality to lots of much worse worlds and asking yourself what actions you could take to make those more likely; my sense is that doing so mostly helps you guide reality toward those worlds.
And yet, I value having true beliefs, and being able to give accurate answers to questions that aren’t predictably wrong. If I don’t load a certain world-model into my brain, or if I load a set of biased ones (which I undoubtedly will in the story where I can only pick ~7), I may intuitively give inaccurate answers to questions. I think this is probably what happens with the startup founders who give inaccurately high probabilities of success – their heads are filled entirely with cognitive visualizations of worlds where they succeed, and with how to get there, relative to the person with the spreadsheet who is calculating all of the variables and optimizing for accuracy far above human-functionality.
In contrast, when founders semi-regularly slide into despair, I think this is about adding a concrete visualization of total failure to their cognitive workspace. Suddenly lots of the situations and actions you are in are filled with fear and pain as you see them moving toward a world you strongly desire not to be in. While it is not healthy to be constantly asking yourself how you could make things worse and noticing those pathways, it is helpful to boot up that visualization sometimes in order to check that that’s not what’s currently happening. I have personally found it useful to visualize in detail what it would look like if I were acting very stupidly, or actively self-sabotaging, in order to later make sure I behave in ways that definitely don’t come close to that. Despair is a common consequence of booting up those perspectives.
I am still confused about what exactly counts as a cognitive visualization – in some sense I’m producing hundreds of cognitive visualizations per day, so how could I be working memory bottlenecked? I also still have more to learn in the art of human rationality, of choosing when to change the set of cognitive visualizations I have loaded in at any given time, for which I cannot simply rely on Bayes’ theorem. For now I will say that I endeavor to be able to produce the spreadsheet answers, and to often use them as my betting odds, even when they are not the answer I get when I run my cognitive visualizations, or where my mind is when I take actions. I endeavor to sometimes say “I literally cannot imagine this failing. Naturally, I give it greater than 1:1 odds that it indeed does so.”
More specifically (and this will make sense later in this quick take), when you’re switching out which visualizations are in your working memory, the updates you make to your probabilities will decidedly not be Bayesian, but perhaps more like the fluid updates / Jeffrey updating discussed by Abram (sketched below, after the footnotes).
I don’t really know what a “slot” means here, so I don’t know that “7” meaningfully maps onto some discrete thing, but the notion that the brain has a finite amount of working memory is hard to argue with.
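To make the contrast in footnote [1] concrete, here’s a minimal sketch of Jeffrey updating next to strict Bayesian conditioning – the numbers are made up, and this is only meant to show the shape of the update, not a claim about how the brain does it:

```python
# Jeffrey updating: instead of learning E with certainty, experience merely
# shifts P(E) to some new value q; beliefs about A get re-mixed accordingly.

p_a_given_e = 0.9            # P(A | E)
p_a_given_not_e = 0.2        # P(A | not E)

# Strict Bayesian conditioning: E is observed outright, so P(A) jumps to P(A|E).
p_a_bayes = p_a_given_e

# Jeffrey updating: experience only moves P(E) to q = 0.7, short of certainty.
q = 0.7
p_a_jeffrey = p_a_given_e * q + p_a_given_not_e * (1 - q)

print(p_a_bayes)    # 0.9
print(p_a_jeffrey)  # 0.9*0.7 + 0.2*0.3 = 0.69
```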
I think the billion-dollar question is, what is the relationship between these two perspectives? For example, a simplistic approach would be to see cognitive visualization as some sort of Monte Carlo version of spreadsheet epistemology. I think that’s wrong, but the correct alternative is less clear. Maybe something involving LDSL, but LDSL seems far from the whole story.
So, one problem seems to be that humans are slow, and evaluating all options would require too much time, so you need to prune the option tree a lot. I am not sure what the optimal strategy is here; it seems like all the lottery winners focused on analyzing the happy path, but we don’t know how much luck was involved in actually staying on the happy path, or what the average outcome was when they deviated from it.
Another problem is that human prediction and motivation are linked in a bad way, where having a better model of the world sometimes makes you less motivated, so sometimes lying to yourself can be instrumentally useful… the problem is, you cannot figure out exactly how instrumentally useful, because you are lying to yourself, duh.
This model gives an explanation for why people who are very successful often say they cannot imagine failure.
Another important piece of data would be, how many of the people who cannot imagine failure actually do succeed, and what typically happens to them when they don’t. Maybe nothing serious. Maybe they often ruin their lives.
Hm, I’m not sure I understand what’s confusing about this.
First, suppose you’re an approximate utility maximizer. There’s a difference between optimizing the expected utility E[U(world, action)] and optimizing utility in the expected world U(E[world], action). In general, in the former case, you’re not necessarily keeping the most-likely worlds in mind; you’re optimizing the worlds in which you can get the most payoff. Those may be specific terrible outcomes you want to avert, or specific high-leverage worlds in which you can win big (e.g., where your startup succeeds).
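A toy numerical sketch of this distinction (all numbers invented): an action can be best under E[U(world, action)] and still lose under U(E[world], action), because averaging the worlds first erases the high-leverage tail the action was exploiting.

```python
# "world" is a single demand level d; swing_big only pays off when d is high.
worlds = {10.0: 0.1, 0.0: 0.9}   # P(d = 10) = 0.1, P(d = 0) = 0.9

def utility(d, action):
    if action == "swing_big":
        return 100.0 if d >= 10 else -5.0   # wins big only in the rare high-demand world
    return 2.0                               # "play_safe" pays a little everywhere

# Optimize the expected utility: average U over worlds, then pick the best action.
def best_by_expected_utility():
    return max(["swing_big", "play_safe"],
               key=lambda a: sum(p * utility(d, a) for d, p in worlds.items()))

# Optimize utility in the expected world: collapse the distribution to its mean first.
def best_in_expected_world():
    expected_d = sum(d * p for d, p in worlds.items())   # E[d] = 1.0
    return max(["swing_big", "play_safe"],
               key=lambda a: utility(expected_d, a))

print(best_by_expected_utility())  # swing_big  (E[U] = 5.5 vs 2.0)
print(best_in_expected_world())    # play_safe  (U at d = 1.0: -5.0 vs 2.0)
```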
Choosing which worlds to keep in mind/optimize obviously impacts in which worlds you succeed. (Startup founders who start being risk-averse, instead of aiming to win in the worlds in which they can win big, lose – because they’re no longer “looking” at the worlds where they succeed, and aren’t shaping their actions to exploit their features.)
Second, human world-models are hierarchical, and your probability distribution over worlds is likely multimodal. So when you pick a set of worlds you care about, you likely pick several modes of this distribution (rather than specific fully specified worlds), characterized by various high-level properties (such as “AI progress continues apace” vs. “DL runs into a wall”). When thinking about one of these high-level-constrained worlds/the neighbourhood of a mode, you further zoom in on modes corresponding to lower-level properties, and so on.
Which is why you’re not keeping a bunch of basically-the-same expected trajectories in your head, but meaningfully “different” trajectories.
This… all seems to be business-as-usual to me? I may be misunderstanding what you’re getting at.
I wrote this because I am increasingly noticing that the rules for “which worlds to keep in mind/optimize” are often quite different from “which worlds my spreadsheets say are the most likely worlds”. And that this is in conflict with my heuristics which would’ve said “optimize the world-models in your head for being the most accurate ones – the ones that will give you the most accurate answers to most questions” rather than something like “optimize the world-models in your head for being the most useful ones”.
(Though the true answer is some more complicated function combining both practical utility and map-territory correspondence.)
I’m not sure I understand what’s confusing about this.
I will note that what is confusing to one person need not be confusing to another person. In my experience it is a common state of affairs for one person in a conversation to be confused and the other not (whether it be because the latter person is pre-confused, post-confused, or simply because their path to understanding a phenomenon didn’t pass through the state of their interlocutor).
It seems probable to me that I have found this subject more confusing than have others.
This is probably obvious to you, but you can expand the working memory bottleneck by making lots of notes. You still need to store the “index” of the notes in your working memory though, to be able to get back to relevant ideas later. Making a good index involves compressing the ideas until the “core” insights fit into it.
Some part of what we consider intelligence is basically search and some part of what we consider faster search is basically compression.
Tbh you can also do multi-level indexing: the top-level index (a crisp world model of everything) could be in working memory, and it can point to indexes (crisp world models of specific topics) actually written in your notes, which further point to more extensive notes on those topics.
As an aside, automated R&D using LLMs currently relies heavily on embedding search and RAG. An AI’s context window is loosely analogous to a human’s working memory in that way. The AI knows millions of ideas, but it can’t simulate pairwise interactions between all of them, as that would require too much GPU time. So it too needs to select some pairs or tuples of ideas (using embedding search or something similar) within which it can explore interactions.
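A rough sketch of that selection step, with random vectors standing in for a real embedding model – this only shows the shape of the computation, not any particular system’s pipeline:

```python
import numpy as np

# Score every pair of "ideas" by cosine similarity of their embeddings and keep
# only the top few pairs to actually reason about in a limited context window.
# The embeddings here are random stand-ins for a real embedding model's output.
rng = np.random.default_rng(0)
ideas = [f"idea_{i}" for i in range(50)]
embeddings = rng.normal(size=(len(ideas), 64))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

sims = embeddings @ embeddings.T          # cosine similarity matrix
k = 5
pairs = [(i, j) for i in range(len(ideas)) for j in range(i + 1, len(ideas))]
top_pairs = sorted(pairs, key=lambda ij: sims[ij], reverse=True)[:k]

for i, j in top_pairs:
    print(ideas[i], ideas[j], round(float(sims[i, j]), 3))
```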
The embedding dataset is a compressed version of the source dataset and the LLM itself is an even more compressed version of the source dataset. So there is interplay between data at different levels of compression.