How much are you interested in a positive vs normative theory of counterfactuals? For example, do you feel like you understand how humans do counterfactual reasoning, and how and why it works for them (insofar as it works for them)? If not, is such an understanding what you’re looking for? Or do you think humans are not perfect at counterfactual reasoning (e.g. maybe because people disagree with each other about Newcomb’s problem etc.) and there’s some deep notion of “correct counterfactual reasoning” that humans are merely approximating, and the deeper “correct” thing is what you really care about?
(For my part I’m somewhat skeptical that there is a notion of counterfactuals that is fundamentally different from and better than what humans do.)
Update: I should further clarify that even though I provided a rough indication of how important I consider various approaches, this is off-the-cuff and I could be persuaded an approach was more valuable than I think, particularly if I saw good quality work.
I guess my ultimate interest is normative as the whole point of investigating this area is to figure out what we should do.
However, I am interested in descriptive theories insofar as they can contribute to this investigation (but not beyond the point where the details stop being useful for normative theories). For example, when I say that counterfactuals only make sense from within the counterfactual perspective, and further that counterfactuals are ultimately grounded as an evolutionary adaptation, I’m making descriptive statements. The latter seems to be more of a positive statement, while the former doesn’t seem to be (it seems to be justified by philosophical reasoning more than empirical investigation). In any case, it feels like there is more work to be done in taking these high-level abstract statements and making them more precise.
For example, do you feel like you understand how humans do counterfactual reasoning, and how and why it works for them (insofar as it works for them)?
I think that further investigation here could be useful—although not in the sense that 40% use this style of reasoning and 60% use that style; exact percentages aren’t the relevant thing here, at least not at this early stage. I’d also lean towards saying that how experts operate is more important than how average humans do, and that the behaviour of especially stupid humans is probably of limited importance.
I guess I see the behaviour of normal humans mattering for two reasons:
a) Firstly because I see making use of counterfactuals as evolutionarily grounded (in a more primitive form than the highly cognitive and mathematically influenced versions that we tend to use on LW).
b) Secondly because experts are more likely to discard intuitions that don’t agree with their theories. We do need to use our reasoning to produce a consistent theory from our intuitions at some point, but this discarding may be less than ideal if we’re simply trying to collect various intuitions as raw data to later turn into a theory.
I should clarify: in the above discussion, I’m commenting on what I’m interested in, rather than what’s in scope. The scope of the prize is the proposition that counterfactuals only make sense within themselves. And I guess part of what I was trying to clarify above is that empirical investigation can be relevant when carefully chosen. Happy to provide additional clarification if you were planning to submit a post covering something specific.
There’s some deep notion of “correct counterfactual reasoning” that humans are merely approximating, and the deeper “correct” thing is what you really care about?
I guess my position on this is complex, as I believe that counterfactuals only make sense in terms of themselves. So I don’t think there is a “true” notion of counterfactuals that exists within the ontology; rather, I see them as a heuristic ultimately grounded by evolution. That said, our instinct to systematise and use logic to make things more coherent is also grounded in evolution.
(For my part I’m somewhat skeptical that there is a notion of counterfactuals that is fundamentally different from and better than what humans do.)
People often hold vastly different perspectives on what counts as “fundamentally different” from something else. That said, I believe we should one-box on Newcomb’s problem (do you?) and I guess that seems fundamentally different from how humans who are trained on traditional decision theory/classical physics think. On the other hand, it may not be fundamentally different from how more untutored and instinctual individuals would behave. I guess I’d be curious where you stand here.
I think brains build a generative world-model, and that world-model is a certain kind of data structure, and “counterfactual reasoning” is a class of operations that can be performed on that data structure. (See here.) I think that counterfactual reasoning relates to reality only insofar as the world-model relates to reality. (In map-territory terminology: I think counterfactual reasoning is a set of things that you can do with the map, and those things are related to the territory only insofar as the map is related to the territory.)
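To gesture at what I mean by “a class of operations that can be performed on that data structure”, here’s a minimal Pearl-style sketch in Python. The toy variables and the `world_model` function are invented purely for illustration; this is not a claim about how brains actually implement any of it.

```python
# Toy "world-model": each variable is computed from its parents.
# This is a Pearl-style causal-graph sketch, not a brain model.

def world_model(overrides=None):
    """Run the model forward. `overrides` forcibly sets variables
    (an intervention), severing them from their usual causes."""
    overrides = overrides or {}
    state = {}
    state["season"] = overrides.get("season", "summer")
    state["rain"] = overrides.get("rain", state["season"] == "spring")
    state["sprinkler"] = overrides.get("sprinkler", state["season"] == "summer")
    state["wet_grass"] = overrides.get("wet_grass",
                                       state["rain"] or state["sprinkler"])
    return state

# Factual query: what does the model say as-is?
print(world_model())
# {'season': 'summer', 'rain': False, 'sprinkler': True, 'wet_grass': True}

# Counterfactual query: same data structure, but surgically force the
# sprinkler off and re-derive everything downstream of it.
print(world_model({"sprinkler": False}))
# {'season': 'summer', 'rain': False, 'sprinkler': False, 'wet_grass': False}
```

The point is just that the factual and counterfactual queries run on the very same structure; the counterfactual one differs only in surgically overriding a variable before re-deriving its downstream effects.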
I also think that there are lots of specific operations that are all “counterfactual reasoning” (just as there are lots of specific operations that are all “paying attention”—paying attention to what?), and once we do a counterfactual reasoning operation, there are also a lot of things that we can do with the result of the operation. I think that, over our lifetimes, we learn metacognitive heuristics that guide these decisions (i.e. exactly what “counterfactual reasoning”-type operations to do and when, and what to do with the result of the operation), and some people’s learned metacognitive heuristics are better than others (from the perspective of achieving such-and-such goal).
Analogy: If you show me a particular trained ConvNet that misclassifies a particular dog picture as a cat, I wouldn’t say that this reveals some deep truth about the nature of image classification, and I wouldn’t conclude that there is necessarily such a thing as a philosophically-better type of image classifier that fundamentally doesn’t ever make mistakes like that. (The brain image classifier makes mistakes too, albeit different mistakes than ConvNets make, but that’s beside the point.) Instead I would be more inclined to look for a very complicated explanation of the mistake, related to details of its training data and so on.
By the same token: if someone makes a poor decision on Newcomb’s problem, I don’t think that reveals some deep truth about the nature of counterfactual reasoning, and I wouldn’t conclude that there is necessarily such a thing as a philosophically-better type of counterfactual reasoning that fundamentally doesn’t ever make mistakes like that. Instead I would be more inclined to look for a very complicated explanation of the mistake, related to the person’s life history, exactly how Newcomb’s problem was explained to them, exactly what their learned world-model looks like, etc.
And if I wanted to build an AGI that performed well on Newcomb’s problem, I would build the AGI first, and then have the AGI read Eliezer’s essays or whatever, same as if I wanted my (human) friend to perform well on Newcomb’s problem. :-)
I also think that there are lots of specific operations that are all “counterfactual reasoning”
Agreed. This is definitely something that I would like further clarity on.
Instead I would be more inclined to look for a very complicated explanation of the mistake, related to details of its training data and so on.
I guess the real-world reasons for a mistake are sometimes not very philosophically insightful (e.g. Bob was high when reading the post; James comes from a Spanish-speaking background and uses the Spanish equivalent of a word differently than English speakers do; Sarah has a terrible memory and misremembered it).
I’m guessing your position might be that there are just mistakes, and there aren’t mistakes that are more philosophically fruitful or less fruitful? Is that correct? Or were you just responding to my specific claim that it might be useful to know how the average person responds to problems because we are evolved creatures? If so, then I definitely agree that we’d have to delve into the details and not just remain at the level of averages.
Update: Actually, I’ll add an analogy that might be helpful. Let’s suppose you didn’t know what a dog was. Actually, that’s kind of the case: once you start diving into any definition you end up running into fuzzy cases, such as whether a robotic dog counts as a dog. Then if humans had built a bunch of different classifiers, and you didn’t have access to the humans (say they went extinct), you might want to analyse the different classifiers to try to figure out how humans defined the term “dog”, even though much of the behaviour might only reflect the flaws the classifiers tend to produce, rather than the human concept itself.
Similarly, we don’t have exact access to our evolutionary history, but examining human intuitions about counterfactuals might provide insights about which heuristics have worked well, whilst also recognising that it’s hard, arguably impossible, to even talk about “working well” without embracing the notion of counterfactuals. And I agree that there are probably different ways we could emphasise various heuristics, rather than a unique, principled solution.
I’m not claiming the situation is precisely this—in fact I’m not sure exactly how useful this analogy is—but I think it’s worth sharing anyway in case it lands.
I also think that there are lots of specific operations that are all “counterfactual reasoning”
Agreed. This is definitely something that I would like further clarity on.
Hmm, my hunch is that you’re misunderstanding me here. There are a lot of specific operations that are all “making a fist”. I can clench my fingers quickly or slowly, strongly or weakly, left hand or right hand, etc. By the same token, if I say to you “imagine a rainbow-colored tree; are its leaves green?”, there are a lot of different specific mental models that you might be invoking. (It could have horizontal rainbow stripes on the trunk, or it could have vertical rainbow stripes on its branches, etc.) All those different possibilities involve constructing a counterfactual mental model and querying it, in the same nuts-and-bolts way. I just meant, there are many possible counterfactual mental models that one can construct.
I’m guessing like your position might be that there are just mistakes and there aren’t mistakes that are more philosophically fruitful or less fruitful? There’s just mistakes. Is that correct?
Suppose I ask “There’s a rainbow-colored tree somewhere in the world; are its leaves green?” You think for a second. What’s happening under the surface when you think about this? Inside your head are various different models pushing in different directions. Maybe there’s a model that says something like “rainbow-colored things tend to be rainbow-colored in all respects”. So maybe you’re visualizing a rainbow-colored tree, and querying the color of the leaves in that model, and this model is pushing on your visualized tree and trying to make it have a color scheme that’s compatible with the kinds of things you usually see, e.g. in cartoons, which would be rainbow-colored leaves. But there’s also a botany model that says “tree leaves tend to be green, because that’s the most effective for photosynthesis, although there are some exceptions like Japanese maples and autumn colors”. In scientifically-educated people, probably there will also be some metacognitive knowledge that principles of biology and photosynthesis are deep regularities in the world that are very likely to generalize, whereas color-scheme knowledge comes from cartoons etc. and is less likely to generalize.
So what’s at play is not “the nature of counterfactuals”, but the relative strengths of these three specific mental models (and many more besides) that are pushing in different directions. The way it shakes out will depend on the particular person and their life experience (and in particular, how much of a track-record of successful predictions these models have built up in similar contexts).
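As a cartoon of “relative strengths of models pushing in different directions”, here’s how such a competition might be scored. Every number below is invented purely for illustration; this is weighted model averaging at its crudest, nothing more:

```python
# Two object-level "models" vote on whether the rainbow tree's leaves
# are green; a metacognitive heuristic supplies the weights.
# All numbers are invented for illustration.
votes = {
    "cartoon colour-scheme prior": 0.1,  # rainbow things are rainbow all over
    "botany/photosynthesis":       0.9,  # leaves are green, with exceptions
}
# Metacognition: biological regularities have a better track record of
# generalizing than cartoon-derived colour schemes, so botany weighs more.
weights = {
    "cartoon colour-scheme prior": 0.2,
    "botany/photosynthesis":       0.8,
}

p_green = sum(votes[m] * weights[m] for m in votes) / sum(weights.values())
print(f"P(leaves are green) = {p_green:.2f}")  # 0.74: botany wins out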
By the same token, I think every neurotypical human thinking about Newcomb’s problem is using counterfactual reasoning, and I think that there isn’t any interesting difference in the general nature of the counterfactual reasoning that they’re using. But the mental model of free will is different in different people, and the mental model of Omega is different in different people, etc.
Hmm, maybe we’re talking past each other a bit because of the learning-algorithm-vs-trained-model division. Understanding the learning algorithm is like being able to read and understand the source code for a particular ML paper (and the PyTorch source code that it calls in turn). Understanding the trained model is like the OpenAI Microscope.
(It’s really “learning algorithm & inference algorithm”—the first changes the parameters, the second chooses what to do right now. I’m just calling it “learning algorithm” for short.)
I usually take the perspective that “the main event” is to understand the learning algorithm, because that’s what you need to build AGI, and that’s what the genome needs to build humans (thanks to within-lifetime learning), whereas understanding the trained model is “a sideshow”, unnecessary for building AGI, but still worth talking about for safety and whatnot.
On the “learning algorithm” side, I put “the basic capability to do counterfactual reasoning operations”. On the “trained model” side, I put all the learned heuristics about how reliable counterfactual reasoning is under what circumstances, and also all the learned concepts that go into a particular “counterfactual reasoning” operation (e.g. botany concepts, free will concepts, etc.).
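In code terms, the division might look something like this (a loose analogy only; the class and its methods are made up for the sake of the sketch):

```python
# Loose analogy for the learning-algorithm / trained-model split.
# The class definition is the "learning algorithm" side: fixed
# machinery, analogous to what the genome specifies.
class WorldModel:
    def __init__(self):
        self.concepts = {}  # all learned content lives here

    def learn(self, name, knowledge):
        """Within-lifetime learning: update the stored parameters."""
        self.concepts[name] = knowledge

    def counterfactual_query(self, question):
        """The basic *capability* to run a counterfactual query is
        innate machinery; which answer comes out depends entirely
        on the learned concepts the query draws on."""
        relevant = [n for n in self.concepts if n in question]
        return f"answer drawing on {relevant or 'nothing learned yet'}"

# The instance's accumulated state is the "trained model" side.
wm = WorldModel()
wm.learn("botany", "leaves are usually green (photosynthesis)")
wm.learn("free will", "some folk notion; varies from person to person")
print(wm.counterfactual_query("imagine a rainbow tree; botany says its leaves are...?"))
# -> answer drawing on ['botany']
```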
Then when I brashly declare “I basically understand counterfactual reasoning”, I’m just talking about the stuff on the “learning algorithm” side. Whereas it seems that you feel like your project is to understand stuff on both sides—not only what a “counterfactual reasoning” operation is at a nuts-and-bolts level, but also all the other things that go into Newcomb’s problem, like whether there’s a “free will” concept in the world-model and what other concepts it’s connected to and how strongly (all of which can impact the results of a “counterfactual reasoning” operation). Then that research program seems to me to be more about normative decision theory and epistemology (e.g. “what to do in Newcomb’s problem”), rather than about the nature of counterfactual reasoning per se. Or I guess perhaps what you’re going for is closer to “practical advice that helps adult humans use counterfactual reasoning to reach correct conclusions”? In that case I’d be a bit surprised if there was much generically useful advice like that; I would expect that the main useful thing is object-level stuff like teaching better intuitions about the nature of free will etc.
I just meant, there are many possible counterfactual mental models that one can construct.
I agree that there isn’t a single uniquely correct notion of a counterfactual. I’d say that we want different things from this notion and there are different ways to handle the trade-offs.
By the same token, I think every neurotypical human thinking about Newcomb’s problem is using counterfactual reasoning, and I think that there isn’t any interesting difference in the general nature of the counterfactual reasoning that they’re using.
I find this confusing, as CDT counterfactuals (where you can only project forward) seem very different from things like FDT (where you can project back in time as well).
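To spell out why these seem so different to me, here’s a toy calculation using the standard Newcomb payoffs and a perfectly accurate predictor (the payoffs are the usual ones from the literature; the code itself is just my own sketch):

```python
# Standard Newcomb payoffs with a perfect predictor. Box B contains
# $1,000,000 iff Omega predicted one-boxing; box A always holds $1,000.

def payoff(action, prediction):
    big = 1_000_000 if prediction == "one-box" else 0
    return big if action == "one-box" else big + 1_000

# CDT-style counterfactual: the prediction (and hence the box contents)
# is already fixed; intervening on the action can't reach back to change it.
for fixed in ("one-box", "two-box"):
    assert payoff("two-box", fixed) > payoff("one-box", fixed)
# Two-boxing dominates under every fixed prediction, so CDT two-boxes.

# FDT-style counterfactual: intervening on the *decision procedure*
# also changes Omega's perfect prediction of that procedure.
print(payoff("one-box", "one-box"))  # 1000000
print(payoff("two-box", "two-box"))  # 1000
# One-boxing wins, so FDT one-boxes.
```

Of course this doesn’t settle which counterfactual is the right one to use; it just shows that the two styles of projection license different answers.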
I usually take the perspective that “the main event” is to understand the learning algorithm, because that’s what you need to build AGI, and that’s what the genome needs to build humans
Well, we need the information encoded in our DNA rather than what is actually implemented in humans (clarification: what is implemented in humans is significantly influenced by society). However, we aren’t at the level where we can access that by analysing the DNA directly, or people’s brain structure for that matter, so we have to reverse-engineer it from behaviour.
Or I guess perhaps what you’re going for is closer to “practical advice that helps adult humans use counterfactual reasoning to reach correct conclusions”?
I’ve very much focused on trying to understand how to solve these problems in theory, rather than on how we can correct cognitive flaws in humans, or on how to adapt decision theory to be easier or more convenient to use.
Insofar as I’m interested in how average humans reason counterfactually, it’s mostly about trying to understand the various heuristics that form the basis of counterfactuals. I believe that we need counterfactuals to understand and evaluate these heuristics, but I’m hoping that we can construct something reflexively consistent.
By the same token, I think every neurotypical human thinking about Newcomb’s problem is using counterfactual reasoning, and I think that there isn’t any interesting difference in the general nature of the counterfactual reasoning that they’re using.
I find this confusing, as CDT counterfactuals (where you can only project forward) seem very different from things like FDT (where you can project back in time as well).
I think there is “machinery that underlies counterfactual reasoning” (which incidentally happens to be the same as “the machinery that underlies imagination”). My quote above was saying that every human deploys this machinery when you ask them a question about pretty much any topic.
I was initially assuming (by default) that if you’re trying to understand counterfactuals, you’re mainly trying to understand how this machinery works. But I’m increasingly confident that I was wrong, and that’s not in fact what you’re interested in. Instead it seems that your interests are more like “how would an AI, equipped with this kind of machinery, reach correct conclusions about the world?” (After all, the machinery by itself can lead to both correct and incorrect conclusions—just as “thinking / reasoning in general” can lead to correct or incorrect conclusions.)
Given what (I think) you’re trying to do above, I’m somewhat skeptical that you’ll make progress by thinking about the philosophical nature of counterfactuals in general. I don’t think there’s a clean separation between “good counterfactual reasoning” and “good reasoning in general”. If I say some counterfactual nonsense like “If the Earth were a flat disk, then the north pole would be in the center,” I think the reason it’s nonsense lives at the object-level, i.e. the detailed content of the thought in the context of everything else we know about the world. I don’t think the problem with that nonsense thought can be diagnosed at the meta-level, i.e. by examining structural properties of its construction as a counterfactual or whatever.
So by the same token, I think that “what counterfactuals make sense in the context of decision-making” is a decision theory question, not a counterfactuals question, and I expect a good answer to look like explicit discussions of decision theory as opposed to looking like a more general discussion of the philosophical nature of counterfactuals. (That said, the conclusion of that decision theory discussion could certainly look like a prescription on the content of counterfactual reasoning in a certain context, e.g. maybe the decision theory discussion concludes with “...Therefore, when making decisions, use FDT-type counterfactuals” or whatever.)
I think there is “machinery that underlies counterfactual reasoning”
I agree that counterfactual reasoning is contingent on certain brain structures, but I would say the same about logic, and it’s clear that the logic of a kindergartener is very different from that of a logic professor. Although perhaps we’re getting into a semantic debate, and what you mean is that the fundamental machinery is more or less the same.
I was initially assuming (by default) that if you’re trying to understand counterfactuals, you’re mainly trying to understand how this machinery works. But I’m increasingly confident that I was wrong, and that’s not in fact what you’re interested in. Instead it seems that your interests are more like “how would an AI, equipped with this kind of machinery, reach correct conclusions about the world?”
Yeah, this seems accurate. I see understanding the machinery as the first step towards the goal of learning to counterfactually reason well. As an analogy, suppose you’re trying to learn how to reason well. It might make sense to figure out how humans reason, but if you want to build a better reasoning machine and not just duplicate human performance, you’d want to be able to identify some of these processes as good reasoning and some as biases.
I don’t think there’s a clean separation between “good counterfactual reasoning” and “good reasoning in general”
I guess I don’t see why there would need to be a separation in order for the research direction I’ve suggested to be insightful. In fact, if there isn’t a separation, this direction could even be more fruitful as it could lead to rather general results.
If I say some counterfactual nonsense like “If the Earth were a flat disk, then the north pole would be in the center,” I think the reason it’s nonsense lives at the object-level, i.e. the detailed content of the thought in the context of everything else we know about the world
I would say (as a slight simplification) that our goal in studying counterfactual reasoning should be to get counterfactuals to a point where we can answer questions about them using our normal reasoning.
I think that “what counterfactuals make sense in the context of decision-making” is a decision theory question, not a counterfactuals question, and I expect a good answer to look like explicit discussions of decision theory as opposed to looking like a more general discussion of the philosophical nature of counterfactuals
That post certainly seems to contain an awful lot of philosophy to me. And even though this post and my post On the Nature of Counterfactuals don’t make any reference to decision theory, that doesn’t mean it isn’t in the background influencing what I write. I’ve written a lot of posts here, many of which discuss specific decision theory questions.
I guess I would still consider Joe Carlsmith’s post high-quality if it had focused exclusively on the more philosophical aspects. Philosophical arguments are harder to evaluate than mathematical ones, which can be disconcerting for some people, especially those used to the certainty of mathematics, but I believe it’s possible to get to the level where you can avoid formalising things a lot of the time, because you have enough experience to know how things will shake out.
Although I suppose in this case my reason for avoiding formalisation is that I see premature formalisation as a critical error. Once someone has produced a formal theory, they will feel psychologically compelled to defend it, especially if it is mathematically beautiful, so I believe it’s important to be very careful about making sure the assumptions are right before attempting to formalise anything.