I’ve watched your talk at SERI now.
One question I have is how you hope to define a good notion of “acceptable” without a notion of intent. In your talk, you mention looking at why the model does what it does, in addition to just looking at what it does. This makes sense to me (I talk about similar things), but it seems just about as fraught as the notion of mesa-objective:
It requires approximately the same “magic transparency tech” as we need to extract mesa-objectives.
Even with magical transparency tech, it requires additional insight as to which reasoning is acceptable vs unacceptable.
If you are pessimistic about extracting mesa-objectives, why are you optimistic about providing feedback about how to reason? More generally, what do you think “acceptability” might look like?
(By no means do I mean to say your view is crazy; I am just looking for your explanation.)
Thanks, very interesting!
I still think that a good modification to the progress equation would be to lose progress if the population dips below some threshold, and that this predicts that severe population crashes will be amplified even further as progress is lost.
Or, in less formal terms: I still think knowledge can be lost quickly in times of chaos, particularly when population takes a nose-dive.
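To make the shape of that modification concrete, here's a toy sketch (the functional form, threshold, and rates are all made up for illustration, not taken from the model under discussion):

```python
# Toy progress update: progress compounds while population stays above a
# threshold, but is actively lost while population is below it.
def progress_step(progress, population, pop_threshold=1_000_000,
                  growth_rate=0.02, loss_rate=0.10):
    if population >= pop_threshold:
        # Normal regime: knowledge accumulates.
        return progress * (1 + growth_rate)
    # Crash regime: knowledge is lost, amplifying the downturn.
    return progress * (1 - loss_rate)
```

The point is only the qualitative asymmetry: a severe population crash doesn't just pause progress, it erodes it.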
So I believe in the gears of the theory I was referring to, even if it’s not the “primary cause”?
Right. My point is just that the state will heavily favor centralized technology to address these challenges, because it prefers to maintain control. Seeing Like a State illustrates how this can result in much worse productivity overall (in contrast to pre-existing noncentralized systems), while still being much better for the state (due to increased tax revenue and diminished risk of rebellion).
From the perspective of the state, you want to tax that excess, and store as much of it as you can for lean times (at which point you do hand it back out to the people, to preserve the population). This was a major function of bronze age states. So yeah, it increases robustness, but the state still isn’t really incentivized to let people keep wealth (except for the key players, which the state has to make happy to avoid coups).
In The Dictator’s Handbook, Bruce Bueno de Mesquita presents evidence that centralized authoritarian governments (like most bronze age governments) tend to avoid enriching their citizens if they can accumulate resources without doing so. On the other hand, if the people themselves are the only available source of wealth (ie if natural resources are scarce and a state’s economy must therefore rely on skilled labor and trade), the state will tend to become less authoritarian, I think.
Interesting, any references?
(Meta: was this meant to be a question?)
I originally conceived of it as such, but in hindsight, it doesn’t seem right.
In contrast, the generalization-focused approach puts less emphasis on the assumption that the worst catastrophes are intentional.
I don’t think this is actually a con of the generalization-focused approach.
By no means did I intend it to be a con. I’ll try to edit to clarify. I think it is a real pro of the generalization-focused approach that it does not rely on models having mesa-objectives (putting it in Evan’s terms, there is a real possibility of addressing objective robustness without directly addressing inner alignment). So, focusing on objective robustness seems like a potential advantage—it opens up more avenues of attack. Plus, the generalization-focused approach requires a much weaker notion of “outer alignment”, which may be easier to achieve as well.
But, of course, it may also turn out that the only way to achieve objective robustness is to directly tackle inner alignment. And it may turn out that the weaker notion of outer alignment is insufficient in reality.
Are you the historical origin of the robustness-centric approach? I noticed that Evan’s post has the modified robustness-centric diagram in it, but I don’t know if it was edited to include that. The “Objective Robustness and Inner Alignment Terminology” post attributes it to you (at least, attributes a version of it to you). (I didn’t look at the references there yet.)
I’m currently fairly swayed that the alignment forum should be more “open” in the sense Peter intends:
Per my definition of closed, no academic discussion is closed, because anyone in theory can get a paper accepted to a journal/conference, attend the related meeting, and participate in the discourse. I am not actually talking about visibility to the broader public, but rather the access of any individual to the discourse, which feels more important to me.
However, I am not sure how to accomplish this. (Specifically, I am not sure how to accomplish this without too much added work, while maintaining the other properties we want the forum to have.)
If there were a “curated posts” system on the alignment forum, I would nominate this for curation. I think it’s a great post.
All of which I really should have remembered, since it’s all stuff I have known in the past, but I am a doofus. My apologies.
(But my error wasn’t being too mired in EDT, or at least I don’t think it was; I think EDT is wrong. My error was having the term “counterfactual” too strongly tied in my head to what you call linguistic counterfactuals. Plus not thinking clearly about any of the actual decision theory.)
I’m glad I pointed out the difference between linguistic and DT counterfactuals, then!
It still feels to me as if your proof-based agents are unrealistically narrow. Sure, they can incorporate whatever beliefs they have about the real world as axioms for their proofs—but only if those axioms end up being consistent, which means having perfectly consistent beliefs. The beliefs may of course be probabilistic, but then that means that all those beliefs have to have perfectly consistent probabilities assigned to them. Do you really think it’s plausible that an agent capable of doing real things in the real world can have perfectly consistent beliefs in this fashion?
I’m not at all suggesting that we use proof-based DT in this way. It’s just a model. I claim that it’s a pretty good model—that we can often carry over results to other, more complex, decision theories.
However, if we wanted to, then yes, I think we could… I agree that if we add beliefs as axioms, the axioms have to be perfectly consistent. But if we use probabilistic beliefs, those probabilities don’t have to be perfectly consistent; only the axioms saying which probabilities we have do. So, for example, I could use a proof-based agent to approximate a logical-induction-based agent, by looking for proofs about what the market expectations are. This would be kind of convoluted, though.
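As a very rough sketch of that convoluted construction (the `proves` interface, the statement format, and `candidate_values` are all stand-ins I'm inventing here, not a real prover API):

```python
def proof_based_agent(actions, candidate_values, proves):
    """Take whichever action has the highest provable expectation.

    `proves(stmt)` stands in for proof search from the agent's axioms, which
    are assumed to pin down the probabilistic model's expectations (e.g. a
    logical inductor's market prices) well enough to prove statements of the
    form "ExpectedUtility(a) = v".
    """
    best_action, best_value = None, float("-inf")
    for a in actions:
        for v in candidate_values:
            if v > best_value and proves(f"ExpectedUtility({a}) = {v}"):
                best_action, best_value = a, v
    return best_action
```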
It’s obvious how ordinary conditionals are important for planning and acting (you design a bridge so that it won’t fall down if someone drives a heavy lorry across it; you don’t cross a bridge because you think the troll underneath will eat you if you cross), but counterfactuals? I mean, obviously you can put them into a particular problem
All the various reasoning behind a decision could involve material conditionals, probabilistic conditionals, logical implication, linguistic conditionals (whatever those are), linguistic counterfactuals, decision-theoretic counterfactuals (if those are indeed different as I claim), etc etc etc. I’m not trying to make the broad claim that counterfactuals are somehow involved in all of that.
The claim is about the decision algorithm itself. The claim is that the way we choose an action is by evaluating a counterfactual (“what happens if I take this action?”). Or, to be a little more psychologically realistic, the cached values which determine which actions we take are estimated counterfactual values.
What is the content of this claim?
A decision procedure is going to have (cached-or-calculated) value estimates which it uses to make decisions. (At least, most decision procedures work that way.) So the content of the claim is about the nature of these values.
If the values act like Bayesian conditional expectations, then the claim that we need counterfactuals to make decisions is considered false. This is the claim of evidential decision theory (EDT).
If the values are still well-defined for known-false actions, then they’re counterfactual. So, a fundamental reason why MIRI-type decision theory uses counterfactuals is to deal with the case of known-false actions.
However, academic decision theorists have used (causal) counterfactuals for completely different reasons (IE because they supposedly give better answers). This is the claim of causal decision theory (CDT).
My claim in the post, of course, is that the estimated values used to make decisions should match the EDT expected values almost all of the time, but should not be responsive to the same kinds of reasoning which the EDT values are responsive to, and so should not actually be evidential.
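To make the contrast between the two kinds of value estimates concrete, here is a toy sketch (the `history` format and the `counterfactual_model` oracle are invented for illustration):

```python
def edt_value(history, action):
    """Evidential-style value: the conditional expectation E[U | action],
    estimated from observed (action, utility) pairs. It is undefined for an
    action that is never taken (a known-false antecedent)."""
    utilities = [u for (a, u) in history if a == action]
    if not utilities:
        raise ValueError("conditional expectation undefined: action never taken")
    return sum(utilities) / len(utilities)

def counterfactual_value(counterfactual_model, action):
    """Counterfactual-style value: supplied by some model that stays
    well-defined even for actions the agent provably will not take."""
    return counterfactual_model(action)
```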
Could you give a couple of examples where counterfactuals are relevant to planning and acting without having been artificially inserted?
It sounds like you’ve kept a really strong assumption of EDT in your head; so strong that you couldn’t even imagine why non-evidential reasoning might be part of an agent’s decision procedure. My example is the troll bridge: conditional reasoning (whether proof-based or expectation-based) ends up not crossing the bridge, where counterfactual reasoning can cross (if we get the counterfactuals right).
The thing you call “proof-based decision theory” involves trying to prove things of the form “if I do X, I will get at least Y utility” but those look like ordinary conditionals rather than counterfactuals to me too.
Right. In the post, I argue that using proofs like this is more like a form of EDT rather than CDT, so, I’m more comfortable calling this “conditional reasoning” (lumping it in with probabilistic conditionals).
The Troll Bridge is supposed to show a flaw in this kind of reasoning, suggesting that we need counterfactual reasoning instead (at least, if “counterfactual” is broadly understood to be anything other than conditional reasoning—a simplification which mostly makes sense in practice).
though this is pure prejudice and maybe there are better reasons for it than I can currently imagine: we want agents that can act in the actual world, about which one can generally prove precisely nothing of interest
Oh, yeah, proof-based agents can technically do anything which regular expectation-based agents can do. Just take the probabilistic model the expectation-based agents are using, and then have the proof-based agent take the action for which it can prove the highest expectation. This isn’t totally sleight of hand; the proof-based agent will still display some interesting behavior if it is playing games with other proof-based agents, dealing with Omega, etc.
At any rate, right now “passing Troll Bridge” looks to me like a problem applicable only to a very specific kind of decision-making agent, one I don’t see any particular reason to think has any prospect of ever being relevant to decision-making in the actual world—but I am extremely aware that this may be purely a reflection of my own ignorance.
Even if proof-based decision theory didn’t generalize to handle uncertain reasoning, the Troll Bridge would also apply to expectation-based reasoners if their expectations respect logic. So the class of agents for whom it makes sense to ask “does this agent pass the Troll Bridge?” isn’t so narrow: it’s basically all agents that use logic at all, not just agents restricted to pure logic with no probabilistic beliefs.
Agreed. The asymmetry needs to come from the source code for the agent.
In the simple version I gave, the asymmetry comes from the fact that the agent checks for a proof that x>y before checking for a proof that y>x. If this was reversed, then as you said, the Löbian reasoning would make the agent take the 10, instead of the 5.
In a less simple version, this could be implicit in the proof search procedure. For example, the agent could wait for any proof of the conclusion x>y or y>x, and make a decision based on whichever happened first. Then there would not be an obvious asymmetry. Yet, the proof search has to go in some order. So the agent design will introduce an asymmetry in one direction or the other. And when building theorem provers, you’re not usually thinking about what influence the proof order might have on which theorems are actually true; you usually think of the proofs as this static thing which you’re searching through. So it would be easy to mistakenly use a theorem prover which just so happens to favor 5 over 10 in the proof search.
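To make the ordering point concrete, here's a minimal reconstruction of the two variants described above (this isn't the original snippet, and it deliberately elides how the spurious Löbian proof actually arises; `proves` and `proves_first` stand in for proof search over the agent's own source code):

```python
def agent_fixed_order(proves):
    # x = utility from taking the 5, y = utility from taking the 10.
    # The asymmetry is explicit: we look for a proof of x > y first.
    if proves("x > y"):
        return "take the 5"
    if proves("y > x"):
        return "take the 10"
    return "take the 10"  # fallback if neither is proven

def agent_race(proves_first):
    # The less obvious version: search for either conclusion and act on
    # whichever proof is found first; the search order still decides.
    winner = proves_first(["x > y", "y > x"])
    return "take the 5" if winner == "x > y" else "take the 10"
```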
While I agree that the algorithm might output 5, I don’t share the intuition that it’s something that wasn’t ‘supposed’ to happen, so I’m not sure what problem it was meant to demonstrate.
OK, this makes sense to me. Instead of your (A) and (B), I would offer the following two useful interpretations:
1: From a design perspective, the algorithm chooses 5 when 10 is better. I’m not saying it has “computed argmax incorrectly” (as in your A); an agent design isn’t supposed to compute argmax (argmax would be insufficient to solve this problem, because we’re not given the problem in the format of a function from our actions to scores), but it is supposed to “do well”. The usefulness of the argument rests on the weight of “someone might code an agent like this by accident, if they’re not familiar with spurious proofs”. Indeed, that’s the origin of this code snippet—something like this was seriously proposed at some point.
2: From a descriptive perspective, the code snippet is not a very good description of how humans would reason about a situation like this (for all the same reasons).
When I try to examine my own reasoning, I find that when I do so, I’m just selectively blind to certain details and so don’t notice any problems. For example: suppose the environment calculates “U=10 if action = A; U=0 if action = B” and I, being a utility maximizer, am deciding between actions A and B. Then I might imagine something like “I chose A and got 10 utils”, and “I chose B and got 0 utils”—ergo, I should choose A.
Right, this makes sense to me, and is an intuition which I think many people share. The problem, then, is to formalize how to be “selectively blind” in an appropriate way such that you reliably get good results.
Yep, agreed. I used the language “false antecedents” mainly because I was copying the language in the comment I replied to, but I really had in mind “demonstrably false antecedents”.
I like the alief/belief distinction, this seems to carry the distinction I was after. To make it more formal, I’ll use “belief” to refer to ‘things which an agent can prove in its reasoning engine/language (L)‘, and “alief” to refer to beliefs plus ‘additional assumptions which the agent makes about the bearing of that reasoning on the environment’, which together constitute a larger logic (L’). Does that match the distinction you intended between these terms?
Unfortunately, this seems almost opposite to the way I was defining the terminology. I had it that the aliefs are precisely what is proven by the formal system, and the beliefs are what the agent would explicitly endorse if asked. Aliefs are what you feel in your bones. So if the “bones” of the agent are the formal system, that’s the aliefs.
Note also that your definition implies that if an agent alieves something, it must also believe it. In contrast, part of the point for me is that an agent can alieve things without believing them. I would also allow the opposite, for humans and other probabilistic reasoners, though for pure-logic agents this would have to correspond to unsoundness. But pure-logical agents have to have the freedom to alieve without believing, on pain of inconsistency, even if we can’t model belief-without-alief in pure logic.
I find it interesting that you (seemingly) nodded along with my descriptions, but then proposed a definition which was almost opposite mine! I think there’s probably a deep reason for that (having to do with how difficult it is to reliably distinguish alief/belief), but I’m not grasping it for now. It is a symptom of my confusion in this regard that I’m not even sure we’re pointing to different notions of belief/alief even though your definition sounds almost opposite to me. It is well within the realm of possibility that we mean the same thing, and are just choosing very different ways to talk about it.
Specifically, your definition seems fine if L is not the formal language which the agent is hard-wired with, but rather, some logic which the agent explicitly endorses (like the relationship that we have with Peano Arithmetic). Then, yeah, “belief” is about provability in L, while “alief” implies that the agent has some “additional assumptions about the bearing of that reasoning on the environment”. Totally! But then, this suggests that those additional assumptions are somehow represented in some other subsystem of the agent (outside of L). The logic of that other subsystem is what I’m interested in. If that other subsystem uses L’, then it makes sense that the agent explicitly believes L. But now the aliefs of the agent are represented by L’. That is: L’ is the logic within the agent’s bones. So it’s L’ that I’m talking about when I define “aliefs” as the axiomatic system, and “beliefs” as more complicated (opposite to your definition).
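One standard way to make this concrete (my own gloss, not something from your comments): take the bones-logic to be the explicit logic plus a reflection schema for it,

L′ = L + { □_L X → X : X a sentence of L }.

An agent whose bones run on L′ alieves L′ and, in particular, believes L (it explicitly endorses L-proofs); but by Löb’s theorem it cannot consistently endorse the corresponding schema □_L′ X → X for L′ itself, which is exactly alief without belief.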
Over this discussion, a possible interpretation of what you’re saying that’s been in the back of my mind has been that you think agents should not rely on formal logic in their bones, but rather, should use formal logic only as part of their explicit thinking IE a form of thinking which other, “more basic” reasoning systems use as a tool (a tool which they can choose to endorse or not). For example, you might believe that an RL system decides whether to employ logical reasoning. Or deep learning. Etc. In this case, you might say that there is no L’ to find (no logical system represents the aliefs).
To be clear, I do think this is a kind of legitimate response to the Löbstacle: the response of rejecting logic (as a tool for understanding what’s going on in an agent’s “bones” IE their basic decision architecture). This approach says: “Look what happens when you try to make the agent overly reliant on logic! Don’t do that!”
However, the Löbstacle is specifically about using logic to describe or design decision-making procedures. So, this ‘solution’ will not be very satisfying for people trying to do that. The puzzling nature of the Löbstacle remains: the claim is that RL (or something) has to basically solve the problem; we can’t use logic. But why is this? Is it because agents have to “be irrational” at some level (ie, their basic systems can’t conform to the normative content of logic)?
Anyway, this may or may not resemble your view. You haven’t explicitly come out and said anything like that, although it does seem like you think there should be a level beyond logic in some sense.
An immediate pedagogical problem with this terminology is that we have to be careful not to conflate this notion of belief with the usual one: an agent will still be able to prove things in L even if it doesn’t believe (in the conventional sense) that the L-proof involved is valid.
It seems like you think this property is so important that it’s almost definitional, and so, a notion of belief which doesn’t satisfy it is in conflict with the usual notion of belief. I just don’t have this intuition. My notion of belief-in-contrast-to-alief does contrast with the informal notion, but I would emphasize other differences:
“belief” contrasts with the intuitive notion of belief in that it much less implies that you’ll act on a belief; our intuitive notion more often predicts that you’ll act on something if you really believe it.
“alief” contrasts with the intuitive notion of belief in that it much less implies that you’ll explicitly endorse something, or even be able to articulate it in language at all.
In other words, I see the intuitive notion of belief as really consisting of two parts, belief and alief, which are intuitively assumed to go together but which we are splitting apart here.
The property you mention is derived from this conflation, because in fact we need to alieve a reasoning process in order to believe its outputs; so if we’re conflating alief and belief, then it seems like we need to believe that L-proofs are valid in order to see L-proofs and (as a direct result) come to believe what’s been proved.
But this is precisely what’s at stake, in making the belief/alief distinction: we want to point out that this isn’t a healthy relationship to have with logic (indeed, Gödel shows how it leads to inconsistency).
There is a more serious formalization issue at play, though, which is the problem of expressing a negative alief. How does one express that an agent “does not alieve that a proof of X in L implies that X is true”? ¬(□X→X) is classically equivalent to □X∧¬X, which in particular is an assertion of both the existence of a proof of X and the falsehood of X, which is clearly far stronger than the intended claim. This is going off on a tangent, so for now I will just assume that it is possible to express disaliefs by introducing some extra operators in L’ and get on with it.
I like that you pointed this out, but yeah, it doesn’t seem especially important to our discussion. In any case, I would define disalief as something like this:
The agent’s mental architecture lacks a deduction rule accepting proofs in L, or anything tantamount to such a rule.
Note that the agent might also disbelieve in such a rule, IE, might expect some such proofs to have false conclusions. But this is, of course, not necessary. In particular it would not be true in the case of logics which the agent explicitly endorses (and therefore must not alieve).
Yes. The mathematical proofs and Löb’s theorem are absolutely fine. What I’m refuting is their relevance; specifically the validity of this claim:
An agent can only trust another agent if it believes that agent’s aliefs.
My position is that *when their beliefs are sound* an agent only ever needs to *alieve* another agent’s *beliefs* in order to trust them.
Hrm. I suspect language has failed us, and we need to make some more distinctions (but I am not sure what they are).
In my terms, if A alieves B’s beliefs, then (for example) if B explicitly endorses ZFC, then A must be using ZFC “in its bones”. But if B explicitly endorses ZFC, then the logic which B is using must be more powerful than that. So A might be logically weaker than B (if A only alieves ZFC, and not the stronger system which B used to decide that ZFC is sound). If so, A cannot trust B (A does not alieve that B’s reasoning is sound, that is, A does not believe B).
I have to confess that I’m confusing myself a bit, and am tempted to give yet another (different, incompatible) definition for the alief/belief split. I’ll hold off for now, but I hope it’s very clear that I’m not saying all the confusion here comes from you—I’m aiming to minimize confusion, but I still worry that I’ve introduced contradictory ideas in this conversation. (I’m actually tempted to start distinguishing 3 levels rather than 2! Alief/belief are relative terms, and I worry we’re actually conflating important subtleties by using only 2 terms rather than having multiple levels...)
A definition of trust which fails to be reflexive is clearly a bad definition, and with this modified definition there is no obstacle
This goes back to the idea that you seem to think “belief in X implies belief in the processes whereby you came to believe in X” is so important as to be definitional, where I think this property has to be a by-product of other things.
In my understanding, the definition of “trust” should not explicitly allow or disallow this, if we’re going to be true to what “trust” means. Rather, for the Löbstacle, “A trusts B” has to be defined as “A willingly relies on B to perform mission-critical tasks”. This definition does indeed fail to be true for naive logical agents. But this should be an argument against naive logical agents, not our notion of trust.
Hence my perception that you do indeed have to question the theorems themselves, in order to dispute their “relevance” to the situation. The definition of trust seems fixed in place to me; indeed, I would instead have to question the relevance of your alternative definition, since what I actually want is the thing studied in the paper (IE being able to delegate critical tasks to another agent).
Note that following the construction in the article, the secondary agent B can only act on the basis of a valid L-proof, so there is no need to distinguish between trusting what B says (the L-proofs B produces) and what B does (their subsequent action upon producing an L-proof).
Ok, but if agent B can only act on valid L-proofs, it seems like agent B has been given a frontal lobotomy (IE this is just the “make sure my future self is dumber” style of solution to the problem).
Or, on the other hand, if the agent A also respects this same restriction, then A cannot delegate tasks to B (because A can’t prove that it’s OK to do so, at least not in L, the logic which it has been restricted to use when it comes to deciding how to act).
Which puts us back in the same Löbstacle.
Attaining this guarantee in practice, so as to be able to trust that B will do what they have promised to do, is a separate but important problem. In general, the above notion of trust will only apply to what another agent says, or more precisely to the proofs they produce.
Is this a crux for you? My thinking is that this is going to be a deadly sticking point. It seems like you’re admitting that your approach has this problem, but, you think there’s value in what you’ve done so far because you’ve solved one part of the problem and you think this other part could also work with time. Is that what you’re intending to say? Whereas to me, it looks like this other part is just doomed to fail, so I don’t see what the value in your proposal could be.
For me, solving the Löbstacle means being able to actually decide to delegate.
I was taking “reasoning” here to mean “applying the logic L” (so manipulating statements of belief), since any assumptions lying strictly in L’ are only applied passively. It feels strange to me to extend “reasoning” to include this implicit stuff, even if we are including it in our formal model of the agent’s behaviour.
I think I get what you’re saying here. But if the assumptions lying strictly in L’ are only applied passively, how does it help us? I’m thinking your current answer is (as you’ve already said) “trusting that B will do what they’ve said they’ll do is a separate problem”—IE you aren’t even trying to build the full bridge between B thinking something is a good idea and A trusting B with such tasks.
Both A and B “reason” in L’ (B could even work in a distinct extension of L), but will only accept proofs in the fragment L.
But then, your bolded statement seems to just be a re-statement of the Löbstacle: logical agents can’t explicitly endorse their own logic L’ which they use to reason, but rather, can only generally accept reasoning in some weaker fragment L.
It’s certainly a restatement of Löb’s theorem. My assertion is that there is no resultant obstacle.
I still really, really don’t get why your language is stuff like “there’s no resultant obstacle” and “what I’m refuting is their relevance”. Your implication is “there was not a problem to begin with” rather than “I have solved the problem”. I asked whether you objected to details of the math in the original paper, and you said no—so apparently you would agree with the result that naive logical agents fail to trust their future selves (which is the Löbstacle!). Solving the Löbstacle would presumably involve providing an improved agent design which avoids the problem. Yet this seems like it’s not what you want to do—instead you claim something else, along the lines of claiming that there’s not actually a problem?
So, basically, what do you claim to accomplish? I suspect I’m still really misunderstanding that part of it.
Re the rest,
(And I also don’t yet see what that part has to do with getting around the Löbstacle.)
It’s not relevant to getting around the Löbstacle; this part of the discussion was the result of me proposing a possible advantage of the perspective shift which (I believe, but have yet to fully convince you) resolves the Löbstacle. I agree that this part is distracting, but it’s also interesting, so please direct message me (via whatever means is available on LW, or by finding me on the MIRIx Discord server or AI alignment Slack) if you have time to discuss it some more.
Since you’re now identifying it as another part of the perspective shift which you’re trying to communicate (rather than just some technical distraction), it sounds like it might actually be pretty helpful toward me understanding what you’re trying to get at. But, there are already a lot of points floating around in this conversation, so maybe I’ll let it drop.
I’m somewhat curious if you think you’ve communicated your perspective shift to any other person; so far, I’m like “there just doesn’t seem to be anything real here”, but maybe there are other people who get what you’re trying to say?
Yeah, interesting. I don’t share your intuition that nested counterfactuals seem funny. The example you give doesn’t seem ill-defined due to the nesting of counterfactuals. Rather, the antecedent doesn’t seem very related to the consequent, which generally has a tendency to make counterfactuals ambiguous. If you ask “if calcium were always ionic, would Nixon have been elected president?” then I’m torn between three responses:
“No” because if we change chemistry, everything changes.
“Yes” because counterfactuals keep everything the same as much as possible, except what has to change; maybe we’re imagining a world where history is largely the same, but some specific biochemistry is different.
“I don’t know” because I am not sure what connection between the two you are trying to point at with the question, so, I don’t know how to answer.
In the case of your Bach example, I’m similarly torn. On the one hand, if we imagine some weird connection between the ages of Bach and Mozart, we might have to change a lot of things. On the other hand, counterfactuals usually try to keep things fixed if there’s not a reason to change them. So the intention of the question seems pretty unclear.
Which, in my mind, has little to do with the specific nested form of your question.
More importantly, perhaps, I think Stalnaker and other philosophers can be said to be investigating linguistic counterfactuals; their chief concern is formalizing the way humans naively talk about things, in a way which gives more clarity but doesn’t lose something important.
My chief concern is decision-theoretic counterfactuals, which are specifically being used to plan/act. This imposes different requirements.
The philosophy of linguistic counterfactuals is complex, of course, but personally I really feel that I understand fairly well what linguistic counterfactuals are and how they work. My picture probably requires a little exposition to be comprehensible, but to state it as simply as I can, I think linguistic counterfactuals can always be understood as “conditional probabilities, but using some reference frame rather than actual beliefs”. For example, very often we can understand counterfactuals as conditional probabilities from a past belief state. “If it had rained, we would not have come” can’t be understood as a conditional probability under our current beliefs, in which we know we did come; but back up time a little bit, and it’s true that if it had been raining, we would not have made the trip.
Backing up time doesn’t always quite work. In those cases we can usually understand things in terms of a hypothetical “objective judge” who doesn’t know details of a situation but who knows things a “reasonable third party” would know. It makes sense that humans would have to consider this detached perspective a lot, in order to judge social situations; so it makes sense that we would have language for talking about it (IE counterfactual language).
We can make sense of nested linguistic counterfactuals in that way, too, if we wish. For example, “if driving had [counterfactually] meant not making it to the party, then we wouldn’t have done it” says (on my understanding) that if a reasonable third person would have looked at the situation and said that if we drive we won’t make it to the party, then we would not have driven. (This in turn says that my past self would not have driven if he had believed that a reasonable third person wouldn’t believe that we would make it to the party, given the information that we’re driving.)
So, I think linguistic counterfactuals implicitly require a description of a third party / past self to be evaluated; this is usually obvious enough from conversation, but, can be an ambiguity.
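As a toy rendering of “conditional probabilities, but under a reference frame” (the worlds, probabilities, and function names here are all invented for illustration):

```python
def counterfactual_probability(reference, antecedent, consequent):
    """P_ref(consequent | antecedent), where `reference` is a prior over
    worlds fixed by the reference frame (a past belief state, or a
    'reasonable third party'), ignoring what we now know actually happened."""
    relevant = {w: p for w, p in reference.items() if antecedent(w)}
    total = sum(relevant.values())
    if total == 0:
        raise ValueError("antecedent has probability zero under the reference frame")
    return sum(p for w, p in relevant.items() if consequent(w)) / total

# "If it had rained, we would not have come", evaluated from a past belief state:
worlds = {("rain", "came"): 0.05, ("rain", "stayed"): 0.45,
          ("dry", "came"): 0.40, ("dry", "stayed"): 0.10}
p = counterfactual_probability(worlds,
                               antecedent=lambda w: w[0] == "rain",
                               consequent=lambda w: w[1] == "stayed")
# p ≈ 0.9: under the reference frame, rain would very likely have kept us home.
```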
However, I don’t think this analysis helps with decision-theoretic counterfactuals. At least, not directly.