Anthony DiGiovanni
Researcher at the Center on Long-Term Risk. All opinions my own.
Nice post!
This process typically involves some back-and-forth between our intuitive judgments about how a concept like “knowledge” or “morality” applies to a given case, and the more explicit and systematized analysis that the philosopher has proposed. Typically, intuitions are given substantive but finite amounts of weight, such that they can be revised/discarded if necessary; and considerations other than fit with intuitions (for example, the parsimony and intuitive appeal of the higher-level principles themselves) can play a role as well. The hope is to eventually reach a “reflective equilibrium” that best harmonizes one’s intuitions and one’s higher level analysis/understanding – though philosophers can still differ, here, in which form of harmonization they ultimately end up endorsing.
I’d be very interested in your takes on my critique of reflective equilibrium here.[1] I personally suspect it could be a philosophical mistake if we baked reflective equilibrium into the definition of “human-like philosophy” that AIs are supposed to do. (Not to say you’re claiming we should do that — you do say “typically”, which I appreciate.)
- ^
Some of my favorite examples of academic critiques:
Singer (1974) (largely an exegesis of Sidgwick)
McMahan (2013), sec. “A Sketch of a Foundationalist Conception of Moral Justification”
- ^
not just coextensive, but necessarily coextensive.
I meant to include necessarily coextensive, too.
It’s also epistemically rational to be certain of a tautology, even if you think (like I do) that being a tautology is a semantic fact
This seems like a non sequitur. I wasn’t claiming “norms of epistemic rationality can’t be functions of [whether a given proposition is a semantic fact]”. (Edit:) I was claiming “norms of epistemic rationality themselves aren’t merely semantic”. The semantic fact that X is a tautology doesn’t, by itself, dictate that you should have P(X) = 1. You need an (in this case, very obviously compelling) norm to tell you that.
Indeed, it’s hard to see where a priori justification could come from if not from semantic facts.
It can come from your intuitions in favor of some foundational norms. (Cf. “non-pragmatic principles” here.) Do you think that when, say, moral anti-realists talk about their bedrock moral intuitions, they’re just talking about semantic facts?
Freedom from human independent concepts is supposed to support the idea of a mathematical universe, not a relational one
Tegmark says (p. 10 here) “the only intrinsic properties of a mathematical structure are its relations”. So I think I am representing his actual argument for MUH in the OP. I think “the claim that the idea that some, but only some, maths exists materially, is unnecessary baggage” is meant to derive the Level IV multiverse from MUH.
It isn’t obvious that maths isn’t a human invention. It isn’t obvious that an external world has to be independent of human concepts, as well as human imagination. It isn’t obvious that maths can exist immaterially
FWIW I think I agree with all the non-obvious propositions here. I don’t think they’re the load-bearing premises of Tegmark’s argument.
Why I’m unconvinced by Tegmark’s argument for the mathematical universe hypothesis
The basic argument seems to be:
1. A complete description of external reality must be free of human-idiosyncratic concepts (“baggage”).
2. Any intrinsic properties of external reality, i.e. properties other than relational properties, are baggage.
3. Therefore a complete description of external reality must be purely relational. (And a mathematical structure is the only candidate for “purely relational”.)
I don’t see why we should buy (2).
As far as I know, there are two arguments for (2). The first: “Every time our theories of physics have improved, we’ve dissolved what we thought was ‘stuff’ into abstract relations between more fundamental stuff. That is, we’ve dissolved intrinsic properties into relational properties.”
But our evidence is “we keep dissolving concepts of non-fundamental intrinsic properties into relations”, not “we keep dissolving all the intrinsic properties that those concepts cover”. E.g. when we learned that the concept of an “atom” is nothing over and above a relation between subatomic particles, we weren’t getting rid of any of the fundamental stuff that composes an atom. The claim that there’s no fundamental stuff to be found is metaphysical, not something we have inductive evidence for.
Which brings us to the second argument: “We never directly observe this intrinsic fundamental stuff. Physics only needs relational properties to explain our observations. By Occam’s razor, we should drop the stuff.”
But I don’t know what it even means to have relations without stuff (e.g. parts of spacetime) to be related — to have structure without substance. So our observations do seem to entail intrinsic properties. I’m open to the correct ontology being very counterintuitive. But compared to the cost of dropping an ontological view as basic as “relations need stuff for the relations to apply to”, the simplicity gains from doing so seem pretty weak.
then that’s presumably a semantic fact about what you mean with these words
I don’t think so. This seems to conflate “A is coextensive with B” and “A is semantically identical to B”. My intuition in favor of assigning equal subjective probability to symmetric outcomes (assuming there’s a unique partition etc.) seems to be about the epistemic right-ness of doing so. (Where “right-ness” is compatible with normative anti-realism, I think.) That goes beyond semantics.
Thanks for clarifying! Back to your first comment then:
Worlds a) and b) and worlds a) and c) are incomparable from one standpoint because you are still radically clueless about alignment research being good
I’m still not sure I understand, sorry if I’m missing something basic. Let:
(a*) = “recommend donating to alignment research [without doing research about the sign of alignment beforehand]”
(b*) = “recommend not donating [without doing research about the sign of alignment beforehand]”
As you say, Lukas’s argument implies (a) > (b). (Independently of whether (b) ~ (c).) This holds even if (a*) is incomparable with (b*) (that’s precisely Lukas’s point). I don’t see how “you are still radically clueless about alignment research being good” — i.e., (a*) is incomparable with (b*) — tells me that (a) is incomparable with (b) (or that (a) incomp (c)).
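To make the structural point concrete, here’s a toy sketch (my own illustration with made-up payoffs and credences, not Lukas’s exact setup). The idea is just that one strategy can be robustly better than another under every distribution in a severely imprecise credence set, even while other pairs of strategies are incomparable:

```python
# Toy model: imprecise credence in "alignment research is good" is
# represented by a set of permissible probabilities. (All numbers and
# strategy definitions are made up for illustration. In this toy model
# (b) and (b*) coincide, since not donating yields 0 either way.)
credence_set = [0.1, 0.3, 0.5, 0.7, 0.9]

def ev_donate_now(p):       # like (a*): donate without researching the sign
    return p * 1 + (1 - p) * (-1)   # gain 1 in good worlds, lose 1 in bad ones

def ev_never_donate(p):     # like (b)/(b*): never donate
    return 0.0

def ev_research_first(p):   # like (a): learn the sign, donate only if good
    return p * 1 + (1 - p) * 0      # abstaining in bad worlds avoids the loss

def robustly_better(ev_x, ev_y, ps):
    """x > y iff x has strictly higher EV under every permissible p."""
    return all(ev_x(p) > ev_y(p) for p in ps)

# (a*) vs (b*): the sign of EV(a*) flips across the set => incomparable.
print(robustly_better(ev_donate_now, ev_never_donate, credence_set))    # False
print(robustly_better(ev_never_donate, ev_donate_now, credence_set))    # False
# (a) vs (b): positive EV under every permissible p => robustly better.
print(robustly_better(ev_research_first, ev_never_donate, credence_set))  # True
```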
Thanks! To clarify, my counterpoint wasn’t meant to address any objections to incomparability in principle. It was meant to respond to Lukas’s claim that even if you buy the arguments for incomparability in principle and for severe imprecision, some strategies are still comparable.
Anyway, to your point:
Worlds a) and b) and worlds a) and c) are incomparable from one standpoint because you are still radically clueless about alignment research being good.
Could you spell out this claim more? What does it mean for one pair of strategies ((a) and (b)) to be incomparable to another ((a) and (c))? It sounds like you want to say “from one standpoint, (b) and (c) are incomparable”, but I’m not sure.
Update for readers
As illustrated in Fig. 6 below, Alice adding her PMP never makes her worse off, provided that Bob responds at least as favorably to Alice if she adds the PMP as if she doesn’t, i.e. provided that Bob doesn’t punish Alice. And Bob has no incentive to punish Alice, since, for any fixed renegotiation set of Bob, Alice PMP-extending her set doesn’t add any outcomes that are strictly better for Bob yet worse for Alice than the default renegotiation outcome.
As pointed out by Lukas Finnveden, this argument is pretty questionable given the definition of PMP-extension we use. I’m hoping to write up an updated version of the paper that fixes this bug. Roughly, the fix (which I’m not very confident works, but seems promising) would be: Define a class P of CSR programs (Algo 2 here), and have Alice’s program do the following:
- Against CSR programs outside of P, use a set-valued renegotiation function RN^def.
- Against programs in P, use the PMP-extension of RN^def.
(ETA: The idea being, if Alice’s program unconditionally used the PMP-extension, Bob might have an incentive to use a more restrictive renegotiation set against Alice because she adds the PMP. And he can’t benefit from this move if Alice only adds the PMP conditional on him not punishing this.)
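As a minimal sketch of that conditional structure (my own pseudocode, not the paper’s Algo 2; rn_def, pmp_extend, and in_class_P are placeholder names, not the paper’s notation beyond RN^def and P):

```python
def alice_renegotiation_function(bobs_program, rn_def, pmp_extend, in_class_P):
    """Sketch: Alice only PMP-extends against programs in the class P.

    rn_def:     maps Bob's program to Alice's default renegotiation set (RN^def)
    pmp_extend: maps a renegotiation set to its PMP-extension
    in_class_P: predicate picking out the class P of CSR programs
    """
    base = rn_def(bobs_program)
    if in_class_P(bobs_program):
        # Bob's program is in P, i.e. (roughly) it doesn't punish Alice
        # for adding the PMP, so the PMP-extension is safe to use.
        return pmp_extend(base)
    # Outside P, fall back to the unextended default, so Bob gains
    # nothing by threatening a more restrictive renegotiation set.
    return base
```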
How to not do decision theory backwards
I’m a bit confused about what your methodology is, in the appendix. You start by saying:
We should demand that our decision theory gives the right answer “for the right reason”.
I agree! I think this is really important. I argue for something similar here. But then, later you say:
Here are the cases which make me skeptical of physically causal CDT: [lists bottom-line answers given by CDT in the cases]
This seems in tension with the first quote. If what we care about is the reasons for the decision recommended by the decision theory, shouldn’t we be focusing on questions like “What does it mean for my decision to have good consequences in the relevant sense? What do I care about when I say I want to ‘maximize EV’? What does this say about whether the relevant notion of EV is causal or evidential or [something else]?” Rather than “Does this decision theory recommend bottom-line answers that I intuitively agree with?”
(Maybe you’d say “those bottom-line answers are evidence about the right reasons”? See this part of the linked post for my response to that.)
The principle of indifference or the law of non-contradiction can be justified with an appeal to semantic intuitions.
How so? POI at the very least seems irreducibly normative to me. I’m not merely giving an analysis of the meaning of concepts like “arbitrary”, when I express my endorsement of POI.
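For concreteness, the version of POI I have in mind (my own formulation): if $o_1, \dots, o_n$ are mutually exclusive, exhaustive outcomes that are evidentially symmetric for you, then you ought to set

$$P(o_i) = \frac{1}{n} \quad \text{for each } i.$$

Semantic intuitions might fix what “symmetric” or “arbitrary” mean, but the “ought” is a further, normative ingredient.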
When do intuitions need to be reliable?
In what sense do you “prefer” 1A to 1B, then?
In the sense that you would choose 1A over 1B if those were the only options in a one-shot decision. The money-pump arguments might constrain your patterns of choices over some sequences. But this tells you nothing about what you should choose in other contexts.
I’d also recommend Carlsmith’s discussion of similar problems with money-pump arguments here. (I have other problems with such arguments, but they’re tricky to unpack here.)
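For readers without the thread’s context: I take 1A and 1B to be the standard Allais gambles (my assumption; the original thread may define them differently). On the usual numbers, 1A is $1M with certainty, and 1B is $5M with probability 0.10, $1M with probability 0.89, and nothing with probability 0.01, so

$$\mathrm{EV}(1A) = \$1\mathrm{M}, \qquad \mathrm{EV}(1B) = 0.10(\$5\mathrm{M}) + 0.89(\$1\mathrm{M}) = \$1.39\mathrm{M}.$$

Choosing 1A in a one-shot choice means accepting lower expected money for certainty; the money-pump arguments constrain sequences of choices, not that one-shot verdict.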
I remember a post on this topic by Caspar Oesterheld, but can’t find it now.
“The lack of performance metrics for CDT versus EDT, etc.” (An important post! FWIW I discuss some related implications for forecasting AIs’ decision theories here.)
I found this post valuable for stating an (apparently popular on LW) coherentist view pretty succinctly. But its dismissal of foundationalism seems way too quick:
It’s not at all “pretty clear” to me that “any foundations are also subject to justificatory work”. Foundationalist philosophers don’t pick arbitrary beliefs (or seemings/appearances/etc.) to call foundational. They try to identify foundations that seem intuitively to (1) not be the sort of thing that requires justification, and (2) provide justification for non-foundational beliefs.
Fair enough if one doesn’t find the arguments for such foundations persuasive. But I would think one should engage at least a bit with those arguments in a post like this.
There seems to be a misunderstanding of what foundationalism entails, in this quote: “We’re not going to find a set of axioms which just seem obvious to all humans once articulated. Rather, there’s some work to be done to make them seem obvious.” Foundationalists don’t deny this — something can be foundational in the sense of requiring no further justification when you fully understand it, without being obvious before you fully understand it.[1]
- ^
Relevant quote from Nye (2015): “The methodological approach I defend maintains that the direct plausibility or implausibility of principles about the ethical relevance of various factors is foundational in normative and practical ethics. This does not mean that appearances of direct plausibility are infallible. Principles often seem plausible only because we are making confusions and do not fully appreciate what they are really saying. On the approach I defend, much of the business of ethical reasoning consists in correcting erroneous appearances of plausibility by clarifying the content of principles, making crucial distinctions, and discovering alternatives with greater direct plausibility. Nor does the claim that the direct plausibility of principles is foundational mean that we should begin our ethical reasoning by considering only which principles seem plausible. The principles that turn out to be most plausible on reflection might be suggested to us only by first considering our intuitions about a variety of cases and then seeing which of them can be justified by principles that, once formulated and clarified, are directly plausible.”
Your example would be pretty telling and damning if we assume that you’re correct, but my guess is that most readers here will assume you’re wrong about it. Someone in your position could still be right, of course; I’m just saying that this wouldn’t yet be apparent to readers.
Fair enough! :) The parallel I had in mind was “[almost] no object level pushback”, or at least almost no object level pushback that I can tell is based on an accurate understanding of my arguments.
So far, I genuinely have not gotten much object-level pushback on the most load-bearing points of my sequence, so, I’m not that worried.
I think this probably underestimates the severity of founder effects in this community. A salient example to me is precise Bayesianism (and the cluster of epistemologies that try to “approximate” Bayesianism): As far as I can tell, the rationalist and EA communities went over a decade without anyone within these communities pushing back on the true weak points of this epistemology, which is extremely load-bearing for cause prioritization. I think in hindsight we should have been worried about missing this sort of thing.
Examples of awareness growth vs. logical updates
(Thanks to Lukas Finnveden for discussion that prompted these examples, and for authoring examples #3-#6 verbatim.)
A key concept in the theory of open-minded updatelessness (OMU) is “awareness growth”, i.e., conceiving of hypotheses you hadn’t considered before. It’s helpful to gesture at “discovering crucial considerations” as examples of awareness growth. But not all CC discoveries are awareness growth. And we might think we don’t need this OMU idea if awareness growth is just logical updating, i.e., you already had nonzero credence in some hypothesis but changed this credence purely by thinking more. What’s the difference? Here are some examples (with a toy sketch of the distinction after them).
1. I realize it’s possible that I’m in a simulation.
Awareness growth. Because before realizing that, I simply hadn’t conceived of “I’m in a sim” as a way the world could be. (I don’t know how/why I could/should model myself-before-I-read-Bostrom as having had nonzero credence in “I’m in a sim”.)
2. I realize that the simulation argument implies (under such and such assumptions) I should have high credence in “I’m in a simulation”.
Logical update. Because I’m not discovering a new way the world could be, I’m just learning of a logical implication governing my credences over ways the world could be.
3. Someone has conceived of the idea that they might be in a simulation (“like a video game!”) but hasn’t considered that simulations might be done for scientific investigations of the past. This updates their view on how likely it is that they’re in a simulation, and also makes them think that more of the probability mass goes to the specific type “investigation of the past”-sim.
The first sentence is awareness growth “by refinement” (Steele and Stefansson (2021, Sec 3.3.2)). The more specific hypotheses you’re now aware of were covered by a coarser hypothesis you were previously aware of. And the second sentence is a logical update.
4. I see a rainbow colored car. I had never previously explicitly pictured a rainbow colored car, or thought about those words.
Pretty straightforward awareness growth.
5. I’m forecasting the likelihood of regime change in a country. I look up the base rates. I forecast based on them and some inside-view adjustments. Later on, news comes in that there was a regime change due to chaos from a natural disaster. I hadn’t conceptualized that possible contribution to regime change! But fortunately I had implicitly accounted for it via base rates that (it turns out) included some examples of natural disasters.
Also pretty straightforward awareness growth. It’s just not the decision-relevant kind, to the extent that you already implicitly priced in “natural disasters lead to regime change” via the base rate. (I think in practice it will very often be ambiguous how precisely we’re pricing in hypotheses we’re unaware of, even if one might argue we’re always kinda sorta pricing them in if you squint.)
6. Someone learns Newtonian physics.
A mix of awareness growth and a logical update. E.g., the exact law “F = ma” wouldn’t have occurred to most people before learning any Newtonian physics — that’s awareness growth. But some people might’ve both (a) conceived of an object getting pushed in a vacuum never slowing down, and (b) not assigned very high P(object never slows down | object is pushed in a vacuum) before learning this was a law — that’s a logical update.
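Here’s the toy sketch promised above (my own framing, with made-up credences), contrasting a logical update with awareness growth by refinement:

```python
# A logical update moves credence around a fixed hypothesis space;
# awareness growth changes the space itself. All numbers are made up.

# Fixed space of hypotheses I'm aware of:
credences = {"I'm in a sim": 0.05, "I'm not in a sim": 0.95}

# Logical update (example 2): working through the simulation argument
# shifts credence between hypotheses I was already aware of.
credences = {"I'm in a sim": 0.30, "I'm not in a sim": 0.70}

# Awareness growth by refinement (example 3): "I'm in a sim" splits into
# sub-hypotheses I hadn't conceived of. Note that my old coarse credence
# doesn't dictate how to allocate mass across the refinements; handling
# this kind of change is what OMU is for.
credences = {
    "game-sim": 0.10,                # newly conceived
    "past-investigation-sim": 0.20,  # newly conceived
    "I'm not in a sim": 0.70,
}
```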
tendency to “bite bullets” or accepting implications that are highly counterintuitive to others or even to themselves, instead of adopting more uncertainty
I find this contrast between “biting bullets” and “adopting more uncertainty” strange. The two seem orthogonal to me, as in, I’ve ~just as frequently (if not more often) observed people overconfidently endorse their pretheoretic philosophical intuitions, in opposition to bullet-biting.
(I think there are probably some significant differences in general philosophical framings that make this discussion tricky, so I’m not sure if I’ll reply further.)
I would say, we also know what we value — broadly construed, i.e., meant to encompass epistemic and decision-theoretic norms too. These normative attitudes are things we can directly introspect, and they’re neither merely semantic (as people usually use that term) nor empirical.
If you want to say claims about our values are really just semantic claims about our evaluative concepts, then I think:
1. That will be confusing to most people, based on how we usually use the term “semantic”.
2. That loses something important about the substance of bedrock normative attitudes: They’re, well, attitudes. To me it seems clear that a claim like “Other things being equal, suffering is bad” is doing something more than expressing meanings of terms. Someone who says “I understand what bad means, but I don’t see why suffering is bad” would be making a normative error by our lights, not a semantic one — unlike “I understand what bachelor means, but I don’t see why bachelors are unmarried”.
(“Bedrock” is the operative word here. Meant to exclude the “vast majority of ethical statements [that] also have some degree of empirical content”.)