A philosophical kernel: biting analytic bullets

Sometimes, a philosophy debate has two basic positions, call them A and B. A matches a lot of people’s intuitions, but is hard to make realistic. B is initially unintuitive (sometimes radically so), perhaps feeling “empty”, but has a basic realism to it. There might be third positions that claim something like, “A and B are both kind of right”.

Here I would say B is the more bullet-biting position. Free will vs. determinism is a classic example: hard determinism is biting the bullet. One interesting thing is that free will believers (including compatibilists) will invent a variety of different theories to explain or justify free will; no one theory seems clearly best. Meanwhile, hard determinism has stayed pretty much the same since ancient Greek fatalism.

While there are some indications that the bullet-biting position is usually more correct, I don’t mean to make an overly strong statement here. Position A (or some compatibility between A and B) could really be correct, with the right formalization simply not yet found. Nonetheless, I am interested in what views result from biting bullets at every stage.

Why consider biting multiple bullets in sequence? Consider an analogy: a Christian fundamentalist entertains the possibility that Christ’s resurrection didn’t really happen. He reasons: “But if the resurrection didn’t happen, then Christ is not God. And if Christ is not God, then humanity is not redeemed. Oh no!”

There’s clearly a mistake here, in that a revision of a single belief can lead to problems that are avoided by revising multiple beliefs at once. In the Christian fundamentalist case, atheists and non-fundamentalists already exist, so it’s pretty easy not to make this mistake. On the other hand, many of the (explicit or implicit) intuitions in the philosophical water supply may be hard to think outside of; there may not be easily identifiable “atheists” with respect to many of these intuitions simultaneously.

Some general heuristics:

  • Ontological minimality: do not multiply types of entities beyond necessity.

  • Empirical plausibility: generally agree with well-established science and avoid bold empirical claims; at most, cast doubt on common scientific background assumptions (see: Kant decoupling subjective time from clock time).

  • Un-creativity: avoid proposing speculative, experimental frameworks for decision theory and so on (they usually don’t work out).

What’s the point of all this? Maybe the resulting view is more likely true than other views. Even if it isn’t true, it might be a minimal “kernel” view that supports adding more elements later, without conflicting with legacy frameworks. It might be more productive to argue against a simple, focused, canonical view than against a popular “view” that is really a disjunctive collection of many different views; bullet-biting increases simplicity, perhaps making the result more productive to argue against.

Causality: directed acyclic graph multi-factorization

Empirically, we don’t see evidence of time travel. Events seem to proceed from past to future, with future events being at least somewhat predictable from past events. This can be seen in probabilistic graphical models. Bayesian networks have a directed acyclic graph factorization (which can be topologically sorted, perhaps in multiple ways), while factor graphs in general don’t. (For example, it is possible to express, as a factor graph, the distribution of a Bayesian network conditioned on some variable having some value; the factor graph now expresses something like “teleology”, events tending to happen more when they are compatible with some future possibility.)
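
For reference, the two factorizations contrast as follows (standard definitions, stated here for convenience, not specific to this post):

$$
P(x_1,\dots,x_n) = \prod_{i=1}^{n} P\big(x_i \mid \mathrm{pa}(x_i)\big) \quad \text{(Bayesian network: conditionals along a DAG)}
$$

$$
P(x_1,\dots,x_n) = \frac{1}{Z} \prod_{j} f_j\big(x_{S_j}\big) \quad \text{(factor graph: non-negative factors, no ordering required)}
$$

Conditioning a Bayesian network on a variable’s value and dropping the normalizer yields exactly such a product of factors, which is one way to see how the conditioned model loses its past-to-future reading.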

This raises the issue that there are multiple Bayesian networks with different graphs expressing the same joint distribution. For ontological minimality, we could say these are all valid factorizations (so there is no “further fact” of what is the real factorization, in cases of persistent empirical ambiguity), though of course some have analytically nicer mathematical properties (locality, efficient computability) than others. Each non-trivial DAG factorization has mathematical implications about the distribution; we need not forget these implications even though there are multiple DAG factorizations.
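
As a concrete toy example (my illustration, not from the original post): any joint distribution over two binary variables factorizes both along X → Y and along Y → X, so the distribution does not pin down a unique DAG.

```python
import itertools

# An arbitrary joint distribution P(X, Y) over two binary variables.
joint = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}

def marginal(axis, v):
    return sum(p for xy, p in joint.items() if xy[axis] == v)

# Conditional tables for the two rival factorizations, computed from the joint.
cpt_y_given_x = {(x, y): joint[(x, y)] / marginal(0, x)
                 for x, y in itertools.product([0, 1], repeat=2)}
cpt_x_given_y = {(x, y): joint[(x, y)] / marginal(1, y)
                 for x, y in itertools.product([0, 1], repeat=2)}

for x, y in itertools.product([0, 1], repeat=2):
    via_x_to_y = marginal(0, x) * cpt_y_given_x[(x, y)]  # P(x) P(y|x): graph X -> Y
    via_y_to_x = marginal(1, y) * cpt_x_given_y[(x, y)]  # P(y) P(x|y): graph Y -> X
    assert abs(via_x_to_y - joint[(x, y)]) < 1e-12
    assert abs(via_y_to_x - joint[(x, y)]) < 1e-12
```

Both graphs reproduce the same joint; only extra commitments, such as Pearl’s interventions, would distinguish them.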

Bayesian networks can be generalized to probabilistic programming, e.g. some variables may only exist dependent on specific values for previous variables. This doesn’t change the overall setup much; the basic ideas are already present in Bayesian networks.
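
A minimal probabilistic-programming sketch (my illustration; the variables are made up) in which one variable exists only for certain values of another:

```python
import random

def model(rng: random.Random) -> dict:
    # Root variable, as in an ordinary Bayesian network.
    has_nest = rng.random() < 0.3
    trace = {"has_nest": has_nest}
    # This variable exists only on executions where has_nest is True --
    # the generalization beyond a fixed-node Bayesian network.
    if has_nest:
        trace["n_eggs"] = rng.randint(1, 4)
    return trace

samples = [model(random.Random(seed)) for seed in range(5)]
print(samples)
```

Each execution trace still has a well-defined past-to-future order, so the DAG-style picture carries over.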

We now have a specific disagreement with Judea Pearl: he operationalizes causality in terms of consequences of counterfactual intervention. This is sensitive to the graph order of the directed acyclic graph; hence, causal graphs express more information than the joint distribution. For ontological minimality, we’ll avoid reifying causal counterfactuals and hence causal graphs. Causal counterfactuals have theoretical problems, such as implying violations of physical law, hence being un-determined by empirical science (as we can’t observe what happens when physical laws are violated). We avoid these problems by not believing in causal counterfactuals.

Since causal counterfactuals are about non-actual universes, we don’t really need them to make the empirical predictions of causal models, such as no time travel. DAG factorization seems to do the job.

Laws of physics: universal satisfaction

Given a DAG model, some physical invariants may hold, e.g. conservation of energy. And if we transform the DAG model to one expressing the same joint distribution, the physical invariants translate. They always hold for any configuration in the DAG’s support.

Do the laws have “additional reality” beyond universal satisfaction? It doesn’t seem we need to assume they do. We predict as if the laws always hold, but that reduces to a statement about the joint configuration; no extra predictive power results from assuming the laws have any additional existence.

So for ontological minimality, the reality of a law can be identified with its universal satisfaction by the universe’s trajectory. (This is weaker than notions of “counterfactual universal satisfaction across all possible universes”.)

This enables us to ask questions similar to counterfactuals: what would follow (logically, or with high probability according to the DAG) in a model in which these universal invariants hold, and the initial state is X (which need not match the actual universe’s initial state)? This is a mathematical question, rather than a modal one; see discussion of mathematics later.
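
One way to phrase such a question formally (my notation, not the post’s): let $I$ be the conjunction of the universally satisfied invariants, read as constraints on trajectories, and ask

$$
I \wedge (s_0 = X) \stackrel{?}{\vdash} \varphi \qquad \text{or} \qquad P(\varphi \mid s_0 = X) \text{ under the DAG model},
$$

where $s_0$ is the model’s initial state and $\varphi$ the queried consequence. Nothing here quantifies over non-actual universes; it is a derivability (or conditional-probability) question about a mathematical model.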

Time: eternalism

Eternalism says the future exists, as the past and present do. This is fairly natural from the DAG factorization notion of causality. As there are multiple topological sorts of a given DAG, and multiple DAGs consistent with the same joint distribution, there isn’t an obvious way to separate the present from the past and future; and even if there were, there wouldn’t be an obvious point in declaring some nodes real and others un-real based on their topological ordering. Accordingly, for ontological minimality, they have the same degree of existence.

Eternalism is also known as “block universe theory”. There’s a possible complication, in that our DAG factorization can be stochastic. But the stochasticity need not be “located in time”. In particular, we can move any stochasticity into independent random variables, and have everything be a deterministic consequence of those. This is like pre-computing random numbers for a Monte Carlo sampling algorithm.
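
Here is a minimal sketch (my illustration) of relocating stochasticity outside of time, in the style of pre-computed Monte Carlo randomness:

```python
import random

def step(state: int, noise: float) -> int:
    # Toy dynamics: fully deterministic once the noise value is fixed.
    return state + (1 if noise < 0.5 else -1)

rng = random.Random(0)
# All randomness drawn up front, "outside time", as independent variables.
noise_tape = [rng.random() for _ in range(100)]

state, trajectory = 0, [0]
for t in range(100):
    state = step(state, noise_tape[t])  # deterministic given the tape
    trajectory.append(state)
```

The trajectory is a deterministic function of the initial state and the noise tape; the tape carries all the stochasticity and has no temporal location within the dynamics.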

The main empirical ambiguity here is whether the universe’s history has a high Kolmogorov complexity, increasing approximately linearly with time. If it does, then something like a stochastic model is predictively appropriate, although the stochasticity need not be “in time”. If not, then it’s more like classical determinism. It’s an open empirical question, so let’s not be dogmatic.

We can go further. Do we even need to attribute “true stochasticity” to a universe with high Kolmogorov complexity? Instead, we can say that simple universally satisfied laws constrain the trajectory, either partially or totally (only partially in the high K-complexity case). And to the extent they only partially do, we have no reason to expect that a simple stochastic model of the remainder would be worse than any other model (except high K-complexity ones that “bake in” information about the remainder, a bit of a cheat). (See “The Coding Theorem — A Link between Complexity and Probability” for technical details.)
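
The coding theorem referred to states (a standard result, quoted here for reference) that algorithmic probability and prefix Kolmogorov complexity agree up to an additive constant:

$$
-\log_2 m(x) = K(x) + O(1), \qquad m(x) = \sum_{p \,:\, U(p) = x} 2^{-|p|},
$$

where $U$ is a universal prefix machine and $m$ is the universal a priori probability. Since $m$ multiplicatively dominates every computable probability model, a simple universal stochastic model is, up to a constant factor depending on the alternative’s complexity, no worse than any alternative.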

Either way, we have “quasi-determinism”; everything is deterministic, except perhaps factored-out residuals that a simple stochastic model suffices for.

Free will: non-realism

A basic argument against free will: free will for an agent implies that the agent could have done something else. This already implies a “possibility”-like modality; if such a modality is not real, free will fails. If, on the other hand, possibility is real, then, according to standard modal logics such as S4, any logical tautology must be necessary. If an agent is identified with a particular physical configuration, then, given the same physics, inputs, and stochastic bits (which can be modeled as non-temporal extra parameters, per previous discussion), there is only one possible action, and it is necessary, as it is a logical consequence of that specification. Hence, a claim of “could” about any other action fails.
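
A compressed formal rendering of this argument (my sketch; necessitation and the K axiom are standard in S4):

$$
\frac{\vdash \varphi}{\vdash \Box\varphi} \ \text{(necessitation)}; \qquad \vdash (\mathit{spec} \rightarrow a = a^{*}) \ \Rightarrow \ \vdash \Box(\mathit{spec} \rightarrow a = a^{*}),
$$

where $\mathit{spec}$ fixes the agent’s configuration, inputs, and stochastic bits, and $a^{*}$ is the single action they entail. If the agent just is that configuration, so that $\Box\,\mathit{spec}$ holds, then $\Box(a = a^{*})$ follows by the K axiom, and $\Diamond(a \neq a^{*})$, the claim that the agent could have done otherwise, is false.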

Possible ways out: consider giving the agent different inputs, or different stochastic bits, or different physics, or don’t identify the agent with its configuration (have “could” change the agent’s physical configuration). These are all somewhat dubious. For one, it is dogmatic to assume that the universe has high Kolmogorov complexity; if it doesn’t, then modeling decisions as having corresponding “stochastic bits” can’t in general be valid. Free will believers don’t tend to agree on how to operationalize “could”, their specific formalizations tend to be dubious in various ways, and the formalizations do not agree much with normal free will intuitions. The obvious bullet to bite here is that either there is no modal “could”, or, if there is, none that corresponds to “free will”, as the notion of “free will” bakes in confusions.

Decision theory: non-realism

We reject causal decision theory (CDT), because it relies on causal counterfactuals. We reject any theory of “logical counterfactuals”, because the counterfactual must be illogical, contradicting modal logics such as S4. Without applying too much creativity, what remain are evidential decision theory (EDT) and non-realism, i.e. the claim that there is not in general a fact of the matter about what action by some fixed agent best accomplishes some goal.

To be fair to EDT, the smoking lesion problem is highly questionable in that it assumes decisions could be caused by genes (without those genes changing the decision theory, value function, and so on), contradicting the assumption that the agent implements EDT. Moreover, there are logical formulations of EDT, which ask whether it would be good news to learn that one’s algorithm outputs a given action given a certain input (the one you’re seeing), where “good news” is evaluated across a class of possible universes, not just the one you have evidence of; these may better handle “XOR blackmail”-like problems.

Nevertheless, I won’t dogmatically assume based on failure of CDT and logical counterfactual theories that EDT works; EDT theorists have to do a lot to make EDT seem to work in strange decision-theoretic thought experiments. This work can introduce ontological extras such as infinitesimal probabilities, or similarly, pseudo-Bayesian conditionals on probability 0 events. From a bullet-biting perspective, this is all highly dubious, and not really necessary.

We can recover various “practical reason” concepts as statistical predictions about whether an agent will succeed at some goal, given evidence about the agent, including that agent’s actions. For example, as a matter of statistical regularity, some people succeed in business more than others, and there is empirical correlation with their decision heuristics. The difference is that this is a third-personal evaluation, rather than a first-personal recommendation: we make no assumption that third-person predictive concepts relating to practical reason translate to a workable first-personal decision theory. (See also “Decisions are not about changing the world, they are about learning what world you live in”, for related analysis.)

Morality: non-realism

This shouldn’t be surprising. Moral realism implies that moral facts exist, but where would they exist? No proposal of a definition in terms of physics, math, and so on has been generally convincing, and the proposals vary quite a lot. G.E. Moore observed that any precise definition of morality (in terms of physics and so on) seems to leave an “open question” of whether the thing so defined is really good, a question that remains live and compelling to the listener.

There are many possible minds (consider the space of AGI programs), and they could find different things compelling. There are statistical commonalities (e.g. minds will tend to make decisions compatible with maintaining an epistemology and so on), but even commonalities have exceptions. (See “No Universally Compelling Arguments”.)

Suppose you really like the categorical imperative and think rational minds have a general tendency to follow it. If so, wouldn’t it be more precise to say “X agent follows the categorical imperative” than “X agent acts morally”? This bakes in fewer intuitive confusions.

As an analogy, suppose some people refer to members of certain local bird species as a “forest spirit”, due to a local superstition. You could call such a bird a “forest spirit” by which you mean a physical entity of that bird species, but this risks baking in a superstitious confusion.

In addition, the discussion of free will and decision theory shows that there are problems with formulating possibility and intentional action. If, as Kant says, “ought implies can”, then contrapositively “not can implies not ought”; if modal analysis shows that alternative actions for a given agent are not possible, then no alternative action can be an “ought”. (Alternatively, if modal possibility is unreal, then “ought implies can” is confused to begin with.) The one remaining “ought” is redundant with the actual action, which is really not the interpretation of “ought” intended by moral realists.

Theory of mind: epistemic reductive physicalism

Chalmers claims that mental properties are “further facts” on top of physical properties, based on the zombie argument: it is conceivable that a universe physically identical to ours could exist, but with no consciousness in it. Ontological minimality suggests not believing in these “further facts”, especially given how dubious theories of consciousness tend to be. This seems a lot like eliminativism.

We don’t need to discard all mental concepts, though. Some mental properties, such as logical inference and memory, have computational interpretations. If I say my computer “remembers” something, I specify a certain set of physical configurations that way: the ones corresponding to computers with that something stored in memory (e.g. in RAM). I could perhaps be more precise than “remembers”, by saying something like “functionally remembers”.

A possible problem with eliminativism is that it might undermine the idea that we know things, including any evidence for eliminativism. It is epistemically judicious to grant some ontological status to “we have evidence of this physical theory” and so on. The idea with reductive physicalism is to identify such statements with physical ones, such as: “in the universe, most agents who use this or that epistemic rule are right about this or that”. (It would be a mistake to assume, given a satisficing epistemology evaluation over existent agents, that we “could” maximize epistemology with a certain epistemic rule; that would open up the usual decision-theoretic complications. Evaluating the reliability of our epistemologies is more like evaluating third-personal practical reason than making first-personal recommendations.)

That might be enough. If it’s not enough, then ontological minimality suggests adding as little as possible to physicalism to express epistemic facts. We don’t need a full-blown theory of consciousness to express meaningful epistemic statements.

Personal identity: empty individualism, similarity as successor

If a machine scans you and makes a nearly-exact physical copy elsewhere, is that copy also you? Paradoxes of personal identity abound. Whether that copy is “really you” seems like a non-question; if it had an answer, where would that answer be located?

Logically, we have a minimal notion of personal identity from mathematical identity (X=X). So, if X denotes (some mathematical object corresponding to) you at some time, then X=X. This is an empty notion of individualism, as it fails to hold that you are the same as recent past or future versions of yourself.

What’s fairly simple and predictive to say, over and above X=X, is that a near-exact copy of you is similar to you: as you are similar to near past and future versions of yourself, as two prints of a book are similar, and as two world maps are similar. There are also directed properties (rather than symmetric similarity), such as you remembering the experiences of past versions of yourself but not vice versa; these reduce to physical properties, not further properties, as in the theory of mind section.

It’s easy to get confused about which entities are “really the same person”. Ontological minimality suggests there isn’t a general answer, beyond trivial reflexive identities (X=X). The successor concept is, then, something like similarity. (And getting too obsessed with “how exactly to define similarity?” misses the point; the use of similarity is mainly predictive/​evidential, not metaphysical.)

Anthropic probability: non-realism, graph structure as successor

In the Sleeping Beauty problem, is the correct probability ½ or ⅓? It seems the argument is over nothing real. Halfers and thirders agree on a sort of graph structure of memory: the initial Sleeping Beauty “leads to” one or two future states, depending on the coin flip, in terms of functional memory relations. The problem has to do with translating the graph structure to a probability distribution over future observations and situations (from the perspective of the original Sleeping Beauty).
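
Here is the agreed-on structure in miniature (my illustration; the node names are made up):

```python
# Functional-memory graph for Sleeping Beauty: nodes are observer-moments,
# edges are "leads to" relations. Halfers and thirders both accept this
# graph; they disagree on how to read probabilities off of it.
memory_graph = {
    "Sunday": ["Monday_heads", "Monday_tails"],  # the coin picks the branch
    "Monday_heads": [],                          # heads: a single awakening
    "Monday_tails": ["Tuesday_tails"],           # tails: a second awakening
    "Tuesday_tails": [],
}
```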

From physics and the identification of basic mental functions, we get a graph-like structure; why add more ontology? Enough thought experiments involving memory wipes, upload copying, and so on suggest that the linear structure of memory and observation is not always valid.

This slightly complicates the idea of physical theories being predictive, but it seems possible to operationalize prediction without a full notion of subjective probability. We can ask questions like, “do most entities in the universe who use this or that predictive model make good predictions about their future observations?”. The point here isn’t to get a universal notion of good predictions, but rather one that is good enough to get basic inferences, like learning about universal physical laws.
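
A toy version of that operationalization (my illustration; the setup and numbers are made up):

```python
import math
import random

def log_loss(prob_true: float, outcome: bool) -> float:
    p = prob_true if outcome else 1.0 - prob_true
    return -math.log(p)

rng = random.Random(0)
TRUE_BIAS = 0.7  # the toy universe's actual law

# Two predictive models used by populations of entities; we score how well
# each population predicts its own future observations, third-personally.
models = {"tracks_the_law": 0.7, "ignores_the_law": 0.5}
avg_loss = {name: 0.0 for name in models}
N = 10_000
for _ in range(N):
    outcome = rng.random() < TRUE_BIAS  # one entity's future observation
    for name, p in models.items():
        avg_loss[name] += log_loss(p, outcome) / N

print(avg_loss)  # entities tracking the law predict better, on average
```

No first-personal subjective probability is assigned anywhere; the evaluation is a statistical fact about populations of entities.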

Mathematics: formalism

Are mathematical facts, such as “Fermat’s Last Theorem is true”, real? If so, where are they? Are they in the physical universe, or at least partially in a different realm?

Both of these are questionable. If we try to identify “for all n,m: n + S(m) = S(n + m)” with “in the universe, it is always the case that adding n objects to S(m) objects yields S(n + m) objects”, we run into a few problems:

  • It requires identifying objects in physics.

  • Given a particular definition of object, physics might not be such that this rule always holds: maybe adding a pile of sand to another pile of sand reduces the number of objects (as it combines two piles into one), or perhaps some objects explode when moved around; meanwhile, mathematical intuition is that these laws are necessary.

  • The size of the physical universe limits how many test cases there can be; hence, we might un-intuitively conclude something like “for all n,m both greater than Graham’s number, n=m”, as the physical universe has no counter-examples.

  • The size of the universe limits the possible information content of any entity in it, forcing something like ultrafinitism.

On the other hand, the idea that the mathematical facts live even partially outside the universe is ontologically and epistemically questionable. How would we access these mathematical facts, if our behaviors are determined by physics? Why even assume they exist, when all we see is in the universe, not anything outside of it?

Philosophical formalism does not explain “for all n,m: n + S(m) = S(n + m)” by appealing to a universal truth, but by noting that our formal system (in this case, Peano arithmetic) derives it. A quasi-invariant holds: mathematicians in practice tend to follow the rules of the formal system. And mathematicians use one formal system rather than another for physical, historical reasons. Peano arithmetic, for example, is useful: it models numbers in physics theories and in computer science, yielding predictions because the structure of its inferences corresponds in part to the structure of physics. Though utility is a contingent fact about our universe; what problems are considered useful to solve varies with historical circumstances. Formal systems are also adopted for reasons other than utility, such as the momentum of past practice or the prestige of earlier work.
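
To illustrate “derived within a system” concretely (a minimal sketch in Lean 4, where this equation is essentially the definition of addition):

```lean
-- In Lean 4, Nat.add is defined by recursion on its second argument, so
-- n + succ m reduces to succ (n + m) by computation alone. The "law" is a
-- consequence of the system's rules, provable by reflexivity.
theorem add_succ_eq (n m : Nat) : n + Nat.succ m = Nat.succ (n + m) := rfl
```

A different formal system could fail to derive this, or could derive something incompatible; formalism locates the fact in the derivation, not in a separate realm of numbers.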

What we avoid with philosophical formalism is confusion over “further facts”, such as the truth of the Continuum Hypothesis, which has been shown to be independent of ZFC. We don’t need to think there is a real fact of the matter about whether the Continuum Hypothesis is true.

Formalism is suggestive of finitism and intuitionism, although these are additional principles of formal systems; we don’t need to conclude something like “finitism is true” per se. The advantage of such formal systems is that they may be a bit more “self-aware” as being formal systems; for example, intuitionism is less suggestive that there is always a fact of the matter regarding undecidable statements (like a Gödelian sentence), as it does not accept the law of the excluded middle. But, again, these are particular formal systems, which have advantages and disadvantages relative to other formal systems; we don’t need to conclude that any of these are “the correct formal system”.

Conclusion

The positions sketched here are not meant to be a complete theory of everything. They are a deliberately stripped-down “kernel” view, obtained by repeatedly biting bullets rather than preserving intuitions that demand extra ontology. Across causality, laws of physics, time, free will, decision theory, morality, mind, personal identity, anthropic probability, and mathematics, the same method has been applied:

  • Strip away purported “further facts” not needed for empirical adequacy.

  • Treat models as mathematical tools for describing the world’s structure, not as windows onto modal or metaphysical realms.

  • Accept that some familiar categories like “could,” “ought,” “the same person,” or “true randomness” may collapse into redundancy or dissolve into lighter successors such as statistical regularity or similarity relations.

This approach sacrifices intuitive richness for structural economy. But the payoff is clarity: fewer moving parts, fewer hidden assumptions, and fewer places for inconsistent intuitions to be smuggled in. Even if the kernel view is incomplete or false in detail, it serves as a clean baseline — one that can be built upon, by adding commitments with eyes open to their costs.

The process is iterative. For example, I stripped away a causal counterfactual ontology to get a DAG structure; then relocated the stochasticity into a-temporal uniform bits; then suggested that residuals not determined by simple physical laws (in a high Kolmogorov complexity universe) need not be “truly stochastic”, just well-predicted by a simple stochastic model. Each round makes the ontology lighter while preserving empirical usefulness.

It is somewhat questionable to infer, from the lack of success in defining, say, an optimal decision theory, that no such decision theory exists. This provides an opportunity for falsification: solve the problem really well. A sufficiently reductionist solution may be compatible with the philosophical kernel; otherwise, an extension might be warranted.

I wouldn’t say I outright agree with everything here, but the exercise has shifted my credences toward these beliefs. As with the Christian fundamentalist analogy, resistance to biting particular bullets may come from revising too few beliefs at once.

A practical upshot is that a minimal philosophical kernel can be extended more easily without internal conflict, whereas a more complex system is harder to adapt. If someone thinks this kernel is too minimal, the challenge is clear: propose a compatible extension, and show why it earns its ontological keep.