# Science in a High-Dimensional World

Claim: the usual explanation of the Scientific Method is missing some key pieces about how to make science work well in a high-dimensional world (e.g. our world). Updating our picture of science to account for the challenges of dimensionality gives a different model for how to do science and how to recognize high-value research. This post will sketch out that model, and explain what problems it solves.

## The Dimensionality Problem

Imagine that we are early scientists, investigating the mechanics of a sled sliding down a slope. What determines how fast the sled goes? Any number of factors could conceivably matter: angle of the hill, weight and shape and material of the sled, blessings or curses laid upon the sled or the hill, the weather, wetness, phase of the moon, latitude and/or longitude and/or altitude, etc. For all the early scientists know, there may be some deep mathematical structure to the world which links the sled’s speed to the astrological motions of stars and planets, or the flaps of the wings of butterflies across the ocean, or vibrations from the feet of foxes running through the woods.

Takeaway: there are literally billions of variables which could influence the speed of a sled on a hill, as far as an early scientist knows.

So, the early scientists try to control as much as they can. They use a standardized sled, with standardized weights, on a flat smooth piece of wood treated in a standardized manner, at a standardized angle. Playing around, they find that they need to carefully control a dozen different variables to get reproducible results. With those dozen pieces carefully kept the same every time… the sled consistently reaches the same speed (within reasonable precision).

At first glance, this does not sound very useful. They had to exercise unrealistic levels of standardization and control over a dozen different variables. Presumably their results will not generalize to real sleds on real hills in the wild.

But stop for a moment to consider the implications of the result. A consistent sled-speed can be achieved while controlling *only a dozen variables*. Out of *literally billions*. Planetary motions? Irrelevant, after controlling for those dozen variables. Flaps of butterfly wings on the other side of the ocean? Irrelevant, after controlling for those dozen variables. Vibrations from foxes’ feet? Irrelevant, after controlling for those dozen variables.

The amazing power of achieving a consistent sled-speed is not that other sleds on other hills will reach the same predictable speed. Rather, it’s knowing *which variables are needed* to predict the sled’s speed. Hopefully, those same variables will be sufficient to determine the speeds of other sleds on other hills—even if some experimentation is required to find the speed for any particular variable-combination.

## Determinism

How can we know that *all* other variables in the universe are irrelevant after controlling for a handful? Couldn’t there always be some other variable which is relevant, no matter what empirical results we see?

The key to answering that question is determinism. If the system’s behavior can be predicted *perfectly*, then there is no mystery left to explain, no information left which some unknown variable could provide. Mathematically, information theorists use the mutual information $I(X;Y)$ to measure the information which $X$ contains about $Y$. If $Y$ is deterministic—i.e. we can predict $Y$ perfectly—then $I(X;Y)$ is zero no matter what variable $X$ we look at. Or, in terms of correlations: a deterministic variable always has zero correlation with everything else. If we can perfectly predict $Y$, then there is no further information to gain about it.

In this case, we’re saying that sled speed is deterministic *given* some set of variables (sled, weight, surface, angle, etc). So, given those variables, everything else in the universe is irrelevant.
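The step from determinism to irrelevance can be written out in one line. This is a sketch in standard information-theoretic notation that is not part of the original post: $Y$ is the outcome (sled speed), $C$ is the set of controlled variables, $X$ is any other variable in the universe, and $H$ is Shannon entropy, with all variables assumed discrete.

```latex
\[
0 \;\le\; I(X;Y \mid C)
  \;=\; H(Y \mid C) - H(Y \mid X, C)
  \;\le\; H(Y \mid C)
  \;=\; 0
\]
```

The first inequality is the non-negativity of mutual information, and the final equality is the assumption that $Y$ is perfectly predictable from $C$. When prediction is only approximately perfect, the same chain still gives $I(X;Y \mid C) \le H(Y \mid C)$, which is then small rather than exactly zero.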

Of course, we can’t always perfectly predict things in the real world. There’s always some noise—certainly at the quantum scale, and usually at larger scales too. So how do we science?

The first thing to note is that “perfect predictability implies zero mutual information” plays well with approximation: *approximately* perfect predictability implies *approximately* zero mutual information. If we can predict the sled’s speed to within 1% error, then any other variables in the universe can only influence that remaining 1% error. Similarly, if we can predict the sled’s speed 99% of the time, then any other variables can only matter 1% of the time. And we can combine those: if 99% of the time we can predict the sled’s speed to within 1% error, then any other variables can only influence the 1% error except for the 1% of sled-runs when they might have a larger effect.
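To put a rough number on "approximately zero", here is a minimal sketch of the 99%-of-the-time case. The setup is an assumption made for illustration: given the controls, the speed takes its predicted value 99% of the time, and a single hidden binary variable `X` fully explains the remaining 1% of runs, which is the worst case for us. The helper names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) in bits, where joint maps (x, y) -> probability."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )

miss_rate = 0.01  # fraction of runs where the prediction fails

# Worst case: the hidden variable X is exactly what causes the misses.
joint = {(0, "predicted"): 1 - miss_rate, (1, "other"): miss_rate}

h_y = entropy([1 - miss_rate, miss_rate])  # remaining uncertainty about Y
i_xy = mutual_information(joint)           # information X carries about Y

print(f"H(Y | controls)    = {h_y:.4f} bits")
print(f"I(X; Y | controls) = {i_xy:.4f} bits")
```

Even when the hidden variable accounts for every miss, it carries no more information about the outcome than the residual entropy, which is under a tenth of a bit in this toy case.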

More generally, if we can perfectly predict any specific variable, then everything else in the universe is irrelevant to that variable—even if we can’t perfectly predict all aspects of the system’s trajectory. For instance, if we can perfectly predict the *first two digits* of the sled’s speed (but not the less-significant digits), then we know that nothing else in the universe is relevant *to those first two digits* (although all sorts of things could influence the less-significant digits).

As a special case of this, we can also handle noise using repeated experiments. If I roll a die, I can’t predict the outcome perfectly, so I can’t rule out influences from all the billions of variables in the universe. But if I roll a die a few thousand times, then I can approximately-perfectly predict the *distribution* of die-rolls (including the mean, variance, etc). So, even though I don’t know what influences any one particular die roll, I do know that nothing else in the universe is relevant to the overall distribution of repeated rolls (at least to within some small error margin).
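The die example can be checked with a short simulation. This is a sketch that assumes a fair die and uses Python's seeded pseudo-random generator as a stand-in for real rolls; the function names are illustrative.

```python
import random
from collections import Counter

def empirical_distribution(n_rolls, seed=0):
    """Roll a simulated fair die n_rolls times; return the frequency of each face."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

def max_deviation_from_fair(dist):
    """Largest gap between an observed face frequency and 1/6."""
    return max(abs(p - 1 / 6) for p in dist.values())

for n in (60, 6000):
    dist = empirical_distribution(n)
    print(n, "rolls, max deviation from 1/6:", round(max_deviation_from_fair(dist), 4))
```

No single roll is predictable, but the deviation of the frequencies from 1/6 shrinks roughly like one over the square root of the number of rolls, so the distribution itself becomes the approximately deterministic quantity.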

## Replication

This does still leave one tricky problem: what if we *accidentally* control some variable? Maybe air pressure influences sled speed, but it never occurred to us to test the sled in a vacuum or high-pressure chamber, so the air pressure was roughly the same for all of our experiments. We are able to deterministically predict sled speed, but only because we accidentally keep air pressure the same every time.

This is a thing which actually does happen! Sometimes we test something in conditions never before tested, and find that the usual rules no longer apply.

Ideally, replication attempts catch this sort of thing. Someone runs the same experiment in a different place and time, a different environment, and hopefully whatever things were accidentally kept constant will vary. (You’d be amazed what varies by location—I once had quite a surprise double-checking the pH of deionized water in Los Angeles.)

Of course, like air pressure, some things may happen to be the same even across replication attempts.

On the other hand, if a variable is accidentally controlled across multiple replication attempts, then it will likely be accidentally controlled outside the lab too. If every lab tests sled-speed at atmospheric pressure, and nobody ever accidentally tries a different air pressure, then that’s probably because sleds are almost always *used* at atmospheric pressure. When somebody goes to predict a sled’s speed in space, some useful new scientific knowledge will be gained, but until then the results will generally work in practice.
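The accidental-control failure mode is easy to reproduce in a toy model. The speed formula below is entirely made up for illustration (it is not real sled physics); the only point is that the outcome secretly depends on a variable that each lab happens to hold fixed.

```python
def sled_speed(angle_deg, air_pressure_kpa):
    """Hypothetical toy law: speed rises with angle and falls with air pressure."""
    return 2.0 * angle_deg - 0.05 * air_pressure_kpa

def speeds_in_lab(air_pressure_kpa, angle_deg=30.0, runs=5):
    """Repeat the experiment in one lab, where air pressure never changes."""
    return [sled_speed(angle_deg, air_pressure_kpa) for _ in range(runs)]

sea_level_lab = speeds_in_lab(air_pressure_kpa=101.3)
mountain_lab = speeds_in_lab(air_pressure_kpa=70.0)

# Within one lab the result looks perfectly deterministic...
reproducible_within_lab = len(set(sea_level_lab)) == 1
# ...but a replication in a different environment exposes the hidden variable.
labs_agree = set(sea_level_lab) == set(mountain_lab)

print("reproducible within lab:", reproducible_within_lab)
print("labs agree:", labs_agree)
```

If both labs sat at the same altitude, the disagreement would never show up, which is the situation described above: the results would still work in practice until someone leaves that regime.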

## The Scientific Method In A High-Dimensional World

Scenario 1: a biologist hypothesizes that adding hydroxyhypotheticol to their yeast culture will make the cells live longer, and the cell population will grow faster as a result. To test this hypothesis, they prepare one batch of cultures with the compound and one without, then measure the increase in cell density after 24 hours. They statistically compare the final cell density in the two batches to see whether the compound had a significant effect.

This is the prototypical Scientific Method: formulate a hypothesis, test it experimentally. Control group, p-values, all that jazz.

Scenario 2: a biologist observes that some of their clonal yeast cultures flourish, while others grow slowly or die out altogether, despite seemingly-identical preparation. What causes this different behavior? They search for differences, measuring and controlling for everything they can think of: position of the dishes in the incubator, order in which samples were prepared, mutations, phages, age of the initial cell, signalling chemicals in the cultures, combinations of all those… Eventually, they find that using initial cells of the same replicative age eliminates most of the randomness.

This looks less like the prototypical Scientific Method. There are probably some hypothesis formation and testing steps in the middle, but it’s less about hypothesize-test-iterate, and more about figuring out which variables are relevant.

In a high-dimensional world, effective science looks like scenario 2. This isn’t mutually exclusive with the Scientific-Method-as-taught-in-high-school; there’s still some hypothesizing and testing, but there’s a new piece and a different focus. The main goal is to hunt down sources of randomness, figure out exactly what needs to be controlled in order to get predictable results, and thereby establish which of the billions of variables in the universe are actually relevant.

Based on personal experience and reading lots of papers, this matches my impression of which scientific research offers lots of long-term value in practice. The one-shot black-box hypothesis tests usually aren’t that valuable in the long run, compared to research which hunts down the variables relevant to some previously confusing (a.k.a. unpredictable) phenomenon.

## Everything Is Connected To Everything Else (But Not Directly)

What if there is no small set of variables which determines the outcome of our experiment? What if there really are billions of variables, all of which matter?

We sometimes see a claim like this made about biological systems. As the story goes, you can perform all sorts of interventions on a biological system—knock out a gene, add a drug, adjust diet or stimulus, etc—and any such intervention will change the level of most of the tens-of-thousands of proteins or metabolites or signalling molecules in the organism. It won’t necessarily be a large change, but it will be measurable. Everything is connected to everything else; any change impacts everything.

Note that this is not at all incompatible with a small set of variables determining the outcome! The problem of science-in-a-high-dimensional-world is not to enumerate all variables which have any influence. The problem is to find a set of variables which determine the outcome, so that no other variables have any influence *after* controlling for those.

Suppose sled speed is determined by the sled, slope material, and angle. There may still be billions of other variables in the world which impact the sled, the slope material, and the angle! But none of those billions of variables are relevant *after* controlling for the sled, slope material, and angle; other variables influence the speed only through those three. Those three variables *mediate* the influence of all the billions of other variables.
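Mediation can be made concrete with a toy structural model. Everything below is a hypothetical illustration: the upstream variables, the way they affect the three mediators, and the speed formula are all invented, and `sled_drag` stands in for "the sled".

```python
import itertools
from collections import defaultdict

def mediators(weather, moon_phase, butterfly_flaps):
    """Hypothetical upstream influences on the three mediating variables."""
    friction = 0.05 if weather == "wet" else 0.10
    angle_deg = 21.0 if moon_phase == "full" else 20.0
    sled_drag = 1.0 + 0.1 * (butterfly_flaps % 2)
    return sled_drag, friction, angle_deg

def speed(sled_drag, friction, angle_deg):
    """Toy law: speed depends on the world only through the mediators."""
    return angle_deg * (1.0 - friction) / sled_drag

upstream_settings = list(
    itertools.product(["dry", "windy", "wet"], ["new", "full"], range(4))
)

# Group the runs by mediator values and collect the speeds seen in each group.
speeds_by_mediators = defaultdict(set)
for upstream in upstream_settings:
    m = mediators(*upstream)
    speeds_by_mediators[m].add(speed(*m))

all_speeds = set().union(*speeds_by_mediators.values())
print("distinct speeds overall:", len(all_speeds))
print("one speed per mediator setting:",
      all(len(s) == 1 for s in speeds_by_mediators.values()))
```

Across all upstream settings the speed varies, so the upstream variables do matter. Within any one setting of the three mediators it does not vary at all, even though several different upstream settings land in the same group.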

In general, the goal of science in a high dimensional world is to find sets of variables which mediate the influence of all other variables on some outcome.

In some sense, the central empirical finding of All Of Science is that, in practice, we can generally find *small* sets of variables which mediate the influence of all other variables. Our universe is “local”—things only interact directly with nearby things, and only so many things can be nearby at once. Furthermore, our universe abstracts well: even indirect interactions over long distances can usually be summarized by a small set of variables. Interactions between stars across galactic distances mostly just depend on the total mass of each star, not on all the details of the plasma roiling inside.

Even in biology, every protein interacts with every other protein in the network, but the vast majority of proteins do not interact *directly*—the graph of biochemical interactions is connected, but extremely sparse. The interesting problem is to figure out the structure of that graph—i.e. which variables interact directly with which other variables. If we pick one particular “outcome” variable, then the question is which variables are its neighbors in the graph—i.e. which variables mediate the influence of all the other variables.
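The "connected but sparse" picture can be sketched with a small graph. The graph below is a made-up example, and treating "controlling a variable" as "blocking paths through that node" is a simplification that fits undirected interaction graphs; in directed causal graphs the analogous set (the Markov blanket) also includes the other parents of the outcome's children.

```python
from collections import deque

# Hypothetical sparse interaction graph, stored as undirected adjacency lists.
interactions = {
    "outcome": ["A", "B"],
    "A": ["outcome", "C"],
    "B": ["outcome", "D"],
    "C": ["A", "E"],
    "D": ["B", "E"],
    "E": ["C", "D", "F"],
    "F": ["E"],
}

def reachable(graph, start, blocked=frozenset()):
    """Nodes reachable from start without passing through any blocked node."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen and neighbor not in blocked:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

everything = reachable(interactions, "outcome")
direct_neighbors = set(interactions["outcome"])
after_control = reachable(interactions, "outcome", blocked=direct_neighbors)

print("influences the outcome somehow:", sorted(everything - {"outcome"}))
print("direct neighbors (mediators):", sorted(direct_neighbors))
print("still connected after controlling mediators:", sorted(after_control - {"outcome"}))
```

Every node is connected to the outcome, yet once the two direct neighbors are held fixed, nothing else has a path left through which to exert influence.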

## Summary

Let’s put it all together.

In a high-dimensional world like ours, there are billions of variables which could influence an outcome. The great challenge is to figure out *which variables* are directly relevant—i.e. which variables mediate the influence of everything else. In practice, this looks like finding mediators and hunting down sources of randomness. Once we have a set of control variables which is sufficient to (approximately) determine the outcome, we can (approximately) rule out the relevance of any other variables in the rest of the universe, *given* the control variables.

A remarkable empirical finding across many scientific fields, at many different scales and levels of abstraction, is that a *small* set of control variables usually suffices. Most of the universe is not directly relevant to most outcomes most of the time.

Ultimately, this is a picture of “gears-level science”: look for mediation, hunt down sources of randomness, rule out the influence of all the other variables in the universe. This sort of research requires a lot of work compared to one-shot hypothesis tests, but it provides a lot more long-run value: because all the other variables in the universe are irrelevant, we only need to measure/control the control variables each time we want to reuse the model.


After reading this sentence, I had a short moment of illumination, that this is actually backwards: perhaps what our brains perceive as locality is the property of “being influenced by/related to”. Perhaps a child’s brain learns which “pixels” of the retina are near each other by observing that they often have correlated colors, and similarly which places in space are nearby because you can move things or yourself between them, etc. So, whatever high-dimensional structure the real universe would have, we would still evolve to notice which nodes in the graph are connected and declare them “local”. This doesn’t mean that the observation from the quoted sentence is a tautology: it wouldn’t be true in a universe with much higher connectivity—we’re lucky to live in a universe with a low [Treewidth](https://en.wikipedia.org/wiki/Treewidth), and thus can hope to grasp it.

I believe this is exactly correct. Good explanation, too.

I don’t know enough about neurology to make a statement on whether this is something human children learn, or whether it comes evolutionarily preprogrammed, so to speak. But in a universe where physics wasn’t at least approximately local, I would expect there’d indeed be little point in holding the notion that points in space and time have given “distances” from one another.

I’m not sure whether it’s the standard view in physics, but Sean Carroll has suggested that we should think of locality in space as deriving from entanglement. (With space itself as basically an emergent phenomenon.) And I believe he considers this a driving principle in his quantum gravity work.

https://www.preposterousuniverse.com/blog/2016/07/18/space-emerging-from-quantum-mechanics/

I don’t understand the article enough to decode what “the right kind of state” means, but this feels like a circular explanation. The three-dimensional space can “emerge” from a graph, but only assuming it is the right kind of graph. Okay, so what caused the graph to be exactly the kind of graph that generates a three-dimensional space?

I was expecting the central idea of this post to be more similar to/an extension of Everyday Lessons from High-Dimensional Optimization. That in a high-dimensional world, a good scientist can’t afford to waste time testing implausible hypotheses. Doing so will get you the right answer *eventually*, but it is far too slow. In a high-dimensional world, there are just too many variables to tweak. Relevant excerpt from My Wild and Reckless Youth:

To what extent is this post making these points?

*Great* question. This post is completely ignoring those points, and it’s really not something which should be ignored.

In the context of this post, the question is: ok, we’re trying to hunt down sources of randomness, trying to figure out which of the billions of variables actually matter, but *how* do we do that? We can’t just guess and check all those variables.

Your description of the second type of science where you repeatedly control variables to isolate one reminds me a lot of debugging a complex program.

Great point! It’s a very similar problem, with a very similar solution. We have some complicated system with a large number of lines/variables which could influence the outcome (i.e. the bug), and the main problem is to figure out which lines/variables mediate the influence of everything else. The first step is to reproduce the bug—i.e. hunt down all the sources of “randomness”, until we can make the bug happen consistently. After that, the next step is to look for mediation—i.e. find lines/variables which are “in between” our original reproduction-inputs and the bug itself, and which are themselves sufficient to reproduce the problem.

I detect the ghost of Jaynes in this!

I am not sure exactly why, but this and the optimization post both call to mind the current of thought suggesting we segregate the hypothesis and experimental steps explicitly. I have encountered this in three places:

An unfinished textbook on arXiv (which, to my frustration, I cannot now locate) that described treating machine learning as a science, and proposed gathering data first and then measuring the goodness of a machine learning algorithm by compression.

The Report likelihoods, not p-values article on Arbital.

This is basically how astronomy works by default: no one has a hypothesis for how pulsars interact and then gets a grant from their university department to launch a satellite network to look for pulsars; instead they identify phenomena on which they have little data, and pool resources to build a telescope or satellite or underground neutrino detector to gather the data, and then the publications test their hypotheses against the data gathered from one or more such projects.

I have a vague intuition that dividing up scientific practice in this way chunks the dimensions more tractably, or at least allows for it. Allowing optimization of data gathering and hypothesis formulation independently seems like a clear win for similar reasons.

Maybe the appeal is that it allows hypotheses to come from multiple directions in dimension space. The dimensionality of a body of data is fixed, but if it is generated as a tuple with a single hypothesis then it can only be approached from the perspective of that single hypothesis; if it is independent, then any hypothesis concerned with any of the dimensions of the data can be applied. By analogy, consider convergent evolution: two different paths in phase space arrive at essentially the same thing. Segregating the data step radically compresses this by allowing hypotheses from any other chain of development to be tested against it directly.

In particular, the view in this post is extremely similar to the view in Macroscopic Prediction. As there, reproducible phenomena are the key puzzle piece.

This post gave me an idea about how you might approach magic in fiction while keeping it grounded in reality: something like magic users are people who learn to pick out relevant variables from the noise to consistently nudge reality in ways that otherwise seem impossible.

Basically placebomancy from Unsong.

I’ve wanted for a while to see a game along these lines. It would have some sort of 1-v-1 fighting, but dominated by “random” behavior from environmental features and/or unaligned combatants. The centerpiece of the game would be experimenting with the “random” components to figure out how they work, in order to later leverage them in a fight.

I’m very doubtful that hunting down sources of randomness is a good way to go about doing science where there’s a big solution space.

There’s a lot of human pattern matching involved in coming up with good hypotheses to test.

I think you’re pointing to the same issue which Adam Zerner was pointing to. Hunting down sources of randomness is a good *goal* when doing science, but that doesn’t tell us much about *how* to go about the hunt when the solution space is very large.

It sort of feels like switching the perspectives back and forth between searching for what works at all and searching for things to rule out is analogous to research and development. Iterating between them feels like how knowledge would be refined.

Also: imagining science as “optimizing from zero” is aesthetically pleasing to me.

Fleshing this out a bit more, within the framework of this comment: when we can consistently predict some outcomes using only a handful of variables, we’ve learned a (low-dimensional) *constraint* on the behavior of the world. For instance, the gas law PV = nRT is a constraint on the relationship between variables in a low-dimensional summary of a high-dimensional gas. (More precisely, it’s a template for generating low-dimensional constraints on the summary variables of many different high-dimensional gases.)

When we flip perspective to problems of design (e.g. engineering), those constraints provide the structure of our problem—analogous to the walls in a maze. We look for “paths in the maze”—i.e. designs—which satisfy the constraints. Duality says that those designs act as constraints when searching for new constraints (i.e. doing science). If engineers build some gadget that works, then that lets us rule out some constraints: any constraints which would prevent the gadget from working must be wrong.

Data serves a similar role (echoing your comment here). If we observe some behavior, then that provides a constraint when searching for new constraints. Data and working gadgets live “in the same space”—the space of “paths”: things which definitely do work in the world and therefore cannot be ruled out by constraints.

You know, I had never explicitly considered that data and devices would be in the same abstract space, but as soon as I read the words it was obvious. Thank you for that!

In the realm of biology, I think hunting for patterns, and especially those you care about, is a better way than hunting for randomness.

Many times randomness is the result of complex interactions that can’t easily be reduced.

There’s a parallel need to review the actual purpose for which you are doing all of that. It can be mutable.

For example, suppose you culture some unicellular algae, and you notice the cells can be more or less rounded in the same dish. You shrug and discard the dishes with too elongated cells to keep the line pure and strong. You learn what parameters to keep constant to make it easier.

And then someone shows that in point of fact, cell shape for this group of species can vary somewhat *even in culture*, so we have been wrong about the diversity in the wild this whole time. And you read it and hope in your heart that some very motivated people might one day deviate from the beaten path and finally find out what’s going on there, despite this looking entirely unfundable.

I don’t quite think you’ve solved the problem of induction.

I think there’s a fairly serious issue with your claim that being able to predict something accurately means you necessarily fully understand the variables which cause it, because determinism.

That’s not really the case. E.g.: let’s say that ice cream melts twice as fast in galaxies without a supermassive black hole at the center. You do experiments to see how fast ice cream melts. After controlling for type of ice cream, temperature, initial temp of the ice cream, airflow and air humidity, you find that you can predict how ice cream melts. You triumphantly claim that you know which things cause ice cream to melt at different rates, having completely missed the black hole’s effects.

Essentially, controlling for A & B but not C won’t tell you whether C has a causal influence on the thing you’re measuring unless either:

- you intentionally change C between experiments (not practical given googolplexes of potential causal factors), or
- C happens to naturally vary quite a bit and so makes your experimental results different, cluing you in to the fact that you’re missing something.

I do not think that the prototypical scientific method lacks value in the long term.

In any experiment, there are lots of naturally varying parameters (current phase of the Moon, air pressure, amount of snow on the slope), and there are lots of naturally constant parameters (strength of gravity, room temperature, amount of hydroxyhypotheticol in the solution). There are base and derived parameters. The distances from the sun and the orbital periods vary between the planets, but (distance)^3/(orbital period)^2 is constant.

In the experiment, you measure X and Y. If X varies but Y is constant, then they probably have no relation. Suppose that we want to find out whether X is related to B or C. We make B vary and hold C constant. If X varies, then it is not connected to C; if X is constant, then it is unrelated to B.

In the second scenario, you try to find the minimal set of base parameters that are related to X (growth rate). After some testing, we found that (growth rate) ~~ (initial age). After we found that connection, we can rule out the uncontrolled varying parameters, but there may be a connection between X and an uncontrolled constant parameter. It is possible that (growth rate) ~~ (initial age) times (1 + (amount of hydroxyhypotheticol)), and the first scenario will test these kinds of connections.

It is not enough to find which parameters won’t affect the experiment. It is also important to find out which parameters could affect the experiment.

I am kind of surprised you didn’t reference causal inference here to just gesture at the task in which we “figure out *which variables* are directly relevant—i.e. which variables mediate the influence of everything else”. Are you pointing to a different sort of idea/do you not feel causal inference is adequate for describing this task?

Also, scenario 1 and 2 seem fairly close to the “linear” and “non-linear” models of innovation Jason Crawford described in his talk “The Non-Linear Model of Innovation.” To be honest, I preferred his description of the models. Though he didn’t cover how miraculous it is that somehow the model can work. That, to a good approximation, the universe is simple and local.

Causal inference (or more precisely learning causal structure) is exactly the sort of thing I have in mind here. There are actually a few places in the post where I should distinguish between variables which control an outcome in an information sense (i.e. sufficient to perfectly predict the outcome) vs in a causal sense (i.e. sufficient to cause the outcome under interventions). The main reason I didn’t talk about it directly is because I would have had to explain that distinction, and decided that would be too much of a distraction from the main point.

I think the takeaway of Jason’s talk, as it relates to this post, is that a large chunk of the “science” of achieving consistent outcomes happens in inventors’ workshops rather than scientists’ labs. The problem is still largely similar, regardless of the label applied, but scientists aren’t the only ones doing science.

I think this certainly describes *a* type of gears level work scientists engage in, but not the only type, nor necessarily the most common one in a given field. There’s also model building, for example.

Even once you’ve figured out which dozen variables you need to control to get a sled to move at the same speed every time, you still can’t predict what that speed would be if you set these dozen variables to different values. You’ve got to figure out Newton’s laws of motion and friction before you can do that.

Finding out which variables are relevant to a phenomenon in the first place is usually a required initial step for building a predictive model, but it’s not the only step, nor necessarily the hardest one.

Another type of widespread scientific work I can think of is facilitating efficient calculation. Even if you have a deterministic model that you’re pretty sure could theoretically predict a class of phenomena perfectly, that doesn’t mean you have the computing power necessary to actually use it.

Lattice Quantum Chromodynamics should theoretically be able to predict all of nuclear physics, but employing it in practice requires coming up with all sorts of ingenious tricks and effective theories to reduce the computing power required for a given calculation. It’s enough to have kept a whole scientific field busy for over fifty years, and we’re still not close to actually being able to freely simulate every interaction of nucleons at the quark level from scratch.

Exactly correct.

Part of the implicit argument of the post is that the “figure out the dozen or so relevant variables” is the “hard” step in a big-O sense, when the number of variables in the universe is large. This is for largely similar reasons to those in Everyday Lessons From High-Dimensional Optimization: in low dimensions, brute force-ish methods are tractable. Thus we get things like e.g. tables of reaction rate constants. Before we had the law of mass action, there were too many variables potentially relevant to reaction rates to predict via brute force. But once we have mass action, there are few enough degrees of freedom that we can just try them out and make these tables of reaction constants.

Now, that still leaves the step of going from “temperature and concentrations are the relevant variables” to the law of mass action, but again, that’s the sort of thing where brute-force-ish exploration works pretty well. There is an insight step involved there, but it can largely be done by guess-and-check. And even before that insight is found, there’s few enough variables involved that “make a giant table” is largely tractable.

Good example.

What matters is *local* determinism. You need to show that behaviour is predictable from factors under your control. If local determinism fails, it is hard to tell whether locality or determinism failed individually.

And showing that a system’s behaviour is predictable when N factors are held constant by the experimenter doesn’t show that those are the only ones it is conditionally dependent on. Its behaviour might counterfactually depend on factors which the experimenter did not vary *and which did not naturally change* over the course of the experiment. In general, you can’t exclude mysterious extra variables.

Keep reading, the post gets to that.

Doesn’t really contradict what I am saying. In theory, I am saying, you can’t exclude mysterious extra variables...but in practice that often doesn’t matter, as you are saying.