Bridge Thinking and Wall Thinking
There are a couple of frames I find useful for understanding why different people talk very differently about AI safety—the wall, and the bridge.
A wall is incrementally useful. Every additional brick you add is good, and the more bricks you add the better. If you are adding a brick to the wall you are doing something good, regardless of the current state of the wall.
A bridge requires a certain amount of investment. There’s not much use for half a bridge. Once the bridge crosses the lake, it can be improved—but until you get a working bridge, you have nothing.
A solid example of wall thinking is the image in this thread by Chris Olah. Any approach around “eating marginal probability” involves a wall frame. Another example is the theory of change of the standards work I’ve done for Inspect Evals, which I would summarise as “Other fields like aviation and rocketry have solid safety standards and paradigms. We need to build this for evaluations—it’s the kind of thing that a mature AI safety field needs to have.” This theory doesn’t have a full story of how it helps save the world end-to-end, but it doesn’t have to, under the wall frame—it just has to be pointing in the right direction and be broadly helpful.
A good example of bridge thinking is the MIRI approach—asking openly and outright for an international treaty. In my understanding, MIRI are not asking for the most they think they can get away with. They are asking for what they think is necessary to solve the problem, and believe anything less is insufficient. In lieu of p(doom), Eliezer asks “What is the minimum necessary and sufficient policy that you think would prevent extinction?” This is bridge thinking—we need to achieve a certain outcome X, and anything less than X won’t do. Anything that has no chance of achieving X or greater is unhelpful at best and counterproductive at worst. And to figure out what X is needed, you need a solid idea of your high-level goal from the start, and of how a given course of action gets you there.
From the wall perspective, bridge thinkers are ignoring or denigrating important marginal or unglamorous work in favor of swinging for the fences. From the bridge perspective, wall thinkers are doing things that are not helpful and will end up rounding down to zero.
I have found this a very useful way to understand why some people in AI safety are proposing very different ideas than me.
Huh. This post was thoroughly not what I expected from the title.
I was expecting something about bridges connecting and walls dividing; maybe lumping and splitting, or something about political factions and alliance-making vs. purity-testing.
(To be clear, this post is also a good idea and upvoted. I just wanted to point at the surprise thing.)
I like this terminology. It feels related to my econ vs sociopolitics distinction.
I am not sure that they are similar. As I remarked in comments to your post, econ-related thinking could be closer to thinking only about easy-to-model aspects instead of the entire system. If an American corporation outsources factory work to Asia, it doesn’t destroy the entire USA—it is just a step in the race to the bottom and towards American workers losing qualifications.
Additionally, I don’t think that wall-thinking and bridge-thinking are as easy to separate as the OP describes. Suppose that mankind can choose between not creating the ASI at all with probability $p_0$ or creating the ASI via governance model $M$ with probability $p_M$ and, conditioned on choosing the model $M$, creating the misaligned ASI with probability $q_M$. Then $P(\mathrm{doom}) = \sum_M p_M q_M$, where $p_0 + \sum_M p_M = 1$ by definition. As far as I understand MIRI’s view, if we create the ASI at all then $q_M \approx 1$ unless one accepts hard-to-implement demands like making the humans smarter, and $P(\mathrm{doom}) \approx 1 - p_0$, meaning that we should concentrate on raising $p_0$ as high as possible. The AI-2027 optimistic alignment assumption is that the Slowdown-like governance with the USA iterating over many different models and strategies has a low $q_M$, and that the race dynamics prevents alignment from being studied ($q_M \approx 1$ for the race model). Were this true, raising the probability of the Slowdown-like model would matter most. However, if we believe that $p_0 \approx 0$ and that it’s easier to alter $q_M$ by coming up with various schemes than to alter $p_0$, then altering $q_M$ becomes vitally important.
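The decomposition above can be sketched numerically. All probabilities here are made-up illustrative values, not anyone’s actual estimates, and the two governance models are hypothetical labels:

```python
# Sketch of P(doom) = sum over models M of p_M * q_M, where p_0 is the
# probability of not building ASI at all (contributing zero doom by
# assumption), p_M is the probability of governance model M being chosen,
# and q_M is the probability of misaligned ASI conditional on M.
# All numbers below are hypothetical.

p0 = 0.10  # never build ASI at all
models = {
    "race":     {"p": 0.60, "q": 0.95},  # race dynamics, little alignment work
    "slowdown": {"p": 0.30, "q": 0.30},  # coordinated slowdown, iterate on safety
}

# The options must exhaust the probability space: p_0 + sum(p_M) = 1.
assert abs(p0 + sum(m["p"] for m in models.values()) - 1.0) < 1e-9

p_doom = sum(m["p"] * m["q"] for m in models.values())
print(f"P(doom) = {p_doom:.2f}")  # 0.60*0.95 + 0.30*0.30 = 0.66
```

With these numbers, shifting probability mass from “race” to “slowdown” (altering the $p_M$) and reducing the slowdown model’s $q_M$ are both live levers, which is the separability point being made.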
I think the biggest difference between wall-thinking and bridge-thinking here isn’t actually about the size of $P(\mathrm{doom})$, but rather how easy it is to alter. From a more mathematical standpoint, what matters is the rate of change of $P(\mathrm{doom})$ with respect to the effort put in. In AI safety, believing $P(\mathrm{doom})$ is very high is also correlated with a bunch of bridge thinking like “Alignment is incredibly difficult and current techniques have essentially zero chances of working”—i.e. $\frac{\partial P(\mathrm{doom})}{\partial\,\mathrm{effort}} \approx 0$ at the current margin.
If I were to sum it up—bridge thinking assumes you need a lot of effort before you start to meaningfully reduce $P(\mathrm{doom})$ from where we currently are. Wall thinking assumes there are marginal gains available from the current position.
As an example, let’s take the following hypothetical belief: “If we had really good interpretability tools, then there would be a lot of low-hanging fruit we could pick with those tools. But without those tools we’re operating blindly, and can’t make much progress at all.” Under this belief, we are currently in a bridge regime—small improvements to current interpretability techniques will yield almost nothing. But if we did develop those good tools, we would transition to a wall regime—there’s lots of marginal effort that leads to a reduction in $P(\mathrm{doom})$ by using those tools.
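One way to make the two regimes concrete is to write down toy effort-to-reduction curves. The functional forms and parameters here are my own illustrative choices, not anything from the thread:

```python
import math

def wall_reduction(effort, scale=1.0):
    """Wall regime: every unit of effort helps immediately, with
    diminishing returns (concave curve, steepest at effort = 0)."""
    return 1.0 - math.exp(-effort / scale)

def bridge_reduction(effort, threshold=5.0, steepness=2.0):
    """Bridge regime: almost no gain until a threshold of cumulative
    effort is crossed, then rapid gains (sigmoid curve)."""
    return 1.0 / (1.0 + math.exp(-steepness * (effort - threshold)))

# Fraction of achievable P(doom) reduction at a few effort levels:
for e in [0, 1, 3, 5, 8]:
    print(f"effort={e}: wall={wall_reduction(e):.3f}, "
          f"bridge={bridge_reduction(e):.3f}")
```

In the wall curve the first unit of effort already buys a large reduction, while the bridge curve stays near zero until effort approaches the threshold—matching the claim that marginal interpretability work yields almost nothing today but a lot once good tools exist.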
Here is a Claude-generated visualisation of what that would look like, demonstrating what the curves look like in each regime in my mind. This works whether $P(\mathrm{doom})$ initially starts high or low: which frame is appropriate depends on where you think we currently are on this graph, and is invariant with respect to the initial value of $P(\mathrm{doom})$, provided it is at least large enough to be concerning.
An alternative to the bridge analogy would be a wall that doesn’t fully enclose your velociraptor exhibit. Until the gaps are closed, building the wall higher is fence-post security.
Here’s the feedback I gave to Jay, which he encouraged me to add as a comment:
(This is mostly just me playing devil’s advocate for the wall perspective.)