Bridge Thinking and Wall Thinking
There are a couple of frames I find useful for understanding why different people talk very differently about AI safety—the wall, and the bridge.
A wall is incrementally useful. Every additional brick you add is good, and the more bricks you add the better. If you are adding a brick to the wall you are doing something good, regardless of the current state of the wall.
A bridge requires a certain amount of investment. There’s not much use for half a bridge. Once the bridge crosses the lake, it can be improved—but until you get a working bridge, you have nothing.
A solid example of wall thinking is the image in this thread by Chris Olah. Any approach around “eating marginal probability” involves a wall frame. Another example is the theory of change of the standards work I’ve done for Inspect Evals, which I would summarise as “Other fields like aviation and rocketry have solid safety standards and paradigms. We need to build this for evaluations—it’s the kind of thing that a mature AI safety field needs to have.” This theory doesn’t have a full story of how it helps save the world end-to-end, but it doesn’t have to, under the wall frame—it just has to be pointing in the right direction and be broadly helpful.
A good example of bridge thinking is the MIRI approach—asking openly and outright for an international treaty. In my understanding, MIRI are not asking for the most they think they can get away with. They are asking for what they think is necessary to solve the problem, and believe anything less is insufficient. In lieu of p(doom), Eliezer asks “What is the minimum necessary and sufficient policy that you think would prevent extinction?” This is bridge thinking—we need to achieve a certain outcome X, and anything less than X won’t do. Anything that has no chance of achieving X or greater is unhelpful at best and counterproductive at worst. And to figure out what X is needed, you need a solid idea of your high-level goal from the start, and of how a given course of action gets you there.
From the wall perspective, bridge thinkers are ignoring or denigrating important marginal or unglamorous work in favor of swinging for the fences. From the bridge perspective, wall thinkers are doing things that are not helpful and will end up rounding down to zero.
I have found this a very useful way to understand why some people in AI safety are proposing very different ideas than me.
Huh. This post was thoroughly not what I expected from the title.
I was expecting something about bridges connecting and walls dividing; maybe lumping and splitting, or something about political factions and alliance-making vs. purity-testing.
(To be clear, this post is also a good idea and upvoted. I just wanted to point at the surprise thing.)
I like this terminology. It feels related to my econ vs sociopolitics distinction.
I am not sure that they are similar. As I remarked in comments to your post, econ-related thinking could be closer to thinking only about easy-to-model aspects instead of the entire system. If an American corporation outsources factory work to Asia, it doesn’t destroy the entire USA—it is just a step in the race to the bottom and towards American workers losing qualifications.
Additionally, I don’t think that wall-thinking and bridge-thinking are as easy to separate as the OP describes. Suppose that mankind can choose between not creating the ASI at all with probability $p_0$ or creating the ASI via governance model $M$ with probability $p_M$ and, conditioned on choosing the model $M$, creating the misaligned ASI with probability $q_M$. Then $P(\mathrm{doom}) = \sum_M p_M q_M$, where $p_0 + \sum_M p_M = 1$ by definition. As far as I understand MIRI’s view, if we create the ASI at all then $q_M \approx 1$ unless one accepts hard-to-implement demands like making the humans smarter, and $P(\mathrm{doom}) \approx 1 - p_0$, meaning that we should concentrate on raising $p_0$ as high as possible. The AI-2027 optimistic alignment assumption is that the Slowdown-like governance with the USA iterating over many different models and strategies has a low $q_M$, and that the race dynamics prevents alignment from being studied ($q_M \approx 1$ for the race model). Were this true, raising the probability of the Slowdown-like model would matter most. However, if we believe that $p_0 \approx 0$ and that it’s easier to alter $q_M$ by coming up with various schemes than to alter $p_0$, then altering $q_M$ becomes vitally important.
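The decomposition above can be sketched numerically. All probabilities here are made-up illustrative values, not anyone’s actual estimates, and the two governance models are hypothetical labels:

```python
# Sketch of P(doom) = sum over models M of p_M * q_M, where p_0 is the
# probability of not building ASI at all (contributing zero doom by
# assumption), p_M is the probability of governance model M being chosen,
# and q_M is the probability of misaligned ASI conditional on M.
# All numbers below are hypothetical.

p0 = 0.10  # never build ASI at all
models = {
    "race":     {"p": 0.60, "q": 0.95},  # race dynamics, little alignment work
    "slowdown": {"p": 0.30, "q": 0.30},  # coordinated slowdown, iterate on safety
}

# The options must exhaust the probability space: p_0 + sum(p_M) = 1.
assert abs(p0 + sum(m["p"] for m in models.values()) - 1.0) < 1e-9

p_doom = sum(m["p"] * m["q"] for m in models.values())
print(f"P(doom) = {p_doom:.2f}")  # 0.60*0.95 + 0.30*0.30 = 0.66
```

With these numbers, shifting probability mass from “race” to “slowdown” (altering the $p_M$) and reducing the slowdown model’s $q_M$ are both live levers, which is the separability point being made.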
I think the biggest difference between wall-thinking and bridge-thinking here isn’t actually about the size of $P(\mathrm{doom})$, but rather how easy it is to alter. From a more mathematical standpoint, what matters is the rate of change of $P(\mathrm{doom})$ with respect to the effort put in. In AI safety, believing $P(\mathrm{doom})$ is very high is also correlated with a bunch of bridge thinking like “Alignment is incredibly difficult and current techniques have essentially zero chances of working”—i.e. $\frac{\partial P(\mathrm{doom})}{\partial\,\mathrm{effort}} \approx 0$ at the current margin.
If I were to sum it up—bridge thinking assumes you need a lot of effort before you start to meaningfully reduce $P(\mathrm{doom})$ from where we currently are. Wall thinking assumes there are marginal gains available from the current position.
As an example, let’s take the following hypothetical belief: “If we had really good interpretability tools, then there would be a lot of low-hanging fruit we could pick with those tools. But without those tools we’re operating blindly, and can’t make much progress at all.” Under this belief, we are currently in a bridge regime—small improvements to current interpretability techniques will yield almost nothing. But if we did develop those good tools, we would transition to a wall regime—there’s lots of marginal effort that leads to a reduction in $P(\mathrm{doom})$ by using those tools.
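One way to make the two regimes concrete is to write down toy effort-to-reduction curves. The functional forms and parameters here are my own illustrative choices, not anything from the thread:

```python
import math

def wall_reduction(effort, scale=1.0):
    """Wall regime: every unit of effort helps immediately, with
    diminishing returns (concave curve, steepest at effort = 0)."""
    return 1.0 - math.exp(-effort / scale)

def bridge_reduction(effort, threshold=5.0, steepness=2.0):
    """Bridge regime: almost no gain until a threshold of cumulative
    effort is crossed, then rapid gains (sigmoid curve)."""
    return 1.0 / (1.0 + math.exp(-steepness * (effort - threshold)))

# Fraction of achievable P(doom) reduction at a few effort levels:
for e in [0, 1, 3, 5, 8]:
    print(f"effort={e}: wall={wall_reduction(e):.3f}, "
          f"bridge={bridge_reduction(e):.3f}")
```

In the wall curve the first unit of effort already buys a large reduction, while the bridge curve stays near zero until effort approaches the threshold—matching the claim that marginal interpretability work yields almost nothing today but a lot once good tools exist.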
Here is a Claude-generated visualisation of what that would look like, demonstrating what the curves look like in each regime in my mind. This works whether $P(\mathrm{doom})$ initially starts high or low: which frame is appropriate depends on where you think we currently are on this graph, and is invariant with respect to the initial value of $P(\mathrm{doom})$, provided it is at least large enough to be concerning.
An alternative to the bridge analogy would be a wall that doesn’t fully enclose your velociraptor exhibit. Until the gaps are closed, building the wall higher is fence-post security.
Here’s the feedback I gave to Jay, which he encouraged me to add as a comment:
(This is mostly just me playing devil’s advocate for the wall perspective.)