I am not sure that they are similar. As I remarked in comments to your post, econ-related thinking could be closer to thinking only about the easy-to-model aspects instead of the entire system. If an American corporation outsources factory work to Asia, it doesn’t destroy the entire USA; it is just a step in the race to the bottom and towards American workers losing their qualifications.
Additionally, I don’t think that wall-thinking and bridge-thinking are as easy to separate as the OP describes. Suppose that mankind can choose between not creating the ASI at all, with probability $p_0$, or creating the ASI via governance model $i$, with probability $p_i$, and that, conditioned on choosing model $i$, a misaligned ASI is created with probability $q_i$. Then the total probability of misaligned ASI is $P = \sum_i p_i q_i$, where $q_0 = 0$ by definition. As far as I understand MIRI’s view, if we create the ASI at all then $q_i \approx 1$ unless one accepts hard-to-implement demands like making humans smarter, so $P \approx 1 - p_0$, meaning that we should concentrate on raising $p_0$ as high as possible. The AI-2027 optimistic alignment assumption is that Slowdown-like governance, with the USA iterating over many different models and strategies, has $q_{\text{slowdown}} \ll 1$, while race dynamics prevent alignment from being studied ($q_{\text{race}} \approx 1$). Were this true, we would see $P \approx p_{\text{race}}$. However, if we believe that $p_0 \approx 0$ and that it’s easier to alter the $q_i$ by coming up with various schemes than to alter the $p_i$, then altering the $q_i$ becomes vitally important.
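To make the decomposition concrete, here is a minimal Python sketch of $P = \sum_i p_i q_i$ under the two views above. The path labels and all of the numbers are illustrative assumptions of mine, not anyone’s actual estimates:

```python
# Illustrative sketch of P = sum_i p_i * q_i under the two sets of assumptions
# discussed above. All numbers are made up for illustration, not real estimates.

def p_misaligned(p, q):
    """Overall probability of misaligned ASI, given choice probabilities p_i
    and conditional misalignment probabilities q_i (q_0 = 0 by definition)."""
    assert abs(sum(p) - 1.0) < 1e-9, "choice probabilities must sum to 1"
    return sum(pi * qi for pi, qi in zip(p, q))

# Index 0 = "no ASI", index 1 = "race", index 2 = "slowdown-like governance".
p = [0.05, 0.70, 0.25]             # assumed probabilities of each path being taken
q_miri_like = [0.0, 0.99, 0.95]    # "alignment almost certainly fails on any path"
q_ai2027_like = [0.0, 0.95, 0.10]  # "slowdown-like governance mostly succeeds"

print(p_misaligned(p, q_miri_like))    # ~0.93: only raising p_0 moves P much
print(p_misaligned(p, q_ai2027_like))  # ~0.69 ~= p_race: shifting p_1 -> p_2 helps a lot
```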
I think the biggest difference between wall-thinking and bridge-thinking here isn’t actually about the size of $P$, but rather about how easy $P$ is to alter. From a more mathematical standpoint, what matters is the rate of change of $P$ with respect to the effort put in. In AI safety, believing $P$ is very high is also correlated with a bunch of bridge thinking like “Alignment is incredibly difficult and current techniques have essentially zero chance of working”, i.e. $dP/dE \approx 0$ at the current level of effort $E$.
If I were to sum it up: bridge thinking assumes you need a lot of effort before you start to meaningfully reduce $P$ from where we currently are, while wall thinking assumes there are marginal gains available from the current position.
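To state the distinction slightly more formally (my own notation, not the OP’s, with $E$ for cumulative effort and $E_0$ for the effort already invested today):

```latex
% My own rough formalization: E = cumulative effort, E_0 = effort invested so far.
\text{bridge thinking:}\quad \frac{dP}{dE}\Big|_{E \approx E_0} \approx 0,
\qquad \frac{dP}{dE}\Big|_{E > E_{\mathrm{thr}}} \ll 0
\quad \text{for some threshold } E_{\mathrm{thr}} > E_0;
\qquad
\text{wall thinking:}\quad \frac{dP}{dE}\Big|_{E = E_0} \ll 0.
```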
As an example, take the following hypothetical belief: “If we had really good interpretability tools, there would be a lot of low-hanging fruit we could pick with those tools. But without those tools we’re operating blindly, and can’t make much progress at all.” Under this belief, we are currently in a bridge regime: small improvements to current interpretability techniques will yield almost nothing. But if we did develop those good tools, we would transition to wall thinking, where marginal effort with those tools leads to a real reduction in $P$.
Here is a Claude-generated visualisation of what that would look like, demonstrating what the curves look like in each regime in my mind. This works whether $P$ initially starts high or low: which frame is appropriate depends on where you think we currently are on this graph, and is invariant with respect to the initial value of $P$, provided $P$ is at least large enough to be concerning.
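For reference, here is a rough matplotlib sketch of the kind of curves I mean. It is not the actual figure; the functional forms (exponential decay for the wall regime, a logistic drop for the bridge regime), the threshold location, and the initial value of $P$ are purely illustrative assumptions:

```python
# Sketch of the bridge vs wall regimes: P(misaligned ASI) as a function of
# cumulative effort. Functional forms and parameters are illustrative only.
import numpy as np
import matplotlib.pyplot as plt

effort = np.linspace(0, 10, 200)
p_initial = 0.8  # works the same for any initial P large enough to be concerning

# Wall regime: marginal effort pays off immediately (curve is steepest at effort = 0).
p_wall = p_initial * np.exp(-0.5 * effort)

# Bridge regime: almost no progress until a threshold of effort is crossed
# (e.g. "good interpretability tools exist"), then rapid gains afterwards.
p_bridge = p_initial / (1 + np.exp(2.0 * (effort - 6)))  # logistic drop around effort ~ 6

plt.plot(effort, p_wall, label="wall regime: gains from the current position")
plt.plot(effort, p_bridge, label="bridge regime: gains only past a threshold")
plt.xlabel("cumulative effort")
plt.ylabel("P(misaligned ASI)")
plt.legend()
plt.show()
```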