I somewhat agree with this, but I don't agree with conclusions like the claim that making an illegible problem legible flips the sign of the value of working on it, and I want to explain why:
I generally believe that even unsolved legible problems won't halt deployment of powerful AIs (an example scenario is here), at least absent blatant warning signs that are basically impossible to spin. Even more importantly, not halting the deployment of powerful AIs might be the best choice we have, with inaction risk being too high for reasonable AI developers (for example, Anthropic) to justify shutting down.
One of my general beliefs about philosophical problems is that many of their solutions will be unsatisfying, and most importantly here, the fully general solution doesn't matter: more specific/tailored solutions that reflect what our universe is actually like are also fine. Changing the problem statement like this is a very common way to make previously intractable problems tractable.
Because a lot of philosophical problems only matter when AI capabilities are very, very high (as in, the thought experiments that motivate them assume almost arbitrarily capable AIs), human work on them doesn't actually matter much, and they have to be delegated to (aligned) ASIs. This is strengthened to the extent that philosophical problems require capabilities insights to solve and are roadblocks to AI's value, meaning AI folks will be incentivized to solve them by default.
More generally, a big reason people are focusing on more legible problems nowadays is a shift in which regime of AI capabilities safety interventions target. In particular, there's a lot less focus on the post-intelligence-explosion era, where AIs can do things that no human can reliably hope to do, and much more focus on the first steps of, say, AI fully automating AI R&D, where it's easier to reason about intervention effectiveness and you can rely more on imperfect/not fully justifiable solutions like AI control.