See also Robustness to Scale. You wrote that “we expect that the failure modes which still appear under such assumptions are the hard failure modes” (emphasis mine). But there are some failure modes which don’t appear with existing algorithms, yet are hypothesized to appear in the limit of more data and compute, such as the “malign universal prior” problem. It’s unclear how much to worry about these problems, because as you say, we don’t actually expect to use e.g. Solomonoff induction. I suspect a key issue is whether the problem is an inevitable result of scaling any algorithm, vs a quirk of the particular infinite data/compute algorithm being discussed.
But there are some failure modes which don’t appear with existing algorithms, yet are hypothesized to appear in the limit of more data and compute...
This is a great point to bring up. One thing the OP probably doesn’t emphasize enough is: just because one particular infinite-data/compute algorithm runs into a problem, does not mean that problem is hard.
Zooming out for a moment, the strategy the OP is using is problem relaxation: we remove a constraint from the problem (in this case data/compute constraints), solve that relaxed problem, then use the relaxed solution to inform our solution to the original problem. Note that any solution to the original problem is still a solution to the relaxed problem, so the relaxed problem cannot ever be any harder than the original. If it ever seems like a relaxed problem is harder than the original problem, then a mistake has been made.
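The "relaxation can never be harder" point can be sketched with a toy optimization problem (the cost function, names, and numbers here are all invented for illustration, not anything from the post): removing a constraint enlarges the feasible set, and every original solution remains feasible, so the relaxed optimum can only match or beat the original.

```python
# Toy illustration of problem relaxation. The original problem minimizes
# a cost over a constrained set; the relaxed problem minimizes the same
# cost over a larger set that contains every original candidate
# (analogous to dropping a data/compute constraint).

def cost(x):
    # Arbitrary made-up cost for illustration.
    return (x - 3.7) ** 2

# Original problem: only integers 0..10 are allowed (the "constraint").
original_feasible = range(11)

# Relaxed problem: a finer grid over the same interval. Every integer
# 0..10 still appears on this grid, so it is a superset of the original
# feasible set.
relaxed_feasible = [i / 10 for i in range(101)]

original_opt = min(cost(x) for x in original_feasible)
relaxed_opt = min(cost(x) for x in relaxed_feasible)

# Because every original solution is feasible for the relaxed problem,
# the relaxed optimum can never be worse than the original optimum.
assert relaxed_opt <= original_opt
```

This is the same logic as the paragraph above: if the relaxed problem ever looked harder (i.e. `relaxed_opt > original_opt`), something would have gone wrong, since any original solution could simply be reused.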
In context: we relax alignment problems by removing the data/compute constraints. That does not mean we’re required to use approximations of Solomonoff induction, or required to use perfect predictive power; it just means that we are allowed to use those things in our solution. If we can solve the problem by e.g. simply not using Solomonoff induction, then it’s an easy problem in the infinite data/compute setting just like it’s an easy problem in a more realistic setting.
If we don’t know of any way to solve the problem, even when we’re allowed infinite data/compute, then it’s a good hard-problem candidate.