I’d also worry a lot about scenarios where the difficulty level in practice is hard but not impossible, and where making incremental progress is most likely to lead you down misleading paths that make the correct solutions harder rather than easier to find. You have a much harder time directing attention to those correct solutions, keeping attention there, or being rewarded for any incremental progress given ‘the competition’, and you have to worry that those in charge will go with other, non-working solutions instead at many points.
...
I do not think that long-term alignment is so continuous with near-term reliability and control. I expect that successful solutions will likely be found to many near-term reliability and control problems, and that those solutions will have very little chance of working on AGI and then ASI systems. If I did not believe this, a lot of the strategic landscape would change dramatically, and my p(doom) from lack of alignment would decline dramatically, although I would still worry about whether that alignment actually saves us from the dynamics that happen after that – which is an under-considered problem.
I strongly agree with this, and have discussed it with a friend who does theoretical safety research and who also agrees. It’s easy to get excited by making incremental iterative progress. That’s a thing that teams of humans tend to be great at, and it makes it much easier to put additional resources into the work. But focusing only on this would likely lead us to ignore the likely fact that we’re iterating on the wrong things and marching into a blind alley. My friend and I expect that the true paths to good solutions are not yet found, and thus not yet available for incremental iterative progress. If that’s the case, we need more theoretical researchers, and more serial time, to get to the beginning of a workable path to iterate on.
If these currently-iterable safety-ish plans manage to buy us at least a delay before things get doom-y, then that could be a benefit. I think it’s plausible they could buy us a year or two of doom-delay.
Related: I also think that compute governance is potentially a good idea in that it might buy us some delay while we are in a low-algorithmic-efficiency regime. It currently seems like the first models to be really dangerous will be made by well-resourced groups using lots of compute. But I think compute governance is potentially a terrible idea in that I expect it to fail completely and suddenly when, in the not so distant future, we transition to a high-algorithmic-efficiency regime. Then the barrier will be knowledge of the efficient algorithms, not large amounts of compute. I believe we can know that this high-algorithmic-efficiency regime exists by looking at the way compute and learning work in the brain, but we can’t be sure of when algorithmic leaps will be made or how far they will get us. So if we put our trust in compute governance, we are driving our bus onto a lake which we know ahead of time has patches of ice thin enough to suddenly give way beneath us, with no way to know when we will reach the weak ice. Seems scary.