I would say it is basically-always true, but there are some fields (including deep learning today, for purposes of your comment) where the big hard central problems have already been solved, and therefore the many small pieces of progress on subproblems are all of what remains.
Maybe, but it is interesting to note that:
A majority of productive work is occurring on small subproblems even if some previous paradigm change was required for this.
For many fields (e.g., deep learning), many people didn’t recognize (and potentially still don’t recognize!) that the big hard central problem was already solved. This potentially implies it might be non-obvious whether this has been solved and making bets on some existing paradigm which doesn’t obviously suffice can be reasonable.
Things feel more continuous to me than your model suggests.
And insofar as there remains some problem which is simply not solvable within a certain paradigm, that is a “big hard central problem”, and progress on the smaller subproblems of the current paradigm is unlikely by-default to generalize to whatever new paradigm solves that big hard central problem.
It doesn’t seem clear to me this is true in AI safety at all, at least for non-worst-case AI safety.
I claim it is extremely obvious and very overdetermined that this will occur in AI safety sometime between now and superintelligence.
Yes, I added “prior to human obsolescence” (which is what I meant).
Depending on what you mean by “superintelligence”, this isn’t at all obvious to me. It’s not clear to me we’ll have (or will “need”) new paradigms before fully handing over all technical and strategic work to AIs which are capable enough to obsolete humans at all cognitive tasks. Doing this handover doesn’t directly require understanding whether the AI is specifically producing alignment work that generalizes. For instance, the system might pursue routes other than alignment work and we might determine its judgement/taste/epistemics/etc are good enough based on examining things other than alignment research intended to generalize beyond the current paradigm.
If by superintelligence, you mean wildly superhuman AI, it remains non-obvious to me that new paradigms are needed (though I agree they will pretty likely arise prior to this point due to AIs doing a vast quantity of research if nothing else). I think thoughtful and laborious implementation of current paradigm strategies (including substantial experimentation) could directly reduce risk from handing off to superintelligence down to perhaps 25% and I could imagine being argued considerably lower.
It’s not clear to me we’ll have (or will “need”) new paradigms before fully handing over all technical and strategic work to AIs which are capable enough to obsolete humans at all cognitive tasks.
If you want to not die to slop, then “fully handing over all technical and strategic work to AIs which are capable enough to obsolete humans at all cognitive tasks” is not a thing which happens at all until the full superintelligence alignment problem is solved. That is how you die to slop.
“fully handing over all technical and strategic work to AIs which are capable enough to obsolete humans at all cognitive tasks”
Suppose we replace “AIs” with “aliens” (or even, some other group of humans). Do you agree that doesn’t (necessarily) kill you due to slop if you don’t have a full solution to the superintelligence alignment problem?
Aliens kill you due to slop, humans depend on the details.
The basic issue here is that the problem of slop (i.e. outputs which look fine upon shallow review but aren’t fine) plus the problem of aligning a parent-AI in such a way that its more-powerful descendants will robustly remain aligned, is already the core of the superintelligence alignment problem. You need to handle those problems in order to safely do the handoff, and at that point the core hard problems are done anyway. Same still applies to aliens: in order to safely do the handoff, you need to handle the “slop/nonslop is hard to verify” problem, and you need to handle the “make sure agents the aliens build will also be aligned, and their children, etc” problem.
If by superintelligence, you mean wildly superhuman AI, it remains non-obvious to me that new paradigms are needed (though I agree they will pretty likely arise prior to this point due to AIs doing a vast quantity of research if nothing else). I think thoughtful and laborious implementation of current paradigm strategies (including substantial experimentation) could directly reduce risk from handing off to superintelligence down to perhaps 25% and I could imagine being argued considerably lower.
I find it hard to imagine such a thing being at all plausible. Are you imagining that jupiter brains will be running neural nets? That their internal calculations will all be differentiable? That they’ll be using strings of human natural language internally? I’m having trouble coming up with any “alignment” technique of today which would plausibly generalize to far superintelligence. What are you picturing?
I think you might first reach wildly superhuman AI via scaling up some sort of machine learning (and most of that is something well described as deep learning). Note that I said “needed”. So, I would also count it as acceptable to build the AI with deep learning to allow for current tools to be applied even if something else would be more competitive.
(Note that I was responding to “between now and superintelligence”, not claiming that this would generalize to all superintelligences built in the future.)
I agree that literal jupiter brains will very likely be built using something totally different than machine learning.
Yeah ok. Seems very unlikely to actually happen, and unsure whether it would even work in principle (as e.g. scaling might not take you there at all, or might become more resource intensive faster than the AIs can produce more resources). But I buy that someone could try to intentionally push today’s methods (both AI and alignment) to far superintelligence and simply turn down any opportunity to change paradigm.