This post seems to assume that research fields have big, hard central problems that are solved with some specific technique or paradigm.
This isn’t always true. Many fields have the property that most of the work is on making small components work slightly better in ways that are very interoperable and don’t have complex interactions. For instance, consider the case of making AIs more capable in the current paradigm. There are many different subcomponents which are mostly independent and interact mostly multiplicatively (a toy sketch of what this multiplicative composition looks like is included after the list below):
Better training data: This is extremely independent; finding some source of better data, or better data filtering, can be combined basically arbitrarily with other work on constructing better training data. That’s not to say this parallelizes perfectly (work on filtering or curation might obsolete some prior piece of work), but marginal work can often just myopically improve performance.
Better architectures: This breaks down into a large number of mostly independent categories that typically don’t interact non-trivially:
All of attention, MLPs, and positional embeddings can be worked on independently.
A bunch of hyperparameters can be better understood in parallel.
Better optimizers and regularization (insights within a given optimizer, like AdamW, can often be mixed into other optimizers).
Often larger-scale changes (e.g., Mamba) can incorporate many or most components from prior architectures.
Better optimized kernels / code
Better hardware
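To make the “interact mostly multiplicatively” claim concrete, here is a minimal toy sketch. The component names and gain factors are hypothetical, chosen only to illustrate how independent improvements would compose; they are not measurements of any real system.

```python
# Toy illustration of "mostly independent, mostly multiplicative" progress.
# The components and gain factors below are made up for illustration only.

improvements = {
    "better data filtering": 1.10,    # hypothetical 10% effective-compute gain
    "better positional encoding": 1.03,
    "better optimizer": 1.08,
    "faster kernels": 1.15,
    "better hardware": 1.25,
}

combined = 1.0
for name, gain in improvements.items():
    combined *= gain  # gains stack because the components barely interact

print(f"Combined effective-compute multiplier: {combined:.2f}x")
# Dropping or adding any one line of work changes the total only by its own
# factor; it doesn't change the contribution of the others.
```

Of course real interactions aren’t exactly multiplicative (the parenthetical above about data curation obsoleting prior work is one example), but this is the rough picture in which many small, parallel improvements add up.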
Other examples of fields like this include: medicine, mechanical engineering, education, SAT solving, and computer chess.
I agree that paradigm shifts can invalidate large amounts of prior work (and this has occurred at some point in each of the fields I list above), but it isn’t obvious whether this will occur in AI safety prior to human obsolescence. In many fields, this doesn’t occur very often.
This post seems to assume that research fields have big hard central problems that are solved with some specific technique or paradigm.
This isn’t always true. [...]
I would say it is basically-always true, but there are some fields (including deep learning today, for purposes of your comment) where the big hard central problems have already been solved, and therefore the many small pieces of progress on subproblems are all of what remains.
And insofar as there remains some problem which is simply not solvable within a certain paradigm, that is a “big hard central problem”, and progress on the smaller subproblems of the current paradigm is unlikely by-default to generalize to whatever new paradigm solves that big hard central problem.
I agree that paradigm shifts can invalidate large amounts of prior work, but it isn’t obvious whether this will occur in AI safety.
I claim it is extremely obvious and very overdetermined that this will occur in AI safety sometime between now and superintelligence. The question which you’d probably find more cruxy is not whether, but when—in particular, does it come before or after AI takes over most of the research?
… but (I claim) that shouldn’t be the cruxy question, because we should not be imagining completely handing off the entire alignment-of-superintelligence problem to early transformative AI; that’s a recipe for slop. We ourselves need to understand a lot about how things will generalize beyond the current paradigm, in order to recognize when that early transformative AI is itself producing research which will generalize beyond the current paradigm, in the process of figuring out how to align superintelligence. If an AI assistant produces alignment research which looks good to a human user, but won’t generalize across the paradigm shifts between here and superintelligence, then that’s a very plausible way for us to die.
I would say it is basically-always true, but there are some fields (including deep learning today, for purposes of your comment) where the big hard central problems have already been solved, and therefore the many small pieces of progress on subproblems are all of what remains.
Maybe, but it is interesting to note that:
A majority of productive work is occurring on small subproblems, even if some previous paradigm change was required for this.
For many fields (e.g., deep learning), many people didn’t recognize (and potentially still don’t recognize!) that the big hard central problem was already solved. This suggests it might be non-obvious whether the central problem has been solved, and that making bets on an existing paradigm which doesn’t obviously suffice can be reasonable.
Things feel more continuous to me than your model suggests.
And insofar as there remains some problem which is simply not solvable within a certain paradigm, that is a “big hard central problem”, and progress on the smaller subproblems of the current paradigm is unlikely by-default to generalize to whatever new paradigm solves that big hard central problem.
It doesn’t seem clear to me this is true in AI safety at all, at least for non-worst-case AI safety.
I claim it is extremely obvious and very overdetermined that this will occur in AI safety sometime between now and superintelligence.
Yes, I added “prior to human obsolescence” (which is what I meant).
Depending on what you mean by “superintelligence”, this isn’t at all obvious to me. It’s not clear to me we’ll have (or will “need”) new paradigms before fully handing over all technical and strategic work to AIs which are capable enough to obsolete humans at all cognitive tasks. Doing this handover doesn’t directly require understanding whether the AI is specifically producing alignment work that generalizes. For instance, the system might pursue routes other than alignment work, and we might determine its judgement/taste/epistemics/etc. are good enough based on examining things other than alignment research intended to generalize beyond the current paradigm.
If by superintelligence you mean wildly superhuman AI, it remains non-obvious to me that new paradigms are needed (though I agree they will pretty likely arise prior to this point, due to AIs doing vast quantities of research if nothing else). I think thoughtful and laborious implementation of current-paradigm strategies (including substantial experimentation) could directly reduce the risk from handing off to superintelligence down to perhaps 25%, and I could imagine being argued down to something considerably lower.
It’s not clear to me we’ll have (or will “need”) new paradigms before fully handing over all technical and strategic work to AIs which are capable enough to obsolete humans at all cognitive tasks.
If you want to not die to slop, then “fully handing over all technical and strategic work to AIs which are capable enough to obsolete humans at all cognitive tasks” is not a thing which happens at all until the full superintelligence alignment problem is solved. That is how you die to slop.
“fully handing over all technical and strategic work to AIs which are capable enough to obsolete humans at all cognitive tasks”
Suppose we replace “AIs” with “aliens” (or even, some other group of humans). Do you agree that doesn’t (necessarily) kill you due to slop if you don’t have a full solution to the superintelligence alignment problem?
Aliens kill you due to slop; humans depend on the details.
The basic issue here is that the problem of slop (i.e. outputs which look fine upon shallow review but aren’t fine), plus the problem of aligning a parent-AI in such a way that its more-powerful descendants will robustly remain aligned, is already the core of the superintelligence alignment problem. You need to handle those problems in order to safely do the handoff, and at that point the core hard problems are done anyway. The same still applies to aliens: in order to safely do the handoff, you need to handle the “slop/nonslop is hard to verify” problem, and you need to handle the “make sure agents the aliens build will also be aligned, and their children, etc” problem.
If by superintelligence you mean wildly superhuman AI, it remains non-obvious to me that new paradigms are needed (though I agree they will pretty likely arise prior to this point, due to AIs doing vast quantities of research if nothing else). I think thoughtful and laborious implementation of current-paradigm strategies (including substantial experimentation) could directly reduce the risk from handing off to superintelligence down to perhaps 25%, and I could imagine being argued down to something considerably lower.
I find it hard to imagine such a thing being at all plausible. Are you imagining that Jupiter brains will be running neural nets? That their internal calculations will all be differentiable? That they’ll be using strings of human natural language internally? I’m having trouble coming up with any “alignment” technique of today which would plausibly generalize to far superintelligence. What are you picturing?
I think you might first reach wildly superhuman AI via scaling up some sort of machine learning (and most of that is something well described as deep learning). Note that I said “needed”. So, I would also count it as acceptable to build the AI with deep learning to allow for current tools to be applied even if something else would be more competitive.
(Note that I was responding to “between now and superintelligence”, not claiming that this would generalize to all superintelligences built in the future.)
I agree that literal Jupiter brains will very likely be built using something totally different from machine learning.
Yeah, ok. Seems very unlikely to actually happen, and I’m unsure whether it would even work in principle (e.g., scaling might not take you there at all, or might become more resource-intensive faster than the AIs can produce more resources). But I buy that someone could try to intentionally push today’s methods (both AI and alignment) to far superintelligence and simply turn down any opportunity to change paradigm.
Other examples of fields like this include: medicine, mechanical engineering, education, SAT solving, and computer chess.
To give a maybe-helpful anecdote: I am a mechanical engineer (though I now work in AI governance), and in my experience that isn’t true, at least for R&D (e.g. a surgical robot) where you aren’t just iterating or working in a highly standardized field (aerospace, HVAC, mass manufacturing, etc.). The “bottleneck” in that case is usually figuring out the requirements (e.g. which surgical tools to support? what’s the motion range and design envelope for interferences?). If those are wrong, the best design will still be wrong.
In more standardized engineering fields the requirements (and user needs) are much better known, so perhaps the bottleneck now becomes a bunch of small things rather than one big thing.
Importantly, this is an example of developing a specific application (surgical robot) rather than advancing the overall field (robots in general). It’s unclear whether the analogy to an individual application or an overall field is more appropriate for AI safety.
Good point. Thinking of robotics overall, it’s much more a bunch of small stuff than one big thing. Though it depends how far you “zoom out”, I guess. Technically linear algebra itself, or the Jacobian, is an essential element of robotics. But one could also zoom in on a different aspect and then say that “zero-backlash gearboxes” (where the Harmonic Drive is notable, as it’s much more compact and accurate than previous versions, though perhaps still a small effect in the big picture) are the main element. Or PID control, or high-resolution encoders.
I’m not quite sure how to think about how these all fit together to form “robotics”, and whether they are small elements of a larger thing, or large breakthroughs stacked over the course of many years (where they might appear small at that zoomed-out level).
I think that if we take a snapshot of robotics at a specific time (over, say, a 5-year window), there will often be one or very few large bottlenecks holding it back. Right now it is mostly ML/vision and batteries. 10-15 years ago, maybe it was CPU real-time processing latency or motor power density. A bit earlier it might have been gearboxes. These things were fairly major bottlenecks until they got good enough that development switched to a minor revision/iteration regime (nowadays there’s not much left to improve on gearboxes, for example, except maybe in very specific use cases).
typically don’t interact non-trivially
Or, as Orwell would prefer, “typically interact trivially”.