I think the threshold of brainpower where you can start making meaningful progress on the technical problem of AGI alignment is significantly higher than the threshold where you can start making meaningful progress toward AGI.
Simply put, it’s a harder problem. More specifically, it has significantly worse feedback signals: it’s much easier to tell when, and on what tasks, your performance is or isn’t going up than to tell whether you’ve made a thing that will continue pursuing XYZ as it gets much smarter. You can also tell it’s harder because progress in capabilities seems to accelerate given more resources, whereas that is (according to me) barely true, or not true at all, of alignment so far.
My own experience (which I don’t expect you to update much on, but this is part of why I believe these things) is that I’m really smart and, as far as I can tell, still too dumb to even really get started (cf. https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html). I’ve worked with people who are smarter than I am, and AFAICT they, too, are totally failing to address the problem. (To be clear, I definitely don’t think it’s “just about being smart”; but I do think there’s some threshold effect.) It’s hard to even stay focused on the problem for the years it apparently takes to work through wrong preconceptions, bad ideas, etc., and you (or rather I, and ~everyone I’ve directly worked with) apparently have to do that in order to understand the problem.
> I think the threshold of brainpower where you can start making meaningful progress on the technical problem of AGI alignment is significantly higher than the threshold where you can start making meaningful progress toward AGI.
This is also my guess, but I think required intelligence thresholds (for the individual scientists/inventors involved) are only weak evidence about relative problem difficulty (for society, which seems to me the relevant sort of “difficulty” here).
I’d guess the work of Newton, Maxwell, and Shannon required a higher intelligence threshold-for-making-progress than was required to help invent decent steam engines or rockets, for example, but it nonetheless seems to me that the latter were meaningfully “harder” for society to invent. (Most obviously in the sense that inventing them took more person-hours, but I suspect they similarly required more frustration, more taking on of personal risk, and other such things that tend to make a given population less likely to solve a given problem in a given calendar year.)
> required intelligence thresholds (for the individual scientists/inventors involved) are only weak evidence about relative problem difficulty (for society, which seems to me the relevant sort of “difficulty” here).
This sounds right, yeah. If I had to guess, I would guess AGI alignment is both kinds of problem (Maxwell/Faraday equations, and rockets).