That’s why I want to convince more people who actually understand the problem to identify the hard parts and work on them like mad, as if the world is on fire, instead of hoping it somehow isn’t or can be put out.
FYI, something similar to this was basically my “last year’s plan”, and it’s on hold because I think it is plausible right now to meaningfully move the Overton window around pauses, or at least dramatic slowdowns. (This is based on seeing the amount of traffic AI 2027 got, the number of NatSec endorsements that If Anyone Builds It got, and having recently gotten to read it and thinking it is pretty good)
I think if Yoshua Bengio, Geoffrey Hinton, or Dario actually really tried to move the Overton window, instead of sort of trying to maneuver within the current one, it’d make a huge difference. (I don’t think this means it’s necessarily tractable for most people to help. It’s a high-skill operation)
(Another reason for putting “increase the rate of people able to think seriously about the problem” on hold is that my plans there weren’t getting much traction. I have some models of what I’d try next when/if I return to it, but it wasn’t a slam dunk to keep going)
What I think would be really useful is more dialogue across “party lines” on strategies. I think I’m seeing nontrivial polarization; attempted dialogues seem to usually end in frustration rather than progress.
I’m thinking of a slightly different plan than “increase the rate of people being able to think seriously about the problem”: I’d like to convince people who already understand the problem to accept that a pause is unlikely, and that alignment is not known to be impossibly hard even on short timelines. If they agreed with both of those, it seems like they’d want to work on aligning LLM-based AGI on what looks like the current default path. I think just a few more might help nontrivially. The number of people going “straight for the throat” is very small.
I’m interested in the opposite variant too, trying to convince people working on “aligning” current LLMs to focus more on the hard parts we haven’t encountered yet.
I do think shifting the Overton window is possible; actually, I think it’s almost inevitable. I just don’t know whether it will happen soon enough to help. I also think a pause is unlikely even if the public screams for it, though I’m not sure, particularly if that happens sooner than I expect. Public opinion can shift rapidly.
The Bengio/Hinton/Dario efforts seem like they are changing the Overton window, but cautiously. PR seems to require both skill and status.
Getting entirely new people to understand the hard parts of the problem, and then to acquire all of the technical skills and theoretical subtleties, is another route. I haven’t thought as much about that one because I don’t have a public platform, but I do try to engage newcomers to LW in case they’re the type to actually figure things out enough to really help.
I’m thinking of a slightly different plan than “increase the rate of people being able to think seriously about the problem”: I’d like to convince people who already understand the problem to accept that a pause is unlikely, and that alignment is not known to be impossibly hard even on short timelines. …
...Getting entirely new people to understand the hard parts of the problem, and then to acquire all of the technical skills and theoretical subtleties, is another route. I haven’t thought as much about that one because I don’t have a public platform,
I think it’s useful to think of “rate of competent people thinking seriously about the right problems” as, like, the “units” of success for various flavors of plans here. There are different bottlenecks.
I currently think the rate-limiting reagent is “people who understand the problem”. And I think that’s in turn rate-limited on:
“the problem is sort of wonky and hard, with bad feedback loops, and there’s a cluster of attitudes and skills you need to get any traction sitting with and grokking the problem.”
“we don’t have much ability to evaluate progress on the problem, which in turn means it’s harder to provide a good funding/management infrastructure for it.”
Better education can help with your first problem, although that pulls people who understand the problem away from working on it.
I agree that the difficulty of evaluating progress is a big problem. One solution is to just fund more alignment research. I am dismayed if it’s true that Open Phil is holding back available funding because they don’t see good projects. Just fund them, and get more donations later when the whole world is properly more freaked out. If it’s bad research now, at least those people will spend some time thinking about and debating what might be better research.
I’d also love to see funding aimed directly at people understanding the whole problem, including the several hard parts. It is a lot easier to evaluate whether someone is learning a curriculum than whether they are doing good research. Exposing people to a lot of perspectives and arguments, and sort of paying and forcing them to think hard about it, should at least improve their choice of research and their understanding of the problem.
I definitely agree that understanding the problem is the rate-limiting factor. I’d argue that it’s not just the technical problem you need to understand, but the surrounding factors, e.g., how likely a pause or slowdown is, and how soon we’re likely to reach AGI on the default path. I’m afraid some of our best technical thinkers understand the technical problem but are confused about how unlikely it is that any approach other than direct LLM descendants will be the first critical attempt at aligning AGI. But the arguments for or against that are quite complex.
I think “moving an Overton window” is a sort of different operation than what Bengio/Hinton/Dario are doing. (Or, like, yes, they are expanding an Overton window, but their entire strategy for doing so seems predicated on a certain kind of caution/incrementalism)
I think there are two pretty different workable strategies:
say things somewhat outside the window, picking your battles
make bold claims, while holding your convictions with enough strength, and without looking “attackable for misspeaking”.
Going halfway from one to the other doesn’t actually work, and the second one doesn’t really work unless you actually do have those convictions. There are a few people trying to do the latter, but most of them just don’t actually have the reputation that’d make anyone care (and also there’s a lot of skill to doing it right). I think if at least one of Yoshua/Geoffrey/Dario/Demis switched strategies, it’d make a big difference.