I’m not 100% sure which point you’re referring to here. I think you’re talking less about the specific subclaim Ryan was replying to, and more about the broader claim that “takeoff is probably going to be quite fast.” Is that right?
Yes, sorry to be unclear.
I don’t think I’m actually very sure what you think should be done – if you were following the strategy of “state your beliefs clearly and throw a big brick into the Overton window so leaders can talk about what might actually work” (I think this is what MIRI is trying to do), but with your own set of beliefs, what sort of things would you say and how would you communicate them?
Probably something pretty similar to the AI Futures Project; I have pretty similar beliefs to them (and I’m collaborating with them). This looks pretty similar to what MIRI does on some level, but involves making different arguments, ones that I think are correct instead of incorrect, and making different policy recommendations.
My model of Buck is that he states his beliefs clearly along the way, but isn’t really trying to do so in a way that’s aimed at a major Overton shift. Like, “get 10 guys at each lab” seems like “try to work with limited resources” rather than “try to radically change how many resources are available.”
Yep, that’s right: mostly I don’t personally aim at major Overton window shifts (except through mechanisms like causing AI escape attempts to be caught and made public, which are an important theory of change for my work).
(I’m currently pretty bullish on the MIRI strategy, both because the object-level claims seem probably correct to me, and because even in more Buck-shaped worlds, we only survive, or at least avoid being Very Fucked, if the government starts preparing now in a way that I’d think Buck and MIRI would roughly agree on. In my opinion there needs to be some work done that at least looks pretty close to what MIRI is currently doing, and I’m curious whether you disagree more on the nuanced-specifics level or more on the “is the Overton brick strategy correct?” level.)
My guesses about the disagreements are:
Things substantially downstream of takeoff speeds:
My understanding is that a crucial aspect of Eliezer’s worldview is that we’d be fucked even if we had a 10-year pause where we had access to AGI that we could use to work on developing and aligning superintelligence. I disagree. But this means that he thinks that some truly crazy stuff has to happen in order for ASI to be aligned, which naturally leads to lots of disagreements. (I am curious whether you agree with him on this point.)
I think it’s possible to change company behavior in ways that substantially reduce risk without relying substantially on governments.
I have different favorite asks for governments.
I have a different sense of what strategy is effective for making asks of governments.
I disagree about many specific details of arguments.
Things about the MIRI strategy:
I have various disagreements about how to get things done in the world.
I am more concerned by the downsides, and less excited about the upside, of Eliezer and Nate being public intellectuals on the topics of AI or AI safety.
(I should note that some MIRI staff I’ve talked to share these concerns; in many places here where I said MIRI I really mean the strategy that the org ends up pursuing, rather than what individual MIRI people want.)
My understanding is that a crucial aspect of Eliezer’s worldview is that we’d be fucked even if we had a 10-year pause where we had access to AGI that we could use to work on developing and aligning superintelligence. I disagree.
Given that the stakes are extinction and the entire future, the potential for FOOM and/or an irreversible singleton takeover, and the shocking dearth of scientific understanding of intelligence and agentic behavior, I think a 1,000-year investment into researching AI alignment, with very carefully increasing capability levels, would be a totally natural trade-off to make. While there are substantive differences between 3 and 10 years, feeling non-panicked or remotely satisfied with either of them seems to me quite unwise.
(This argues for a slightly weaker position than “10 years certainly cannot be survived”, but it gets one to a pretty similar attitude.)
You might think it would be best for humanity to make a 1,000-year investment, but nevertheless think that, in terms of tractability, aiming for something like a 10-year pause is by far the best option available. The value of such a 10-year pause seems pretty sensitive to its probability of success, so I wouldn’t describe this as “quibbling”.
(I edited out the word ‘quibbling’ within a few mins of writing my comment, before seeing your reply.)
It is an extremely high-pressure scenario, where a single mistake can cause extinction. It is perhaps analogous to a startup in stealth mode that planned to have 1–3 years to build a product suddenly having a NYT article cover them and force them to launch right now; or to being told in the first weeks of an otherwise excellent romantic relationship that you suddenly need to decide whether to get married and have children, or break up. In both cases the difference of a few weeks is not really a big difference; overall you’re still in an undesirable and unnecessarily high-pressure situation. Similarly, 10 years is better than 3 years, but from the perspective of having enough time to be confident of getting it right (e.g. 1,000 years), both involve incredible pressure and come far too early; panic / extreme stress is a natural response. You’re in a terrible crisis and have no guarantee of being able to get an acceptable outcome.
I am responding to something of a missing mood about the crisis and the lack of any guarantee of a good outcome. For instance, in many 10-year worlds we have no hope and are already dead yet walking, and the few that do have hope require extremely high performance in lots and lots of areas to have a shot; that mood seems to me to be missing from the parts of this discussion that hold it’s plausible humanity will survive in the world histories where we have 10 years until human-superior AGI is built.
Probably something pretty similar to the AI Futures Project; I have pretty similar beliefs to them (and I’m collaborating with them).
Nod, part of my motivation here is that AI Futures and MIRI are doing similar things, but AI Futures’ vibe and approach feel slightly off to me (in a way that seems probably downstream of Buck/Redwood convos), and… I don’t think the differentiating cruxes are that extreme. And man, it’d be so cool, and feels almost tractable, to resolve some kinds of disagreements… not to the point where the MIRI/Redwood crowd are aligned on everything, but, like, reasonably aligned on “the next steps”, which feels like it’d ameliorate some of the downside risk.
(I acknowledge that Eliezer/Nate often talk/argue in a way that I find really frustrating. I would be happy if there were others trying to do Overton-shifting who acknowledged what seem to me to be the hardest parts.)
My own confidence in doom isn’t because I’m like 100% or even 90% on board with the subtler MIRI arguments; it’s the combination of “they seem probably right to me” and “also, when I imagine Buck world playing out, that still seems >50% likely to get everyone killed.[1] Even if for somewhat different reasons than Eliezer’s mainline guesses.[2]”
I have different favorite asks for governments.
I have a different sense of what strategy is effective for making asks of governments.
Nod, I was hoping for more like, “what are those asks/strategy?”
I think it’s possible to change company behavior in ways that substantially reduce risk without relying substantially on governments.
Something around here seems cruxy, although I’m not sure what followup question to ask. Have there been past examples of companies changing behavior that you think demonstrate a proof of concept for that working?
(My crux here is that you do need basically all companies bought in on a very high level of caution, which we have seen before, but the company culture would need to be very different from a move-fast-and-break-things startup, and it’s very hard to change company cultures. And even if you got OpenAI/DeepMind/Anthropic bought in (a heavy lift, but maybe achievable), I don’t see how you stop other companies from doing reckless things in the meanwhile.)
This is probably slightly askew of how you’d think about it. In your mind, what are the right questions to be asking?
My understanding is that a crucial aspect of Eliezer’s worldview is that we’d be fucked even if we had a 10-year pause where we had access to AGI that we could use to work on developing and aligning superintelligence. I disagree.
This seems wrong to me. I think Eliezer[3] would probably still bet on humanity losing in this scenario, but I think he’d think we had noticeably better odds. Less because “it’s near-impossible to extract useful work out of safely controlled near-human intelligence”, and more:
a) In practice, he doesn’t expect researchers to do the work necessary to enable safe long-term control.
b) There’s a particular kind of intellectual work (“technical philosophy”) they think needs to get done, and it doesn’t seem like the AI companies focused on “use AI to solve alignment” are pointed in remotely the right direction for getting that cognitive work done. And even if they did, 10 years is still on the short side, even with a lot of careful AI speedup.
[1] Or at least extremely obviously harmed, in a way that is closer in horror-level to “everyone dies” than to “a billion people die” or “we lose 90% of the value of the future”.
[2] i.e. Another (outer) alignment failure story, and Going Out With a Whimper, from What failure looks like.
[3] I don’t expect him to reply here, but I am curious about @Eliezer Yudkowsky or maybe @Rob Bensinger’s reply.
My understanding is that a crucial aspect of Eliezer’s worldview is that we’d be fucked even if we had a 10-year pause where we had access to AGI that we could use to work on developing and aligning superintelligence. I disagree. But this means that he thinks that some truly crazy stuff has to happen in order for ASI to be aligned, which naturally leads to lots of disagreements. (I am curious whether you agree with him on this point.)
I don’t feel competent to have that strong an opinion on this, but I’m like 60% on “you need to do some major ‘solve difficult technical philosophy’ work that you can only partially outsource to AI, and that still requires significant serial time.”
And, while it’s hard for someone with my (lack of) background to have a strong opinion, it feels intuitively crazy to me to put that at <15% likely, which feels sufficient to motivate “indefinite pause is basically necessary, or, humanity has clearly fucked up if we don’t do it, even if it turned out to be on the easier side.”
indefinite pause is basically necessary, or, humanity has clearly fucked up if we don’t do it
I think it’s really important to not equivocate between “necessary” and “humanity has clearly fucked up if we don’t do it.”
“Necessary” means “we need this in order to succeed; there’s no chance of success without this”. Because humanity is going to massively underestimate the risk of AI takeover, there is going to be lots of stuff that doesn’t happen that would have passed cost-benefit analysis for humanity.
If you think it’s 15% likely that we need really large amounts of serial time to prevent AI takeover, then it’s very easy to imagine situations where the best strategy on the margin is to work on the other 85% of worlds. I have no idea why you’re describing this as “basically necessary”.
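(A rough expected-value sketch of this marginal-allocation argument, with illustrative symbols that are not from the thread itself: let p be the probability that large amounts of serial time are truly required, Δ_pause how much a unit of marginal pause-directed effort improves our odds in the worlds where it is, and Δ_prosaic how much the same effort improves our odds in the other worlds. Pause-directed effort wins on the margin only when

\[
p \cdot \Delta_{\text{pause}} \;>\; (1 - p) \cdot \Delta_{\text{prosaic}},
\qquad\text{i.e.}\qquad
\Delta_{\text{pause}} \;>\; \frac{1-p}{p} \cdot \Delta_{\text{prosaic}}.
\]

With p = 0.15 the threshold (1 − p)/p ≈ 5.7, so pause-directed work would have to be several times more effective per unit of effort before it beats working on the other 85% of worlds; on this sketch, a 15% chance that something is needed does not by itself make it “basically necessary.”)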
What are your favorite asks for governments?