Probably something pretty similar to the AI Futures Project; I have pretty similar beliefs to them (and I’m collaborating with them).
Nod. Part of my motivation here is that AI Futures and MIRI are doing similar things, but AI Futures’ vibe and approach feel slightly off to me (in a way that seemed probably downstream of Buck/Redwood convos), and… I don’t think the differentiating cruxes are that extreme. And man, it’d be so cool, and feels almost tractable, to resolve some kinds of disagreements… not to the point where the MIRI/Redwood crowd are aligned on everything, but, like, reasonably aligned on “the next steps”, which feels like it’d ameliorate some of the downside risk.
(I acknowledge that Eliezer/Nate often talk/argue in a way I find really frustrating. I would be happy if there were others trying to do Overton-shifting who acknowledged what seem to me to be the hardest parts.)
My own confidence in doom isn’t because I’m like 100% or even 90% on board with the subtler MIRI arguments; it’s the combination of “they seem probably right to me” and “also, when I imagine Buck world playing out, that still seems >50% likely to get everyone killed,[1] even if for somewhat different reasons than Eliezer’s mainline guesses”.[2]
I have different favorite asks for governments.
I have a different sense of what strategy is effective for making asks of governments.
Nod, I was hoping for more like “what are those asks, and what is that strategy?”
I think it’s possible to change company behavior in ways that substantially reduce risk without relying substantially on governments.
Something around here seems cruxy, although I’m not sure what followup question to ask. Have there been past examples of companies changing behavior that you think demonstrate a proof-of-concept for that working?
(My crux here is that you’d need basically all companies bought in on a very high level of caution, which we have seen before, but the company culture would need to be very different from a move-fast-and-break-things startup’s, and it’s very hard to change company cultures. And even if you got OpenAI/Deepmind/Anthropic bought in (a heavy lift, but maybe achievable), I don’t see how you stop other companies from doing reckless things in the meantime.)
This is probably slightly askew of how you’d think about it. In your mind, what are the right questions to be asking?
My understanding is that a crucial aspect of Eliezer’s worldview is that we’d be fucked even if we had a 10-year pause where we had access to AGI that we could use to work on developing and aligning superintelligence. I disagree.
This seems wrong to me. I think Eliezer[3] would probably still bet on humanity losing in this scenario, but I think he’d think we had noticeably better odds. Less because “it’s near-impossible to extract useful work out of safely controlled near-human intelligence”, and more:
a) In practice, he doesn’t expect researchers to do the work necessary to enable safe long-term control.
b) There’s a particular kind of intellectual work (“technical philosophy”) they think needs to get done, and it doesn’t seem like the AI companies focused on “use AI to solve alignment” are pointed in remotely the right direction for getting that cognitive work done. And even if they were, 10 years is still on the short side, even with a lot of careful AI speedup.
Thanks!

[1] or at least extremely obviously harmed, in a way that is closer in horror-level to “everyone dies” than “a billion people die” or “we lose 90% of the value of the future”
[2] i.e. “Another (outer) alignment failure story”, and “Going Out With a Whimper” from “What failure looks like”
[3] I don’t expect him to reply here, but I am curious about @Eliezer Yudkowsky’s (or maybe @Rob Bensinger’s) reply.