😱✂️💣💣💣💣💣 𝙳 𝙸 𝚂 𝙰 𝚂 𝚃 𝙴 𝚁 - 𝙱 𝚈 - 𝙳 𝙴 𝙵 𝙰 𝚄 𝙻 𝚃 ? - Public Draft

This is a draft post to hold my thoughts on Disaster-By-Default.
I have an intuition that the SUV Triad can be turned into an argument for Disaster-By-Default, so I created this post to explore this possibility.
However, I consider this post experimental in that it may not pan out.
☞ The 𝙳 𝙸 𝚂 𝙰 𝚂 𝚃 𝙴 𝚁 - 𝙱 𝚈 - 𝙳 𝙴 𝙵 𝙰 𝚄 𝙻 𝚃 hypothesis:
AGI leads to some kind of societal-scale catastrophe by default
Clarification: This isn’t a claim that it would be impossible to avoid this fate if humanity woke up and decided it was serious about winning. This is just a claim about what happens by default.
Why might this be true?
Recap — 𝚃 𝙷 𝙴 🅂🅄🅅 𝚃 𝚁 𝙸 𝙰 𝙳 — Old version, to be updated to the latest version just before release
For convenience, I’ve copied the description of the SUV Triad from my post Why the focus on wise AI advisors?
Covered in reverse order:
🅅 𝚄 𝙻 𝙽 𝙴 𝚁 𝙰 𝙱 𝙸 𝙻 𝙸 𝚃 𝚈 – 🌊🚣:
✷ The development of advanced AI technologies will have a massive impact on society, given the essentially infinite ways to deploy such a general technology. There are lots of ways this could go well, and lots of ways this could go extremely poorly.
Catastrophic malfunctions, permanent dictatorships, AI-designed pandemics, mass cyberattacks, information warfare, war machines lacking any mercy, the list goes on...
The kicker: Interaction effects. Technological unemployment breaking politics. Loss of control + automated warfare. AI-enabled theft of specialised biological models...
In more detail...
i) At this stage I’m not claiming any particular timelines.
I believe it’s likely to be quite fast, but I don’t claim this until we get to 🅂 𝙿 𝙴 𝙴 𝙳 😅⏳.
I suspect that often when people doubt this claim, they’ve implicitly assumed that I was talking about the short or medium term, rather than the long term 🤔. After all, the claim that there are many ways that AI could plausibly lead to dramatic benefits or harms over the next 50 or 100 years feels like an extremely robust claim. There are many things that a true artificial general intelligence could do. It’s mainly just a question of how long it takes to develop the technology.
ii) It’s quite likely that at least some of these threats will turn out to be overhyped. That doesn’t defeat this argument! Even in the unlikely event that most of these threats turned out to be paper tigers, as claimed in The Kicker, a single one of these threats going through could cause absurd amounts of damage.
iii) TODO
🅄 𝙽 𝙲 𝙴 𝚁 𝚃 𝙰 𝙸 𝙽 𝚃 𝚈 – 🌅💥:
✷ We have massive disagreement on what to expect from the development of AI, let alone on the best strategy[46]. Making the wrong call could prove catastrophic.
Strategically: Accelerate or pause? What’s the offence-defence balance? Do we need global unity or to win the arms race? Who (or what) should we be aligning AIs to? Can AI “do our alignment homework” for us?
Situationally[47]: Will LLMs scale to AGI or are they merely a dead end? Should we expect AGI in 2027 or is it more than a decade away? Will AI create jobs and cure economic malaise or will it crush, destroy and obliterate them? Will the masses embrace AI as the key to their salvation or call for a Butlerian Jihad?
The kicker: We’re facing these issues at an extremely bad time – when both trust and society’s epistemic infrastructure are crumbling. Even if our task were epistemically easy, we might still fail.
In more detail...
i) A lot of this uncertainty just seems inherently really hard to resolve. Predicting the future is hard.
ii) However hard this is to resolve in theory, it’s worse in practice. Instead of an objective search for the truth, these discussions are distorted by a range of factors, including money, social status and the need for meaning.
iii) More on the kicker: We’re seeing increasing polarisation, less trust in media and experts[48] and AI stands to make this worse. This is not where we want to be starting from and who knows how long this might take to resolve?
🅂 𝙿 𝙴 𝙴 𝙳 – 😅⏳:
✷ AI is developing incredibly rapidly… We have limited time to act and to figure out how to act[49].
Worse: We may already be witnessing an AI arms race. Stopping this may simply not be possible – forget about AGI being an inconceivably large prize, many see winning as a matter of survival.
The kicker: It’s entirely plausible that as the stakes increase, AI development actually accelerates. That we’ll look back at the current rate of progress and laugh about how we used to consider it fast.
In more detail...
i) The speed at which things are happening makes the problem much harder. Humanity does have the ability to deal with civilisational-scale challenges. It’s not easy—global co-ordination is incredibly difficult to achieve—but it’s possible. However, facing these challenges one at a time is a lot easier than facing dozens at once. When they all arrive together, it’s hard to give each problem the proper attention it deserves 😅🌫️💣💣💣
ii) Even if timelines aren’t short, we might still be in trouble if the take-off speed is fast. Unfortunately, humanity is not very good at preparing for abstract, speculative-seeming threats ahead of time.
iii) Even if neither timelines nor take-off speeds are fast in an absolute sense, we might still expect disaster if they are fast in a relative sense. Governance—especially global governance—tends to proceed rather slowly. Even though it can happen much faster when there’s a crisis, some problems need to be solved ahead of time; once you’re in them it’s too late. As an example, once an AI-induced pandemic is spreading, you may have already lost.
I believe that the following reflections provide a strong, but defeasible reason to believe the Disaster-By-Default hypothesis.
Recap: Reflections on The SUV Triad — Old version, to be updated to the latest version just before release
Reflections—Why the SUV Triad is Fucking Scary
Many of the threats constitute civilisational-level risks by themselves. We could successfully navigate all the other threats, but drop the ball just once and all of that could be for naught.
Even if a threat can’t lead to catastrophe, it can still distract us from those that can. It’s hard to avoid catastrophe when we don’t know where to focus our efforts ⚪🪙⚫.
The speed of development makes this much harder. Even if alignment were easy and governance didn’t require anything special, we could still fail because certain people have decided that we have to race toward AGI as fast as possible.
Controversial: It may even present a reason to expect Disaster-By-Default (draft post) (‼️).
That said, I want to try to see if it’s possible to make a more rigorous argument. Here’s the general intuition:
Consider the following model. I wonder whether it would be applicable to the SUV Triad:
Suppose we face a series of ten independent decisions and have to get each of them right in order to survive. Assume each is a binary choice. Further assume that these choices are quite confusing, so we only have a 60% chance of getting each one right. Then we’d only have about a 0.6% chance of survival.
The question I want to explore is whether it would be appropriate to model the SUV Triad with something along these lines.
That said, even if it made sense from an inside-view perspective, we’d still have to adjust for the outside view. The hardness of each decision is not independent, but most likely tied to some general factors such as overall problem difficulty, general societal competence, and the speed at which we have to face these problems. In other words, treating each probability as independent likely makes the overall probability look more extreme than it really is.
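To make that contrast concrete, here’s a minimal sketch in Python. It uses the same illustrative numbers as above (ten decisions, 60% each), and the latent “societal competence” factor and its range are assumptions of mine purely for illustration, not estimates of anything real:

```python
# Toy sketch only: all numbers here are illustrative, not real-world estimates.
import random

N_DECISIONS = 10   # hypothetical number of make-or-break choices
P_CORRECT = 0.6    # hypothetical chance of getting any single choice right

# Independent model: survival requires getting every choice right.
p_independent = P_CORRECT ** N_DECISIONS
print(f"Independent model: {p_independent:.2%} chance of survival")  # ~0.60%

# Correlated model: a single latent "societal competence" factor shifts the
# per-choice success probability up or down for all choices at once.
def survival_with_latent_factor(trials: int = 200_000) -> float:
    survived = 0
    for _ in range(trials):
        shift = random.uniform(-0.2, 0.2)          # shared competence shift
        p = min(max(P_CORRECT + shift, 0.0), 1.0)  # clamp to a valid probability
        if all(random.random() < p for _ in range(N_DECISIONS)):
            survived += 1
    return survived / trials

print(f"Correlated model:  {survival_with_latent_factor():.2%} chance of survival")  # ~2%
```

With these made-up numbers the correlated version comes out a few times higher than the independent one, which is the direction of the adjustment described above: shared factors make extreme products less extreme.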
So the first step would be figuring out whether this model is applicable (not in an absolute sense, but in the sense of being appropriate for making a defeasible claim).
How could we evaluate this claim? Here’s one such method (a toy sketch of what the bookkeeping might look like follows the list):
Define a threshold of catastrophe
Make a list of threats that could meet that bar
Make a list of key choices that we’d have to make in relation to these threats and where making the wrong choice could prove catastrophic
Estimate the difficulty of each choice
Consider the degree of correlation between various threats/choices
Potentially: estimate how many additional such choices we may have missed
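For what it’s worth, here’s the kind of bookkeeping I have in mind for the steps above, sketched in Python. Every threat, choice, probability and the correlation discount below is a placeholder I’ve made up purely to show where each step would plug in; none of them are claims:

```python
# Hypothetical walkthrough of the evaluation method above.
# Every threat, choice and number is a made-up placeholder, not a claim.
from dataclasses import dataclass

@dataclass
class KeyChoice:
    threat: str        # which catastrophic threat the choice relates to (step 2)
    description: str   # the decision society has to get right (step 3)
    p_correct: float   # rough guess at the chance of choosing well (step 4)

choices = [
    KeyChoice("engineered pandemics", "restrict dangerous biological models in time", 0.6),
    KeyChoice("loss of control", "catch misalignment before wide deployment", 0.5),
    KeyChoice("automated warfare", "agree on limits for autonomous weapons", 0.7),
]

# Naive aggregation: treat the choices as fully independent.
p_naive = 1.0
for c in choices:
    p_naive *= c.p_correct

# Step 5: crude correlation adjustment. If the choices share common causes
# (societal competence, time pressure), the effective number of independent
# choices is smaller, so we shrink the exponent rather than multiplying n times.
correlation_discount = 0.5  # 1.0 = fully independent, 0.0 = effectively one choice
effective_n = 1 + correlation_discount * (len(choices) - 1)
avg_p = sum(c.p_correct for c in choices) / len(choices)
p_adjusted = avg_p ** effective_n

print(f"Independent aggregation: {p_naive:.1%}")    # 21.0%
print(f"Correlation-adjusted:    {p_adjusted:.1%}")  # 36.0%
```

Step 6 (guessing how many key choices we’ve missed) would then just mean adding more placeholder entries and seeing how quickly the product collapses.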
This model could fail if the number of key choices wasn’t that large, if these choices weren’t actually hard, or if correlation between them did most of the work.
And if the model were applicable, then we’d have to consider possible defeaters:
“There’s too much uncertainty here” - <todo response>
“Humanity has overcome significant challenges in the past” - <todo response>
“There will almost certainly be a smaller wake-up call before a bigger disaster” - <todo response>
In more detail/further objections...
I suggest ignoring for now. Copied from a different context, so needs to be adapted:
Most notable counterargument: “We most likely encounter smaller wake-up calls first. Society wakes up by default.”
Rich and powerful actors will be incentivised to use their influence to downplay the role of AI in any such incidents, argue that we should focus solely on that threat model, or even assert that further accelerating capabilities is the best defense. Worse, we’ll likely be in the middle of a US-China arms race where there are national security issues at play that could make slowing things down feel almost inconceivable.
Maybe there is eventually an incident that is too serious to ignore, but by then it will probably be too late. Capabilities increase fast and we should expect a major overhang of elicitable capabilities, so we would need to trigger a stop significantly before the threshold of dangerous capabilities is reached.
“But the AI industry doesn’t want to destroy society. They’re in society” — Look at what happened with “gain of function” research. If it had been prominently accepted that gain-of-function research is bad, then that would have caused a massive loss of status for medical researchers, so they didn’t allow that to happen. The same incentives apply to AI developers.
“Open source/weights models are behind the frontier and it’s possible that society will enforce restrictions on them, even if it’ll be impossible to prevent closed-source development from continuing” — They’re not that far behind, and attempting to restrict open-source models will result in massive pushback/subversion. There’s a large community dedicated to open-source software: for some it’s essentially a substitute for a religion, for others it’s the basis of their company or their national competitiveness. Even if the entire UN Security Council agreed, they couldn’t just say “Stop!” and expect it to be instantly obeyed. Our default expectation should be that capabilities broadly proliferate.
Second most notable counterargument: “AI is aligned by default”
This looked much more plausible before inference time compute took off.
Third most notable counterargument: “We’ve overcome challenges in the past, we should expect that we most likely stumble through”
AI is unique (generality, speed of development, proliferation).
Much more plausible if there were a narrower threat model or if development moved more slowly.
What is the SUV Triad? Also, the formatting on this is wild, what’s the context for that?
Sorry, this is some content that I had in my short-form Why the focus on wise AI advisors?. The SUV Triad is described there.
I was persuaded by Professor David Manly that I didn’t need to argue for Disaster-By-Default in order to justify wise AI advisors and that focusing too much on this aspect would simply cause me to lose people, so I needed somewhere to paste this content.
I just clicked “Remove from Frontpage”. I’m unsure if it does anything for short-form posts though.
Just experimenting to see what’s possible. Copied it directly from that post, haven’t had time to rethink the formatting yet now that it is its own post. Nowhere near as wild as it gets in the main post though!