I wonder what the crisis will be.
I think it’s quite likely that if there is a crisis that leads to a beneficial response, it’ll be one of these three:
An undeployed, privately developed system, not yet clearly aligned or misaligned, either:
Passes the Humanity’s Last Exam benchmark, demonstrating ASI, and the developers go to Congress and say “we have a godlike creature here, you can all talk to it if you don’t believe us, it’s time to act accordingly.”
Not quite doing that, but demonstrating dangerous capability levels in red-teaming, i.e., the ability to replicate, operate independently, pass the hardest versions of the Turing test, gain access to biolabs, etc. And METR, and hopefully their client, go to Congress and say “This AI stuff is a very dangerous situation, and now we can prove it.”
A deployed military (beyond-frontier) system demonstrates such generality that, e.g., Palmer Luckey (possibly specifically Palmer Luckey) has to go to Congress and confess something like “that thing we were building for coordinating military operations and providing deterrence, turns out it can also coordinate other really beneficial tasks like disaster relief, mining, carbon drawdown, research, you know, curing cancer? But we aren’t being asked to use it for those tasks. So, what are we supposed to do? Shouldn’t we be using it for that kind of thing?” This could lead to some mildly dystopian outcomes, or not. I don’t think Congress or the emerging post-prime defence research scene is evil; I think it’s pretty likely they’d decide to share it with the world (though I doubt they’d seek direct input from the rest of the world on how it should be aligned).
Some of the crises I expect, I guess, won’t be recognized as crises. Boiled-frog situations.
A private system passes those tests, but instead of doing the responsible thing and raising the alarm, the company just treats it like a normal release and sells it. (And the die is rolled, and we live or we don’t.)
Or crises in the deployment of AI that reinforce the “AI as tool” frame so deeply that it becomes harder to discuss preparations for AI as independent agents:
Automated invasion: a country is successfully invaded, disarmed, controlled, and reshaped with almost entirely automated systems, with minimal human presence from the invading side. Probable in Gaza or Taiwan.
It’s hard to imagine a useful policy response to this. I can only imagine this leading to reactions like “Wow. So dystopian and oppressive. They Should Not have done that and we should write them some sternly worded letters at the UN. Also let’s build stronger AI weapons so that they can’t do that to us.”
A terrorist attack or a targeted assassination using lethal autonomous weapons.
I expect this to be treated as if it’s just a new kind of bomb.
I think there’s at least one scenario missing: “You wake up one morning and find out that a private equity firm has bought up a company everyone knows the name of, fired 90% of the workers, and says they can replace them with AI.”
Mm, a scenario where mass unemployment can be framed as a discrete event with a name and a face.
I guess I think it’s just as likely there isn’t an event: human-run businesses die off, new businesses arise, and none of them outwardly emphasise their automation levels. The press can’t turn it into a scary story because automation and foreclosures are nothing fundamentally new (only in quantity, and you can’t photograph a quantity), and the public become complicit by buying their cheaper, higher-quality goods and services, so the appetite for public discussion remains low.
I think something doesn’t need to be fundamentally new for the press to turn it into a scary story; e.g., news reports about crime or environmental devastation being on the rise have scared a lot of people quite a bit. You can’t photograph a quantity, but you can photograph individuals affected by a thing and make it feel common by repeatedly running stories about different individuals affected by it.
I agree that mass unemployment may spark policy change, but why do you see that change as being relevant to misalignment vs. specific to automation?