(Quick thoughts: I've read your comment and written this one from my phone, so sorry if I've misunderstood something, if the reply misses your points, or if it isn't very concise.)
I’m not sure whether your reasoning is “well, in the scenarios with what I’ve heard Yudkowsky call a “fast takeoff”, we’re dead, so let’s think about other scenarios where we have an easier time having impact”. Like, no, we live in some specific world; if this is a world where surviving is hard, better to actually try to save everyone than to flinch away and hope for a “slow takeoff”.
If Meta trained an AI on hundreds of thousands of GPUs, and it requires much less compute to run/can run faster at inference, and it’s as smart as the smartest humans, and it can find and exploit zero-day vulnerabilities, then it can do what smart/specialised humans can do, just much faster, including hacking literally everything. If it has a lot of GPUs to run on, it can do even more per unit of time, can multitask, can run a hierarchical AutoGPT kind of thing, etc., and the whole thing is smarter than humans. No self-modification, coding new AIs, etc. required. It kills everyone before it focuses on recursive self-improvement and paperclipping.
What’s your model of what you call “slow takeoff” after the AI is smarter than humans, including being better than humans at finding and exploiting zero-days? Or what’s your model for how we don’t get to that point?
I’m pretty sure that in our world, indeed, being at the level of the smartest humans at hacking, but running much faster and being able to use hacked machines to run even faster, means that you can hack almost literally everything, easily, and be smart enough to kill everyone.
If stopping the development of generally capable AI doesn’t happen, because politicians “can never agree to this”, then, sorry to inform you, we’ll all be dead, soon. Having a couple of months more to live doesn’t change this fact. Alignment won’t be solved in time; there’s no insight we’re on a path to that would get us a full solution to alignment. (The problems that need to be solved are disjoint enough that stumbling across a solution to one doesn’t much increase the chances of stumbling across solutions to the others. A couple of months don’t matter.)
But this seems false.
Sure, you can predict biorisks and other warning shots and work with governments to prevent those. If you also mention x-risk, does this really change the other things you’re informing the governments of? If your competence in other AI-related risks has already been confirmed by the experts who work for/with the government on, e.g., biorisks, does talking about x-risk before warning shots happen make it harder to persuade them about x-risk later?
By default, warning shots make it easier to show that these things can be smart and potentially dangerous. This makes it easier to introduce some regulation. It also incentivises governments to invest heavily in these dangerous toys that can come up with offensive capabilities. It doesn’t prevent the labs from training anything, so long as it’s kept in an environment inaccessible to bad actors, and it doesn’t make the government worry about future AI systems as potential agentic bad actors themselves. Without x-risk, there isn’t a reason to prohibit an insanely valuable technology from being developed. Governments think that AIs are powerful guns that can kill but also spill out money. Protecting these guns from bad actors makes sense; protecting humanity from the guns going rogue and killing literally everyone isn’t a threat model they have at all, unless you explain it.
There are sets of people such that, if you persuade everyone in the group of the x-risk, you significantly increase the chances of us not being dead. I don’t think it’s impossible to persuade one specific person, although it takes resources. It takes more people and more resources to persuade more people. If you want the eight billion people alive today, and the hardly countable generations to come, to live, maybe just actually try?
I’m not sure whether your reasoning is “well, in the scenarios with what I’ve heard Yudkowsky call a “fast takeoff”, we’re dead, so let’s think about other scenarios where we have an easier time having impact”. Like, no, we live in some specific world; if this is a world where surviving is hard, better to actually try to save everyone than to flinch away and hope for a “slow takeoff”.
Yes, we live in some specific world, but we don’t know which. We can only guess. To simplify, if I have an 80% belief that we live in a slow-takeoff world, and 20% that we live in a fast-takeoff world, and I think that strategy A has a 50% chance of working in the former kind of world and none in the latter, whereas strategy B has a 5% chance of working regardless of the world, I’ll still go with strategy A, because that gives me a 40% overall chance of getting out of this alive. And yes, in this case it helps that I do place a higher probability on this being a slow-takeoff kind of world.
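(A minimal sketch of that expected-value arithmetic, just to make it explicit; the probabilities are the illustrative numbers from the paragraph above, not real estimates:)

```python
# Illustrative credences over worlds (from the paragraph above, not real estimates)
p_slow, p_fast = 0.8, 0.2

# P(strategy succeeds | world) for the two hypothetical strategies A and B
strategies = {
    "A": {"slow": 0.50, "fast": 0.00},  # only helps in a slow-takeoff world
    "B": {"slow": 0.05, "fast": 0.05},  # small chance regardless of the world
}

for name, p_works in strategies.items():
    expected = p_slow * p_works["slow"] + p_fast * p_works["fast"]
    print(f"Strategy {name}: overall P(success) = {expected:.2f}")

# Output:
# Strategy A: overall P(success) = 0.40
# Strategy B: overall P(success) = 0.05
```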
What’s your model of what you call “slow takeoff” after the AI is smarter than humans, including being better than humans at finding and exploiting zero-days? Or what’s your model for how we don’t get to that point?
Honestly I think it’s a tad more complicated than that. Don’t get me wrong, any world with smarter-than-human AGI in it is already incredibly dangerous and teetering on the brink of a bunch of possible disasters, but I don’t expect it to end in instant paperclipping either. There are two main thrusts for this belief:
the means problem: to me, the big distinction between a slow- and a fast-takeoff world is how much technological low-hanging fruit exists, potentially, for such a smarter-than-human AGI to reap and use. Whatever its goals, killing us is unlikely to be a terminal one. It can be instrumental, but then it’s only the optimal policy if the AI can exist independently of us. In a world in which its intelligence is easily translated into effective replacements for us for all maintenance and infrastructural needs (e.g. repair nanobots and the like), we’re dead. In a world in which that’s not the case, we experience a period in which the AI behaves nicely and feeds us robot technology to put all the pieces in place before it can kill us;
the motive problem: I think that different kinds of AGI would also differ wildly in their drive to kill us at all. There are still inherent dangers from having them around, but not every AGI would be a paperclipper that very deliberately aims at removing us from the board as one of its first steps. I’d expect that from a general AlphaZero-like RL agent trained on a specific goal. I wouldn’t expect it from a really, really smart LLM, because those are less focused (the goal the LLM was trained on isn’t the same as the goal you give to the simulacra) and more shaped specifically by human content. This doesn’t make them aligned, but it makes them, I think, less alien than the former example, to the extent that they’d probably take a different approach. Again, I would still be extremely wary of them; I just don’t think they’d get murder-y as their very first step.
If your competence in other AI-related risks has already been confirmed by the experts who work for/with the government on, e.g., biorisks, does talking about x-risk before warning shots happen make it harder to persuade them about x-risk later?
Ok, so to be clear, I’m not saying we should NOT talk about x-risk or push the fact that it’s absolutely an important possibility to always keep in mind. But I see this more as preparing the terrain. IF we ever get a warning shot, then, if the seed was planted already, we get a more coherent and consistent response. But I don’t think anyone would commit to anything sufficiently drastic on purely theoretical grounds. So I expect that right now the achievable wins are much more limited, and many are still good in how they might e.g. shift incentives to make alignment and interpretability more desirable and valuable than just blind capability improvement. But yes, by all means, x-risk should be put on the table right away.