I agree things don’t automatically happen eventually just because they can. At least, not automatically on relevant timescales. (e.g. a monkey mashing a keyboard will eventually produce Shakespeare, but not for bazillions of years.)
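For a rough sense of that timescale claim, here is a minimal back-of-envelope sketch. The 30-key alphabet, typing speed, and target phrase are illustrative assumptions of mine, not anything from the conversation:

```python
# Back-of-envelope: how long random typing takes to hit even a short
# Shakespeare phrase. Alphabet size, typing rate, and phrase are
# illustrative assumptions, not anything from the conversation above.

ALPHABET_SIZE = 30       # letters + space + a little punctuation (assumed)
KEYS_PER_SECOND = 10     # one very fast monkey (assumed)
phrase = "to be or not to be"   # 18 characters

p_match = (1 / ALPHABET_SIZE) ** len(phrase)  # chance any given window matches
expected_keystrokes = 1 / p_match             # mean of a geometric distribution
expected_years = expected_keystrokes / KEYS_PER_SECOND / (3600 * 24 * 365)

print(f"expected keystrokes: {expected_keystrokes:.1e}")
print(f"expected years:      {expected_years:.1e}")
# Roughly 3.9e26 keystrokes, or about 1.2e18 years -- vastly longer than the
# ~1.4e10-year age of the universe, and that's only an 18-character phrase.
```

So even under generous assumptions, "can happen given unbounded time" says very little about what happens on timescales anyone cares about.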
Not important to your general point, but here I guess you run into some issues with the definition of “can”. You could argue that if something doesn’t happen it means it couldn’t have happened (if the universe is deterministic). And so then yes, everything that can happen, actually happens. But that isn’t the sense in which people normally use the word “can”. Instead it’s reasonable to say “it’s possible my son’s first word is ‘Mama’”, “it’s possible my son’s first word is ‘Papa’”, both of these things can happen (i.e. they are not prohibited by any natural laws that we know of). But only one of these things can be true; in many situations we’d say that two mutually incompatible events “can happen”. And therefore it’s not just a matter of timescale.
The argument is:
If something can happen
and there’s a fairly strong reason to expect some process to steer towards that thing happening
and there’s not a reason to expect some other processes to steer towards that thing not happening
...then the thing probably happens eventually, on a somewhat reasonable timescale, all else equal.
Sure, I agree with that. I think this makes superintelligence much more likely than it otherwise would be (because it’s not prohibited by any laws of physics that we know of, and people are trying to build it, and no-one is effectively preventing it from being built). But the same argument doesn’t apply to misaligned superintelligence or other doom-related claims. In fact, the opposite is true.
Superintelligence not killing everyone is not prohibited by the laws of physics
People are trying to ensure superintelligence doesn’t kill everyone
No-one is trying to make superintelligence kill everyone
So you could apply a similarly-shaped argument to “prove” that aligned superintelligence is coming on a “somewhat reasonable timescale”.
Yeah, when I say “things that can happen most likely will”, I don’t mean “in any specific case.” A given baby’s first word can’t be both “Mama” and “Papa”. But there’s a range of phonemes that babies can make, and over time, eventually every 2–4 phoneme combination will happen to be some baby’s first “word”.
Before responding to the rest, I want to check back on this bit, at the meta level:
why do you think that insofar as a coherent, non-trivial goal emerges, it is likely to eventually result in humanity’s destruction?
This is something Eliezer (and I think I) have written about recently, which I think you read (in the chapter “Its Favorite Things”).
I get that you didn’t really buy those arguments as being dominating. But, a feeling I get when reading your question there is something like “there are a lot of moving parts to this argument, and when we focus on one for a while, the earlier bits lose salience.”
And, perhaps similar to “things that can happen, eventually will, given enough chances, unless stopped”, another pretty load-bearing structural claim is:
“It is possible to just actually exhaustively think through a large number of questions and arguments, and for each one, get to a pretty confident state of what the answer to that question is.”
And then, it’s at least possible to make a pretty good guess about how things will play out, at least if we don’t learn new information.
And maybe you can’t get to 100% confidence. But you can rule out things like “well, it won’t work unless Claim A turns out to be false, even though it looks most likely true.” And this constrains what types of worlds you might possibly be living in.
Or, maybe you can’t reach even a moderate confidence with your current knowledge, but you can see which things you’re still uncertain of, and which of those, if you became more certain of them, would change the overall picture.
...
(i.e. the “unless something stops it” clause in the “if it can happen, it will, unless stopped” argument means we live in worlds where either it eventually happens, or it is stopped. And then we can start asking “okay, what are the ways it could hypothetically be stopped? How likely do those look?”)
“Things that can happen, eventually will, given enough chances, unless stopped” is one particular argument that is relevant to some of the subpoints here. Yesterday you were like “yeah, I don’t buy that.” I spelled out what I meant, and it sounds like your position now is “okay, I do see what you mean there, but I don’t see how it leads to the final conclusion.”
There are a lot more steps, at that level of detail, before I’d expect you to believe something more similar to what I believe.
I’m super grateful for getting to talk to you about this so far, I’ve enjoyed the convo and it’s been helpful to me for getting more clarity on how all the pieces fit together in my own head. If you wanna tap out, seems super understandable.
But, the thing I am kinda hoping/asking for is for you to actually track all the arguments as they build, and if a new argument changes your mind on a given claim, track how that fits into all the existing claims and whether it has any new implications.
...
I’m not quite sure how you’re relating to your previous beliefs about “if it can happen, it will” and the arguments I just made. I’m guessing it wasn’t exactly an update for you so much as a “reframing.”
But, it sounds like you now understand what I meant, and why it at least means “the fact superintelligence is possible, and that people are trying, means that it’ll probably happen [in some timeframe]”.
And, while I haven’t yet proven all the rest of the steps of the argument to you, like… I’m asking you to notice that I did have an answer there, and there are other pieces that I think I also have answers to. But the complete edifice is indeed multiple books worth, and because each individual (like you) has different cruxes, it’s hard to present all the arguments in a succinct, compelling way.
But, I’m asking if you’re up for at least being willing to entertain the structure of “maybe, Ray will be right that there is a large-but-finite set of claims, and it’s possible to get enough certainty on each claim to at least put pretty significant bounds on how unaligned AI may play out.”
Certainly, I could be wrong! I don’t mean to:
Dismiss the possibility of misaligned-AI-related X-risk
Dismiss the possibility that your particular lines of argument make sense and I’m missing some things
And I think caution with AI development is warranted for a number of reasons beyond pure misalignment risk.
But it’s a little worrying when a community widely shares a strong belief in doom while implying that the required arguments are esoteric and require lots of subtle claims, each of which might have counterarguments, but which overall will eventually convince you. 1a3orn has a good essay about this: https://1a3orn.com/sub/essays-ai-doom-invincible.html.
I think having intuitions around general intelligences being dangerous is perfectly reasonable; I have them too. As a very risk-averse and pro-humanity person, I’d almost be tempted to press a button to peacefully prevent AI advancement purely on the basis of a tiny potential risk (I think everyone dying is very, very, very bad; I am not disagreeing with that point at all). But no such button exists, and attempts to stop AI development have their own side-effects that could add up to more risk on net. And though that’s unfortunate, it doesn’t mean that we should spread a message of “we are definitely doomed unless we stop”. A large number of people believing they are doomed is not a free way to increase the chances of an AI slowdown or pause; it has a lot of negative side-effects. Many smart and caring people I know have put their lives on pause and made serious (in my opinion, bad) decisions on the basis that superintelligence will probably kill us, or, if not, that there’ll be a guaranteed utopia. To be clear, I am not saying that we should believe or spread false claims that AI risk is lower than it actually is so that people’s personal lives temporarily improve. Rather, I am saying that exaggerating claims of doom, or making arguments sound more certain than they are, for consequentialist purposes is not free.
That seems like an understandable position to have – one of the things that sucks about the situation is that I do think it’s just kinda reasonable, from the outside, for this to trigger some kind of immune reaction.
But from my perspective it’s “the evidence just says pretty clearly we are pretty doomed”, and the people who disagree seem to pretty consistently be sliding off in weird ways, or responding to something about vibes, rather than engaging with the arguments.
(This is compounded by people who disagree also often picking up on a vibe from some doomy people that I agree is sus, one variant of which is pointed at in Val’s “Here’s the Exit”.)
I do think it sucks that it’s hard to tell how much of this is the sort of failure mode that the 1a3orn piece is pointing at, vs Epistemic Slipperiness, vs just “it’s actually a fairly complex argument, but relatively straightforward once you deal with the complexity.”
But it’s a little worrying when a community widely shares a strong belief in doom while implying that the required arguments are esoteric and require lots of subtle claims, each of which might have counterarguments, but which overall will eventually convince you. 1a3orn has a good essay about this: https://1a3orn.com/sub/essays-ai-doom-invincible.html.
I wrote a post on that exact selection effect. And there’s an even trickier problem: results are heavy-tailed, meaning that a small, insular, smart group reaching the correct conclusion is basically indistinguishable from a small, insular, smart group reaching the wrong conclusion but believing it’s true, due to selection effects plus unconscious selection effects towards weaker arguments, at least without very expensive experiments or access to ground truth.
Here’s an EA Forum version of the post.