Yes, I object to the “things that can happen, eventually will” line of reasoning.
Nod, makes sense, I think I want to just focus on this atm.
(also, btw I super appreciate you engaging, I’m sure you’ve argued a bunch with folk like this already)
So here’s the more specific thing I actually believe. I agree things don’t automatically happen eventually just because they can. At least, not automatically on relevant timescales. (e.g. infinite monkeys mashing keyboards will eventually produce Shakespeare, but not for bazillions of years)
The argument is:
If something can happen
and there’s a fairly strong reason to expect some process to steer towards that thing happening
and there’s not a reason to expect some other processes to steer towards that thing not happening
...then the thing probably happens eventually, on a somewhat reasonable timescale, all else equal. (A “reasonable” timescale depends on how fast the relevant steering process works: e.g. stellar evolution might take billions of years, evolution millions of years, and human engineers thousands of years.)
For example, when the first organisms appeared on Earth and began to mutate, I think a smart outside observer could predict “evolution will happen, and unless all the replicators die out, there will probably eventually be a variety of complex organisms.”
But, they wouldn’t be able to predict that any particular complex mutation would happen (for example, flying birds, or human intelligence). It was a long time before we got birds. We only have one Earth to sample from, but we’re already ~halfway between the time the Earth was born and when the sun engulfs it, so it wouldn’t have been too surprising if evolution had never gotten around to the combination of traits that birds have.
I think this is a fairly basic probability argument? Like, if each day there’s an n% chance of a beneficial mutation occurring (and then its host surviving and replicating), then given a long enough chunk of time, it would (eventually) be pretty surprising if it never happened. Maybe any specific mutation would be difficult to predict happening in 10 billion years. But, if we had trillions and trillions of years, it would be a pretty weird claim that it’d never happen.
Similarly, if each day there are N engineers thinking about how to solve a problem, and making a reasonable effort to creatively explore the space of ways of solving it, and we know the problem is solvable, then each day there’s an n% chance of one of them stumbling towards the solution.
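(To make that concrete, here’s a minimal sketch of the probability argument. It assumes each day is an independent trial with the same small chance p of the event happening, which is my own toy simplification with made-up numbers, not anything from the books:)

```python
import math

# Toy model (my own illustration): each day is an independent trial with a small
# chance p that the event (a beneficial mutation, or an engineer finding the
# solution) happens. The chance it has *never* happened after T days is
# (1 - p)**T, which is roughly exp(-p * T) when p is small.
p = 1e-7  # hypothetical per-day chance, purely illustrative

for T in (10**6, 10**8, 10**10):
    p_never = math.exp(-p * T)
    print(f"after {T:.0e} days, P(still never happened) ≈ {p_never:.3g}")
```

Even a tiny daily chance makes “it never happened” the surprising outcome once the timescale gets long enough, which is all this step of the argument needs.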
(In both the evolution and engineer cases, a thing that makes this a lot easier is that the search isn’t completely blind. Partial successes can compound. Evolution wasn’t going to invent birds in one go, that would indeed be way too combinatorially hard. But, it got to invent “wiggling appendages”, which then opened up a new, better search space of “particular ways of wiggling appendages”, which eventually leads to locomotion and then flight.)
How fast you should expect that to happen depends on how many resources are being thrown at it.
(Maybe worth reiterating: I don’t think “things that can happen, eventually will” applies to AI in the near future; that’s a much more specific claim, and Eliezer et al. are much less confident about that.)
There exists some math, for a given creation/steering system and some model of how often it generates new ideas and how fast those ideas then reach saturation, for “at what point it becomes more surprising that a thing has never happened than that it’s happened at least once.”
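(A toy version of that math, still assuming independent daily trials at a constant rate p, which is my simplification: “never happened” becomes the less likely outcome once (1 - p)^T drops below 1/2, i.e. after roughly ln(2)/p days.)

```python
import math

def crossover_days(p):
    """Days after which 'it never happened' is less likely than 'it happened at
    least once', assuming independent daily trials with constant probability p."""
    return math.log(2) / -math.log(1.0 - p)

# Hypothetical per-day rates, purely illustrative.
for p in (1e-3, 1e-6, 1e-9):
    print(f"p = {p:g}/day -> crossover after ≈ {crossover_days(p):,.0f} days")
```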
We can’t perfectly predict it, but it’s not something we’re perfectly ignorant about, either. It is possible to look at cells replicating, model how often mutations happen and what distribution of mutations tends to occur, and then make predictions about what distribution of mutations you might expect to see, and how quickly.
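(Here’s the kind of toy model I mean, with made-up numbers rather than real biology. The point is that many replicate runs give you a distribution of arrival times you can make predictions about, even though any single run is unpredictable:)

```python
import random
import statistics

def generations_until_first_hit(p=1e-3, max_gen=10**6):
    """Generations until the first 'beneficial mutation', assuming an independent
    per-generation chance p (a made-up rate, purely for illustration)."""
    for gen in range(1, max_gen + 1):
        if random.random() < p:
            return gen
    return max_gen  # never happened inside the window

# 2000 replicate runs: the distribution of arrival times is fairly predictable
# even though each individual run is not.
samples = sorted(generations_until_first_hit() for _ in range(2000))
print("median generation of first hit:", statistics.median(samples))
print("90th percentile:", samples[int(0.9 * len(samples))])
```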
I think hypothetical non-evolved-but-smart aliens observing early evolution for a thousand years wouldn’t be able to predict “birds”, but they might be able to predict “wiggling appendages” (or at least whatever was earlier on the tech-tree than wiggling appendages. I mean to include single cells that are, like, twisting slightly here).
Looking at the rate of human engineering, it’d be pretty hard to predict exactly when heavier-than-air flight would be invented. But once you’ve reached the agricultural age, and you’ve started seeing specialization of labor, creativity, and the spreading of ideas, plus the existence proof of birds, I think a hypothetical smart observer could put upper and lower bounds on when humans might eventually figure it out. It would be weird if it took literally billions of years, given that it only took evolution a few billion years and evolution was clearly way slower.
And, I haven’t yet laid out any of the specifics of “and here are the upper and lower bounds on how long it seems like it should plausibly take humanity to invent superintelligence.” I don’t think I’d get a more specific answer than “hundreds or possibly thousands of years”, but I think it is something that in principle has an answer, and you should be able to find/evaluate evidence that narrows down the answer.
(I am still interested in finding near-term things to bet on, since it sounds like I’m more confident than you that general-intelligence-looking things are decently [like, >50%] likely to happen in the next 20 years or so.)
I agree things don’t automatically happen eventually just because they can. At least, not automatically on relevant timescales. (e.g. infinite monkeys mashing keyboards will eventually produce Shakespeare, but not for bazillions of years)
Not important to your general point, but here I guess you run into some issues with the definition of “can”. You could argue that if something doesn’t happen it means it couldn’t have happened (if the universe is deterministic). And so then yes, everything that can happen, actually happens. But that isn’t the sense in which people normally use the word “can”. Instead it’s reasonable to say “it’s possible my son’s first word is ‘Mama’”, “it’s possible my son’s first word is ‘Papa’”, both of these things can happen (i.e. they are not prohibited by any natural laws that we know of). But only one of these things can be true; in many situations we’d say that two mutually incompatible events “can happen”. And therefore it’s not just a matter of timescale.
The argument is:
If something can happen
and there’s a fairly strong reason to expect some process to steer towards that thing happening
and there’s not a reason to expect some other processes to steer towards that thing not happening
...then the thing probably happens eventually, on a somewhat reasonable timescale, all else equal.
Sure, I agree with that. I think this makes superintelligence much more likely than it otherwise would be (because it’s not prohibited by any laws of physics that we know of, and people are trying to build it, and no-one is effectively preventing it from being built). But the same argument doesn’t apply to misaligned superintelligence or other doom-related claims. In fact, the opposite is true.
Superintelligence not killing everyone is not prohibited by the laws of physics
People are trying to ensure superintelligence doesn’t kill everyone
No-one is trying to make superintelligence kill everyone
So you could apply a similarly-shaped argument to “prove” that aligned superintelligence is coming on a “somewhat reasonable timescale”.
Yeah, when I say “things that can happen most likely will”, I don’t mean “in any specific case.” A given baby’s first word can’t be both “Mama” and “Papa”. But there’s a range of phonemes that babies can make, and over time, eventually every 2-4 phoneme combination will happen to be some baby’s first “word”.
Before responding to the rest, I want to check back on this bit, at the meta level:
why do you think that insofar as a coherent, non-trivial goal emerges, it is likely to eventually result in humanity’s destruction?
This is something Eliezer (and I think I) have written about recently, which I think you read (in the chapter “Its Favorite Things”).
I get that you didn’t really buy those arguments as being dominating. But a feeling I get when reading your question there is something like “there are a lot of moving parts to this argument, and when we focus on one for a while the earlier bits lose salience.”
And, perhaps similar to “things that can happen, eventually will, given enough chances, unless stopped”, another pretty load-bearing structural claim is:
“It is possible to just actually exhaustively think through a large number of questions and arguments, and for each one, get to a pretty confident state of what the answer to that question is.”
And then, it’s at least possible to make a pretty good guess about how things will play out, at least if we don’t learn new information.
And maybe you can’t get to 100% confidence. But you can rule out things like “well, it won’t work unless Claim A turns out to be false, even though it looks most likely true.” And this constrains what types of worlds you might possibly be living in.
Or, maybe you can’t reach even moderate confidence with your current knowledge, but you can see which things you’re still uncertain of, which, if you became more certain of them, would change the overall picture.
...
(i.e. the “unless something stops it” clause in the “if it can happen, it will, unless stopped” argument means we live in a world where either it eventually happens, or it is stopped, and then we can start asking “okay, what are the ways it could hypothetically be stopped? how likely do those look?”)
“Things that can happen, eventually will, given enough chances, unless stopped” is one particular argument that is relevant to some of the subpoints here. Yesterday you were like “yeah, I don’t buy that.” I spelled out what I meant, and it sounds like now your position is “okay, I do see what you mean there, but I don’t see how it leads to the final conclusion.”
There are a lot more steps, at that level of detail, before I’d expect you to believe something more similar to what I believe.
I’m super grateful for getting to talk to you about this so far, I’ve enjoyed the convo and it’s been helpful to me for getting more clarity on how all the pieces fit together in my own head. If you wanna tap out, seems super understandable.
But, the thing I am kinda hoping/asking for is for you to actually track all the arguments as they build, and if a new argument changes your mind on a given claim, track how that fits into all the existing claims and whether it has any new implications.
...
I’m not quite sure how you’re relating to your previous beliefs about “if it can happen, it will” and the arguments I just made. I’m guessing it wasn’t exactly an update for you so much as a “reframing.”
But, it sounds like you now understand what I meant, and why it at least means “the fact that superintelligence is possible, and that people are trying, means that it’ll probably happen [in some timeframe]”.
And, while I haven’t yet proven all the rest of the steps of the argument to you, like… I’m asking you to notice that I did have an answer there, and there are other pieces that I think I also have answers to. But the complete edifice is indeed multiple books’ worth, and because each individual (like you) has different cruxes, it’s hard to present all the arguments in a succinct, compelling way.
But, I’m asking if you’re up for at least being willing to entertain the structure of “maybe, Ray will be right that there is a large-but-finite set of claims, and it’s possible to get enough certainty on each claim to at least put pretty significant bounds on how unaligned AI may play out.”
I’m asking if you’re up for at least being willing to entertain the structure of “maybe, Ray will be right that there is a large-but-finite set of claims, and it’s possible to get enough certainty on each claim to at least put pretty significant bounds on how unaligned AI may play out.”
Certainly, I could be wrong! I don’t mean to:
Dismiss the possibility of misaligned AI related X-risk
Dismiss the possibility that your particular lines of argument make sense and I’m missing some things
And I think caution with AI development is warranted for a number of reasons beyond pure misalignment risk.
But it’s a little worrying when a community widely shares a strong belief in doom while implying that the required arguments are esoteric and require lots of subtle claims, each of which might have counterarguments, but which overall will eventually convince you. 1a3orn has a good essay about this: https://1a3orn.com/sub/essays-ai-doom-invincible.html.
I think having intuitions around general intelligences being dangerous is perfectly reasonable; I have them too. As a very risk-averse and pro-humanity person, I’d almost be tempted to press a button to peacefully prevent AI advancement purely on the basis of a tiny potential risk (for I think everyone dying is very, very, very bad, I am not disagreeing with that point at all). But no such button exists, and attempts to stop AI development have their own side-effects that could add up to more risks on net. And though that’s unfortunate, it doesn’t mean that we should spread a message of “we are definitely doomed unless we stop”. A large number of people believing they are doomed is not a free way to increase the chances of an AI slowdown or pause. It has a lot of negative side-effects. Many smart and caring people I know have put their lives on pause and made serious (in my opinion, bad) decisions on the basis that superintelligence will probably kill us, or if not there’ll be a guaranteed utopia. To be clear, I am not saying that we should believe or spread false things about AI risk being lower than it actually is so that people’s personal lives temporarily improve. But rather I am saying that exaggerating claims of doom or making arguments sound more certain than they are for consequentialist purposes is not free.
That seems like an understandable position to have – one of the things that sucks about the situation is that I do think it’s just kinda reasonable, from the outside, for this to trigger some kind of immune reaction.
But from my perspective it’s “the evidence just says pretty clearly that we are pretty doomed”, and the people who disagree seem to pretty consistently be sliding off in weird ways, or responding to something about vibes rather than engaging with the arguments.
(This is compounded by people who disagree also often picking up on a vibe from some doomy people that I agree is sus, one variant of which is pointed at in Val’s “Here’s the exit”.)
I do think it sucks that it’s hard to tell how much of this is the sort of failure mode that the 1a3orn piece is pointing at, vs Epistemic Slipperiness, vs just “it’s actually a fairly complex argument, but relatively straightforward once you deal with the complexity.”
But it’s a little worrying when a community widely shares a strong belief in doom while implying that the required arguments are esoteric and require lots of subtle claims, each of which might have counterarguments, but which overall will eventually convince you. 1a3orn has a good essay about this: https://1a3orn.com/sub/essays-ai-doom-invincible.html.
I wrote a post on that exact selection effect, and there’s an even trickier problem where results are heavy-tailed: a small, insular, smart group reaching the correct conclusion is basically indistinguishable from a small, insular, smart group reaching the wrong conclusion but believing it’s true due to selection effects, plus unconscious selection effects towards weaker arguments, at least without very expensive experiments or access to ground truth.
Here’s an EA Forum version of the post.