Your strong conclusion
The people currently taking actions that increase the probability that {the former is solved first} are not evil people trying to kill everyone, they’re confused people who think that their actions are actually increasing the probability that {the latter is solved first}.
rests on a few over-simplifications associated with
Almost all of what the future contains is determined by which of the two following engineering problems is solved first:
How to build an unaligned superintelligent AI (if solved first, everyone dies)
How to build an aligned superintelligent AI (if solved first, everyone gets utopia)
For example:
Say I’m too old to expect aligned AI to give me eternal life (or aligned AI simply might not mean eternal life/bliss for me, for whatever reason; maybe utopia is better started with newborns, who can more efficiently be made into bliss-enjoying automatons or whatever utopia entails). For me individually, the intermediate years before superintelligence are then the relevant ones, so I might rationally want to earn money by working on enriching myself, whatever the (un-)alignment impact of doing so.
Given the public-goods nature of alignment, I might fear it’s unlikely we’ll cooperate, so we’ll all freeride, working to enrich ourselves on various things that lean towards building unaligned AI rather than towards alignment. With such a prior, it may be rational for any self-interested person, even absent confusion, to indeed freeride: ‘hopefully I make a bit of money for an enjoyable few or many years, before unaligned AGI destroys us with near-certainty anyway’.
Even if technical alignment were not too difficult, standard Moloch-type effects (again, all sorts of freeriding/power-seeking) might mean the chances of unaligned users getting hold of even otherwise ‘aligned’ technology are overwhelming, again meaning that for most people most value lies in increasing their material welfare for the next few years to come, rather than in ‘wasting’ their resources on a futile alignment project.
Assume we’re natural humans, instead of perfectly patient beings (I know, weird assumption). So you’re eating the chocolate and drinking the booze, even if you know that in the longer-term future it’s a net negative for you. You need not be confused about it, and you need not even be strictly irrational from today’s point of view (depending on the exact definition of the term): you might genuinely care more about the “you” of the current and coming few days or years than about that future self who simply doesn’t yet feel very close to you. In other words, just as it can be rational to care a bit more about yourself than about others, you might care a bit more about your ‘current’ self than about the “you” a few decades (or an eternity) ahead. So the near-eternal utopia might mean a lot to you, but not infinitely much. It is easy to see that it may then well remain rational for you to use your resources towards the more mundane aim of increasing your material welfare for the few intermediate years to come, given that your own potential marginal contribution to P(Alignment first) is very small (<<1); see the toy numeric sketch after this list of examples.
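To make that trade-off concrete, here is a minimal back-of-the-envelope sketch in Python. All numbers (delta_p, value_utopia, value_enrich) are invented assumptions for illustration, not anyone’s actual estimates; the point is only the structure of the comparison.

# Toy expected-value comparison for one time-discounting individual.
# All numbers are made-up assumptions, for illustration only.
delta_p = 1e-8        # individual's marginal effect on P(aligned AI solved first)
value_utopia = 1e6    # personal value of the far-future utopia; finite, because the
                      # person discounts their far-future self
value_enrich = 1.0    # near-term welfare from simply enriching oneself instead

ev_help_alignment = delta_p * value_utopia  # 0.01
ev_enrich_self = value_enrich               # 1.0
print(ev_help_alignment < ev_enrich_self)   # True: self-enrichment wins here,
# even though the utopia is valued a million times more than the near-term gain;
# only an effectively infinite value_utopia (no discounting) would flip this.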
Hence, I’m afraid, when you take into account real-world complexities, it may well be perfectly rational for individuals to race on towards unaligned superintelligence; you’d require more altruism, rather than only more enlightenment, to improve the situation. In a messy world of 8 billion people in capitalistic, partly zero-sum (here even partly negative-sum) competition, cooperation simply isn’t easy to get even if individuals are approximately rational and informed, and even if, indeed, we are all in this together.
Say I’m too old to expect aligned AI to give me eternal life (or aligned AI simply might not mean eternal life/bliss for me, for whatever reason; maybe utopia is better started with newborns, who can more efficiently be made into bliss-enjoying automatons or whatever utopia entails). For me individually, the intermediate years before superintelligence are then the relevant ones, so I might rationally want to earn money by working on enriching myself, whatever the (un-)alignment impact of doing so.
I expect that the set of people who:
Expect to have died of old age within five years
Are willing to reduce how long they expect to live in order to be richer before they die
Are willing to sacrifice all of humanity’s future (including the future of their loved ones who aren’t expected to die of old age within five years)
Take actions that impact what superintelligence is built
is extremely small. It’s not like Sam Altman is 70.
Given the public-goods nature of alignment, I might fear it’s unlikely we’ll cooperate, so we’ll all freeride, working to enrich ourselves on various things that lean towards building unaligned AI rather than towards alignment. With such a prior, it may be rational for any self-interested person, even absent confusion, to indeed freeride: ‘hopefully I make a bit of money for an enjoyable few or many years, before unaligned AGI destroys us with near-certainty anyway’.
Even if technical alignment were not too difficult, standard Moloch-type effects (again, all sorts of freeriding/power-seeking) might mean the chances of unaligned users getting hold of even otherwise ‘aligned’ technology are overwhelming, again meaning that for most people most value lies in increasing their material welfare for the next few years to come, rather than in ‘wasting’ their resources on a futile alignment project.
But it’s not a public-goods kind of thing. If they knew that the choice was between:
Rich now, dead in five years
Less rich now, but post-scarcity-rich, immortal, and in utopia forever in five years
then pretty much nobody would still choose the former. If people realized the truth, they would choose otherwise.
In my experience, most of the selfishness people claim to have to justify continuing to destroy the world instead of helping alignment is less {because that’s their actual core values and they’re acting rationally} and more just {finding excuses to not have to think about the problem and change their minds/actions}. I talk more about this here.
To be fair, this is less a failure of {being wrong about this specific thing}, and more a failure of {being less good at rationality in general}. But it’s still mistake-theoretic more so than conflict-theoretic.
Are willing to reduce how long they expect to live in order to be richer before they die
Are willing to sacrifice all of humanity’s future (including the future of their loved ones who aren’t expected to die of old age within five years)
Take actions that impact what superintelligence is built
is extremely small.
It would be extremely small if we were talking about binaries/pure certainty.
If, in reality, everything is uncertain, and in particular (as I think) each individual has only a tiny probability of changing the outcome, then everyone ends up free-riding.
This is true for the commoner[1] who uses ChatGPT or whichever cheapest & fastest AI tool he finds to succeed in his work, thereby supporting the AI race and taking “actions that impact what superintelligence is built”.
It may also be true for the CEOs of many AI companies. Yes, their dystopia-probability-impact is larger, but their own career, status, power (and future position within the potential new society, see jacob_cannell’s comment) also hinge more strongly on their actions.
(Imperfect illustrative analogy: climate change may kill a hundred million people or so, yet the being called human will still tend to fly around the world, heating it up. Would anyone be willing to “sacrifice” a hundred million people for her trip to Bali? I have some hope she wouldn’t. But she won’t skip the holiday if her own probability of averting disastrous climate change is tiny anyway. And if, instead of a holiday, her entire career, fame, and power depended on continuing to pollute, then even as a global-scale polluter she’d likely enough not stop emitting for the sake of changing the outcome. I think we clearly must acknowledge this type of public-good/freerider dynamic in the AI domain; the toy sketch below spells out the calculus.)
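As a minimal sketch of that free-rider calculus (again with invented numbers; delta_p, value_good, and private_gain are assumptions chosen only to show the structure), each actor compares their private gain from racing against their tiny individual effect on the collective outcome:

# Toy free-rider calculus for one actor; all numbers are invented for illustration.
def holds_back(delta_p, value_good, private_gain):
    # Individually rational to hold back only if the expected gain from improving
    # the collective outcome exceeds the private gain from racing / using AI.
    return delta_p * value_good > private_gain

# A "commoner": negligible influence, modest private gain from cheap AI tools.
print(holds_back(delta_p=1e-9, value_good=1e6, private_gain=1.0))   # False

# An AI CEO: far larger influence, but the career/status/power at stake are
# larger too, so the individually rational answer can come out the same way.
print(holds_back(delta_p=1e-3, value_good=1e6, private_gain=1e4))   # False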
***
In my experience, most of the selfishness people claim to have to justify continuing to destroy the world instead of helping alignment is less {because that’s their actual core values and they’re acting rationally} and more just {finding excuses to not have to think about the problem and change their minds/actions}.
I agree with a lot of this, but without it changing my interpretation much: yes, humans are indeed good at rationalizing their bad actions. But they’re especially good at it when it’s in their egoistic interest to continue the bad thing. So commoner and AI CEO alike might well rationalize, in irrational ways, that ‘for complicated reasons it’s fine for the world if we (one way or another) heat up the AI race a bit’, precisely because they might rightly see continuing as in their own material interest, and want their own brain and others to see them as good persons nevertheless.
By the way, I agree the situation is a bit different for commoners vs. Sam Altman & co. I read your post as being about people in general, including those who merely use AI tools and therefore influence the domain economically via market forces. If that wasn’t just a misreading on my part, you might simplify the discussion by editing your post to refer specifically to those with a significant probability of making a difference (I interpret your reply that way; though I also don’t think the result changes much, as I try to explain).