What would it look like for AI to go extremely well?
Here’s a long list of things that I intuitively want AI to do. It’s meant to gesture at an ambitiously great vision of the future, rather than being a precise target. (In fact, there are obvious tradeoffs between some of these things, and I’m just ignoring those for now.)
I want AGI to:
end factory farming
end poverty
enable space exploration and colonization
solve governance & coordination problems
execute the abolitionist project
cultivate amazing and rich lives for everyone
tile a large chunk of the future lightcone with hedonium?
be motivating and fun to interact with
promote the best and strongest kinds of social bonds
figure out how consciousness works as well as possible and how to make it extremely positive
counteract climate change, racism, sexual abuse, sexism, crime, scamming, and depression
make transportation amazing
make healthcare amazing (sickness, long-lasting body pain, aging, disease, cancer; the speed, quality, and cost of healthcare)
make incredible art and games and sports and get the right people to engage in them
make delicious food
take over menial tasks
maintain opportunities for immersion in natural beauty
figure out the right way to interact with aliens and interact with them that way (assuming we ever interact with aliens, which I think is quite plausible, even for generally intelligent ones)
maintain opportunities for intellectual stimulation and discovery?
defend against x-risks (other AGIs, nuclear, bio, climate, aliens)
reverse entropy?
help humans be happier w/ less (e.g. via meditation)
satisfy Maslow’s full hierarchy of needs for everyone
improve wild animal welfare (drastically)
make moral progress consistently (if needed)
maintain human agency (where appropriate)
continuously work on a balance between figuring out what is best for the universe and acting according to its best current understanding of what is best for the universe (like I try to do)
(Relevant context is that I’m a pretty confident total hedonistic utilitarian.)
One possibly important takeaway is that a lot of this has to do with AI applications, which may continue to be hard to achieve even with intent-aligned AI for the same reasons they’re hard to achieve today: lack of consensus, going against the local incentives of some powerful people and groups, etc. A couple of (not novel) ideas for improving this:
Maybe AI startups that work on these applications are actually counterfactually important
Maybe we should try to make AIs truth-seeking and prosocial in competitive environments (including when competing with each other); that may be a better mechanism for pursuing these goals on a large scale than, e.g., the highly partisan US Congress
(Relevant context is that I’m a pretty confident total hedonistic utilitarian.)
What does a “pretty confident total hedonistic utilitarian” do with the people who endorse other kinds of things and don’t consider hedonic valence important (in the long future, outside the context of the modern world)? Exploring or developing a philosophical position is distinct from espousing it.
I’m not so sure about total hedonistic utilitarianism that I want to directly stick it into future-shaping AIs; I’d rather have them “continuously work on a balance between figuring out what is best for the universe and acting according to [their] best current understanding of what is best for the universe (like I try to do)”
I think other people can be wrong about morality, in which case I don’t think their notion of a good future is something I need to try to promote
“Exploring or developing a philosophical position is distinct from espousing it.” If I understand correctly, this is mostly a matter of how well fleshed out I think the position is and how confident I am in it. I think there may be small pieces of the argument and the position that aren’t perfectly fleshed out, but I expect there to be ways to iron out the details, and I overall think the position and the argument are pretty well fleshed-out. I’d say I espouse total hedonistic utilitarianism (while also being open to further development).
If other people endorse other things and don’t consider hedonic valence important, then we should have good decision-making mechanisms for handling this kind of conflict as we shape the long-term future. I mentioned above (point 2 at the bottom of the original post) that I want such a decision process to be truth-seeking and prosocial. It should seek to figure out whether those other things actually matter and whether hedonic valence actually matters. If hedonic valence actually matters and nothing else does (as I suspect), then hedonic valence should be prioritized in decisions. In the prosocial part I’m including the idea that we should probably try to ease the blow of this decision to deprioritize someone’s preferences. Maybe one example conflict is between factory farmers and people who want to eliminate factory farming. I’d want to eliminate factory farming, while offering new ways for the people currently reliant on factory farming to live good lives.
It should seek to figure out whether those other things actually matter and whether hedonic valence actually matters. If hedonic valence actually matters and nothing else does (as I suspect), then hedonic valence should be prioritized in decisions.
This veers into moral realism. My point is primarily that different people might have different values, and I think it’s plausible that values-on-reflection can move quite far (conceptually) from any psychological drives (or biological implementation details) encoded by evolution, in different ways for different people. This makes moral common ground much less important pragmatically for setting up the future than some largely morality-agnostic framework that establishes boundaries and coordination, including around any common ground or moral disagreements (while providing options for everyone individually as they would choose). And conversely, any scheme for setting up the future that depends on nontrivial object-level moral considerations (at the global level) risks dystopia.
This should be an issue even for the sake of a single extremely unusual person who doesn’t conform to some widespread moral principles. If a system of governance in the long future can handle that case well, there doesn’t seem to be a reason to do anything different for anyone else.
That’s not an accident, I do lean pretty strongly realist :). But that’s another thing I don’t want to hardcode into AGIs, I’d rather maintain some uncertainty about it and get AGI’s help in trying to continue to navigate realism vs antirealism.
I think I agree about the need for a morality-agnostic framework that establishes boundaries and coordination, and about the risks of dystopia if we attempt to commit to any positions on object-level morality too early in our process of shaping the future. But my hope is that our meta-approach helps achieve moral progress (perhaps towards an end state of moral progress, which I think is probably well-applied total hedonistic utilitarianism). So I still care a lot about getting the object-level moral considerations involved in shaping the future at some point. Without that, you might miss out on some really important features of great futures (like abolishing suffering).
Perhaps relatedly, I’m confused about your last paragraph. If a single highly unusual person doesn’t conform to the kinds of moral principles I want to have shaping the future, that’s probably because that person is wrong, and I’m fine with their notions of morality being ignored in the design of the future. Hitler comes to mind for this category, idk what comes to mind for you.
(I’ve always struggled to understand reasons for antirealists not to be nihilists, but haven’t needed to do so as a realist. This may hurt my ability to properly model your views here, though I’d be curious what you want your morality-agnostic framework to achieve and why you think that matters in any sense.)
(I realize I’m saying lots of controversial things now, so I’ll flag that the original post depended relatively little on my total hedonistic utilitarian views and much of it should remain relevant to people who disagree with me.)
In a framing that permits orthogonality, moral realism is not a useful claim; it wouldn’t matter for any practical purposes even if it’s true in some sense. That is the point of the extremely unusual person example: you can vary the degree of unusualness as needed, and I didn’t mean to suggest repugnance of the unusualness, more like its alienness with respect to some privileged object-level moral position.
Object-level moral considerations do need to shape the future, but I don’t see any issue with their influence originating exclusively from individual people, with its application at scale arising purely from coordination among the influence those people exert. So if we take that extremely unusual person as one example, their influence wouldn’t be significant, because there’s only one of them, but it isn’t diminished beyond that under the pressure of others. Where it’s in direct opposition to others, the boundaries aspect of coordination comes into play: some form of negotiation. But if instead there are many people who share some object-level moral principles, their collective influence should result in global outcomes that are in no way inferior to what you imagine top-down object-level moral guidance might be able to achieve.
So once superintelligence enables practical considerations to be tracked in sufficient detail at the level of individual people, I don’t see any point to a top-down architecture, only disadvantages. The relevance of object-level morality (or of alignment of the superintelligence managing the physical-world substrate level) is to make sure it doesn’t disregard particular people, and that it does allocate influence to their volition. The alternatives are that some or all people get zero or minuscule influence (extinction or permanent disempowerment), compared to AIs or (in principle, though this seems much less likely) compared to other people.