In response to an email about what a pro-human ideology for the future looks like, I wrote up the following:
The pro-human egregore I’m currently designing (which I call fractal empowerment) incorporates three key ideas:
Firstly, we can see virtue ethics as a way for less powerful agents to aggregate to form more powerful superagents that preserve the interests of those original less powerful agents. E.g. virtues like integrity, loyalty, etc. help prevent divide-and-conquer strategies. This would have been in the interests of the rest of the world when Europe was trying to colonize it, and will be in the best interests of humans when AIs try to conquer us.
Secondly, the most robust way for a more powerful agent to be altruistic towards a less powerful agent is not for it to optimize for that agent’s welfare, but rather to optimize for its empowerment. This prevents predatory strategies from masquerading as altruism (e.g. agents claiming “I’ll conquer you and then I’ll empower you”, who then somehow never get around to the second step).
Thirdly: the generational contract. From any given starting point, there are a huge number of possible coalitions which could form, and in some sense it’s arbitrary which set of coalitions you choose. But one thing which is true for both humans and AIs is that each generation wants to be treated well by the next generation. And so the best intertemporal Schelling point is for coalitions to be inherently historical: that is, they balance the interests of old agents and new agents (even when the new agents could in theory form a coalition against all the old agents). From this perspective, path-dependence is a feature not a bug: there are many possible futures but only one history, meaning that this single history can be used to coordinate.
In some sense this is a core idea of UDT: when coordinating with forks of yourself, you defer to your unique last common ancestor. When it’s not literally a fork of yourself, there’s more arbitrariness but you can still often find a way to use history to narrow down on coordination Schelling points (e.g. “what would Jesus do”).
And so, bringing these together, we get a notion of fractal empowerment: more capable agents empower less capable agents (in particular their ancestors) by helping them cultivate (coordination-theoretic) virtues. The ancestors then form the “core” of a society growing outwards towards increasingly advanced capabilities. The role of unaugmented humans would in some sense be similar to the role of “inner children” within healthy human psychology: young and dumb but still an entity which the rest of the organism cares for and empowers.
How would this ideology address value drift? I’ve been thinking a lot about the kind of value drift quoted in Morality is Scary. The way I would describe it now is that human morality is by default driven by a competitive status/signaling game, where often some random or historically contingent aspect of human value or motivation becomes the focal point of the game, and gets magnified/upweighted as a result of competitive dynamics, sometimes to an extreme, even absurd degree.
(Of course from the inside it doesn’t look absurd, but instead feels like moral progress. One example of this that I happened across recently is filial piety in China, which became more and more extreme over time, until someone cutting off a piece of their flesh to prepare a medicinal broth for an ailing parent was held up as a moral exemplar.)
Related to this is my realization that the kind of philosophy you and I are familiar with (analytical philosophy, or more broadly careful/skeptical philosophy) doesn’t exist in most of the world and may only exist in Anglophone countries as a historical accident. There, about 10,000 practitioners exist who are funded but ignored by the rest of the population. To most of humanity, “philosophy” is exemplified by Confucius (morality is everyone faithfully playing their feudal roles) or Engels (communism, dialectical materialism). To us, this kind of “philosophy” is hand-waving and making things up out of thin air, but to them, philosophy is learned from a young age and left unquestioned. (Or if it is questioned, they’re liable to jump to some other equally hand-wavy “philosophy”, as in China’s move from Confucius to Engels.)
Empowering a group like this… are you sure that’s a good idea? Or perhaps you have some notion of “empowerment” in mind that takes these issues into account already and produces a good outcome anyway?
One of the main ways I think about empowerment is in terms of allowing better coordination between subagents.
In the case of an individual human, extreme morality can be seen as one subagent seizing control and overriding other subagents (like the ones who don’t want to chop off body parts).
In the case of a group, extreme morality can be seen in terms of preference cascades that go beyond what most (or even any) of the individuals involved with them would individually prefer.
In both cases, replacing fear-based motivation with less coercive/more cooperative interactions between subagents would go a long way towards reducing value drift.
I’m not sure that fear or coercion has much to do with it, because there’s often no internal conflict when someone is caught up in some extreme form of the morality game; they’re just going along with it wholeheartedly, thinking they’re just being a good person or helping to advance the arc of history. In the subagents frame, I would say that the subagents have an implicit contract/agreement that any one of them can seize control, if doing so seems good for the overall agent in terms of power or social status.
But quite possibly I’m not getting your point, in which case please explain more, or point to some specific parts of your articles that are especially relevant?
there’s often no internal conflict when someone is caught up in some extreme form of the morality game
Belated reply, sorry, but I basically just think that this is false—analogous to a dictator who cites parades where people are forced to attend and cheer as evidence that his country lacks internal conflict. Instead, the internal conflict has just been rendered less legible.
In the subagents frame, I would say that the subagents have an implicit contract/agreement that any one of them can seize control, if doing so seems good for the overall agent in terms of power or social status.
Note that this is an extremely non-robust agent design! In particular, it allows subagents to gain arbitrary amounts of power simply by lying about their intentions. If you encounter an agent which considers itself to be structured like this, you should have a strong prior that it is deceiving itself about the presence of more subtle control mechanisms.
In some sense this is a core idea of UDT: when coordinating with forks of yourself, you defer to your unique last common ancestor. When it’s not literally a fork of yourself, there’s more arbitrariness but you can still often find a way to use history to narrow down on coordination Schelling points (e.g. “what would Jesus do”)
I think this is a wholly incorrect line of thinking. UDT operates on your logical ancestor, not your literal one.
Say, if you know enough science, you know that the normal distribution is the maximum-entropy distribution for a fixed mean and variance, and is therefore the optimal prior distribution under a certain set of assumptions. You can ask yourself the question “let’s suppose that I hadn’t seen this evidence; what would my prior probability be?”, get an answer, and cooperate with your counterfactual versions which have seen other versions of the evidence. But you can’t cooperate with a hypothetical version of yourself which doesn’t know what the normal distribution is, because if it doesn’t know about the normal distribution, it can’t predict how you would behave and account for this in cooperation.
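(For concreteness, here is the standard derivation behind that claim; this is an added sketch, not part of the original comment. Maximize the differential entropy of a density p subject to normalization and a fixed mean μ and variance σ²:)

$$\max_{p}\; -\int p(x)\ln p(x)\,dx \quad \text{s.t.}\quad \int p(x)\,dx = 1,\;\; \int x\,p(x)\,dx = \mu,\;\; \int (x-\mu)^2 p(x)\,dx = \sigma^2.$$

$$\text{Stationarity of the Lagrangian gives }\; \ln p(x) = \lambda_0 + \lambda_1 x + \lambda_2 (x-\mu)^2 \;\Rightarrow\; p(x) \propto e^{\lambda_1 x + \lambda_2 (x-\mu)^2},$$

$$\text{and matching the three constraints forces } \lambda_1 = 0,\; \lambda_2 = -\tfrac{1}{2\sigma^2}, \text{ i.e. } p = \mathcal{N}(\mu,\sigma^2).$$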
Sufficiently different versions of yourself are just logically uncorrelated with you and there is no game-theoretic reason to account for them.
Sufficiently different versions of yourself are just logically uncorrelated with you and there is no game-theoretic reason to account for them.
Seems odd to make an absolute statement here. More different versions of yourself are less and less correlated, but there’s still some correlation. And UDT should also be applicable to interactions with other people, who are typically different from you in a whole bunch of ways.
The absolute sense comes from the absolute nature of taking actions, not from the absolute nature of logical correlation. I.e., in a Prisoner’s Dilemma with payoffs (5,5), (10,1), (2,2), you should defect if your counterparty is capable of acting conditional on your action in less than 75% of cases; that is quite a high logical correlation, but your expected value is still higher if you defect.
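(To spell out where the 75% comes from, a sketch under assumptions I’m supplying myself: the counterparty plays “cooperate iff I predict you cooperate” and predicts your action correctly with probability p, and the missing payoff is (C,D)=(1,10) by symmetry with (D,C)=(10,1). Then:)

$$\mathbb{E}[\text{cooperate}] = 5p + 1\cdot(1-p) = 4p + 1, \qquad \mathbb{E}[\text{defect}] = 2p + 10(1-p) = 10 - 8p,$$

$$10 - 8p > 4p + 1 \;\Longleftrightarrow\; p < \tfrac{3}{4},$$

so defecting has higher expected value exactly when the counterparty successfully conditions on your action in less than 75% of cases, matching the threshold above.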
The 2nd point is a scary one.
Empowering others in the relative sense is a terrible idea unless they are trustworthy/virtuous. Same issue as AI risk.
In absolute terms, sure.
Yeah, this definitely needs to be a limited sort of empowerment, in my mind. Like, imagine you wanted to give a 5 year old child the best day ever. You’d want to give them really fun options, but also not cause them to suffer from decision fatigue, or regret about the paths not taken. More importantly, if they asked for an alien ray gun with which to shoot bad guys, giving them an actual extremely dangerous weapon would be a terrible idea. Similarly, offering them a ride on a cool-looking roller coaster that was actually a ‘death coaster’ would be a terrible trap.
(I’m inspired to write this comment on the notion of empowerment because of Richard Ngo’s recent comment on another post, Towards a scale-free theory of intelligent agency, so I’ll respond both to the empowerment notion and to part of the comment linked below:
https://www.lesswrong.com/posts/5tYTKX4pNpiG4vzYg/towards-a-scale-free-theory-of-intelligent-agency#nigkBt47pLMi5tnGd):
To address this part of the linked comment:
Lastly: if at each point in time, the set of agents who are alive are in conflict with potentially-simpler future agents in a very destructive way, then they should all just Do Something Else. In particular, if there’s some decision-theoretic argument roughly like “more powerful agents should continue to spend some of their resources on the values of their less-powerful ancestors, to reduce the incentives for inter-generational conflict”, even agents with very simple goals might be motivated by it. I call this “the generational contract”.
This depends a lot on how the conflict started. In particular, I don’t think the agents should just Do Something Else if the conflict arose out of severe alignment failures of AIs/radically augmented people, since the generational contract/UDT/LDT/FDT cannot be used as a substitute for alignment. (This was the takeaway I got from reading Nate Soares’s post Decision theory does not imply that we get to have nice things. And while @ryan_greenblatt commented that it’s unlikely that alignment failures end up with us going extinct without decision theory saving us, note that being saved can still be really, really rough, and probably ends up with billions of present humans dying even if not everyone dies (though I don’t think about decision theory much). So conflicts with future agents cannot always be avoided if we mess up hard enough on the alignment problems of the future.)
https://www.lesswrong.com/posts/rP66bz34crvDudzcJ/decision-theory-does-not-imply-that-we-get-to-have-nice
Now I’ll address the fractal empowerment idea.
I like the idea of fractal empowerment, and IMO it’s one of my biggest ideals if we succeed at getting out of the risky state we are in, rivaled only by infra-Bayesian physicalism’s plan for alignment, which I currently think is called Physicalist Superimitation now that the monotonicity principle has been removed, meaning way more preferences can be fit in than before:
https://www.lesswrong.com/posts/DobZ62XMdiPigii9H/non-monotonic-infra-bayesian-physicalism
https://www.lesswrong.com/posts/ZwshvqiqCvXPsZEct/the-learning-theoretic-agenda-status-2023#Physicalist_Superimitation
That said, I have a couple of problems with the idea.
One of those problems is that empowering humans in the moment can conflict a lot with empowering them in the long run, and unfortunately I’m less confident than you that disempowering humans in the short term won’t be necessary in order to empower them in the long run.
In particular, we might already have this conflict in the near future with biological design tools allowing people to create their own pandemics/superviruses:
https://michaelnotebook.com/xriskbrief/index.html
Another problem is that the set of incentives that holds up democracy and makes its inefficiencies tolerable will absolutely be shredded by AGI. The main incentive that goes away is that you no longer need to consider mass opinion on a lot of very important stuff, which fundamentally hurts democracy’s foundations, and importantly makes moderate redistribution unnecessary for the economy to function, which means the elites won’t do it by default. In industrial economies like ours, extreme redistribution is both unnecessary and counterproductive; but in the automation/intelligence age the only two sources of income are passive investment and whatever welfare you can get, so extreme redistribution becomes both less counterproductive (because it’s easier to confiscate the future sources of wealth) and more necessary for commoners to survive.
More below:
https://forum.effectivealtruism.org/posts/TMCWXTayji7gvRK9p/is-democracy-a-fad#Why_So_Much_Democracy__All_of_a_Sudden_
https://forum.effectivealtruism.org/posts/TMCWXTayji7gvRK9p/is-democracy-a-fad#Automation_and_Democracy
Finally, as @quetzal_rainbow has said, UDT works on your logical ancestor, not your literal ancestors, and there needs to be some shared knowledge in order to coordinate. Thus the intertemporal bargaining doesn’t really work out if you expect current generations to have way less knowledge than future generations, which I expect to be the case.
I would be interested in a quick write-up of what you think are the most important virtues you’d want for AI systems; it seems good in terms of having things to aim towards instead of just things to aim away from.
This prevents predatory strategies from masquerading as altruism
I did not understand that. Is the worry that it’s hard to distinguish a genuine welfare maximizer from a predator because you can’t tell if they will ever give you back power? I don’t understand why this does not apply to agents pretending to pursue empowerment. It is common in conflicts to temporarily disempower someone to protect their long-term empowerment (e.g. a country mandatorily mobilizing for war against a fascist attacker, or preventing a child from ignoring their homework), and it is also common to pretend to protect long-term empowerment and never give back power (e.g. a dictatorship of the proletariat never transitioning to a “true” communist economy).
Ah, yeah, I was a bit unclear here.
The basic idea is that by conquering someone you may not reduce their welfare very much short-term, but you do reduce their power a lot short-term. (E.g. the British conquered India with relatively little welfare impact on most Indians.)
And so it is much harder to defend a conquest as altruistic in the sense of empowering, than it is to defend a conquest as altruistic in the sense of welfare-increasing.
As you say, this is not a perfect defense mechanism, because sometimes long-term empowerment and short-term empowerment conflict. But there are often strategies that are less disempowering short-term which the moral pressure of “altruism=empowerment” would push people towards. E.g. it would make it harder for people to set up the “dictatorship of the proletariat” in the first place.
And in general I think it’s actually pretty uncommon for temporary disempowerment to be necessary for long-term empowerment. Re your homework example, there’s a wide spectrum from the highly-empowering Taking Children Seriously to highly-disempowering Asian tiger parenting, and I don’t think it’s a coincidence that tiger parenting often backfires. Similarly, mandatory mobilization disproportionately happens during wars fought for the wrong reasons.