Not sure I fully understand this comment, but I think it is similar to option 4 or 6?
Also you shouldn’t claim to put $Googolplex utility on anything until you’re at least seconds old.
Why is seconds the relevant unit of measure here?
Good point, this combines the iterability justification for EV with the fact that we have finite resources with which to bet. But doesn’t this break down if you are unsure how much wealth you have (particularly if the “wealth” being gambled is non-monetary, for example years of life)? Suppose the devil comes to you and says “if you take my bet you can live out your full lifespan, but there will be a 1 in 1 million chance I will send you to Hell at the end for 100 billion years. If you refuse, you will cease to exist right now.” Well, the wealth you are gambling with is years of life, but it’s unclear how many you have to gamble with. We could use whatever our expected number of years is (conditional on taking the bet), but then, of course, we run back into the problem that our expectations can be dominated by tiny probabilities of extreme outcomes. This isn’t just a thought experiment, since we all make gambles that may affect our lifespan, and yet we don’t know how long we would have lived by default.
Edit: I realized that the devil example has the obvious flaw that as the expected default lifespan increases, so does the number of years you’re wagering, so you should always take the bet based on Kelly betting; but this point is more salient for less Pascalian lifespan-affecting gambles. I guess the question that remains is that the gamble is all or nothing: what do we do if Kelly betting says we should wager 5% of our lifespan? Maybe the answer is: bet your life 5% of the time, or make gambles that will end your life with no more than 5% probability.
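To make the “wager 5%” arithmetic concrete, here is a minimal sketch of the Kelly fraction for a simple binary bet; the function name and example numbers are illustrative, not taken from the devil scenario above.

```python
# Minimal sketch of the Kelly criterion for a binary bet.
# p: probability of winning; b: net odds (amount won per unit wagered).
# Kelly maximizes expected log-wealth; the optimal fraction is f* = p - (1 - p) / b.

def kelly_fraction(p: float, b: float) -> float:
    """Fraction of bankroll to wager on a bet won with probability p at net odds b."""
    f = p - (1 - p) / b
    return max(f, 0.0)  # never bet when the edge is negative

# Example: a 60% chance to double the stake (b = 1) suggests wagering 20% of the bankroll.
print(kelly_fraction(0.60, 1.0))  # 0.2
```

Of course, the formula presupposes a known bankroll, which is exactly what the lifespan case lacks.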
Oops, I was sleepy when I wrote this and used sloppy wording. Meant to say “what makes the marginal unit of value (e.g. happy lives, days of happiness, etc.) provide less and less utility.”
I think the last point can also apply in the positive direction or at least does not require weighting negative value more heavily.
The introspective assessment is what is most persuasive to me here because
(1) it seems like we need some reason for what makes the marginal unit of value (e.g. happy lives, days of happiness, etc.) provide less and less utility, independent of the fact that it lets us out of pascalianism and that it is logically necessary to have a bound somewhere.
(2) bounded utility functions can also lead to counterintuitive conclusions including violating ex ante pareto (Kosonen, 2022, Ch. 1) and falling prey to an Egyptology objection (Wilkinson, 2020, section 6; the post, “How do bounded utility functions work if you are uncertain how close to the bound your utility is?”). But the Egyptology objection may be less significant in practical cases where we are adding value at the margin and can see that we are getting less and less utility out of it, rather than the bound being something we have to think about in advance because we are considering some large amount of value which may hit the bound in one leap (but maybe this isn’t so crazy when thinking about AI). And also I guess money pumping is worse than these other conclusions.
(3) bounded utility functions do not seem necessary to avoid pascalianism, nor the most obvious option. Someone could easily have an unbounded utility function with regard to sure bets but reject pascalian bets due to probability discounting (or to prevent exploitation in the literal mugging scenario, as you mention); see the toy sketch after this list. But other people bring this up often as a natural response to pascalianism, so I may be missing a reason why you would not want to e.g. value a sure chance of saving 1 billion lives 1,000 times more than saving 1,000,000 lives for sure, but not value a 0.000001 chance at saving 1 billion lives ~at all.
(4) your reasoning makes sense that things cannot get better and better without bound. For a given individual over finite time, it seems like there will be a point where you are just experiencing pleasure all the time / have all your preferences satisfied / have everything on your objective list checked off, and then if you increase utility via time or population, you run into the thing your prior post was about. But if we endorse hedonic utilitarianism, I wonder if this intuition of mine is just reifying the hedonic treadmill and neglecting ways utility may be unbounded, particularly in the negative direction.
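Here is a toy sketch of the probability-discounting option mentioned in (3): ordinary expected value, except that probabilities below a chosen threshold are treated as zero. The threshold and function name are assumptions of the illustration, not a worked-out decision theory.

```python
# Toy probability discounting: ordinary expected value, but claims with
# probability below a chosen threshold are ignored entirely.

def discounted_ev(prob: float, value: float, threshold: float = 1e-4) -> float:
    """Expected value with sufficiently small probabilities discounted to zero."""
    return prob * value if prob >= threshold else 0.0

# Unbounded valuation of sure outcomes: saving 1e9 lives is worth 1,000x saving 1e6 lives...
print(discounted_ev(1.0, 1e9) / discounted_ev(1.0, 1e6))  # 1000.0
# ...but a 1-in-a-million shot at saving 1e9 lives is valued at ~nothing.
print(discounted_ev(1e-6, 1e9))  # 0.0
```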
Nice, just in time to inform my own Pascal’s wager post.
Good post. IIUC this applies only to interpersonal aggregation, and so if you can have unboundedly high utility in one individual, your utility function is not truly bounded, right? I.e., it would get you out of Pascal’s muggings of the form, “Pay me five dollars and I will create 3^^^3 happy copies of Alice” but not of the form “Pay me five dollars and I will create one copy of Alice and give her 3^^^3 utils.” (If it took this copy a very long time to experience 3^^^3 utils, something similar to your point would apply in that her experiences would start to overlap with her own and those of other beings, but suppose we can dial up intensity such that arbitrarily high amounts of value can be experienced in short periods of time.)
I agree that this is probably right in terms of mental health and social dynamics. Do you believe it is also right as a matter of actual morality? Do you agree with the drowning child analogy? Do you think it applies to broader social problems like AI or factory farming? If not, why?
Been telling LLMs to behave as if they and the user really like this Kierkegaard quote (to reduce sycophancy). Giving decent results so far.
If I understand your section “Avoid Being Seen As ‘Not Serious’” correctly—that the reason policymakers don’t want to support “weird” policies is not that they’re concerned about their reputation but rather that they’re just too busy to do something that probably won’t work—this seems like it should meaningfully change how many people outside of politics think about advocating for AI policy. It was outside my model, anyway.
The question to me is, what, if anything, is the path to change if we don’t get a crisis before it is too late? Or, do we just have to place our chips on that scenario and wait for it to happen?
I agree that mass unemployment may spark policy change, but why do you see that change as being relevant to misalignment vs. specific to automation?
AGI timelines post-GPT-3 exhibit reverse Hofstadter’s law: AI advances quicker than predicted, even when taking into account reverse Hofstadter’s law.
https://x.com/wintonARK/status/1742979090725101983/photo/1
Makes sense. What probability do you place on this? It would require solving alignment, a second AI being created before the first can create a singleton, and then the misaligned AI choosing this kind of blackmail over other possible tactics. If the blackmail involves sentient simulations (as is sometimes suggested, although not in your comment), it would seem that the misaligned AI would have to solve the hard problem of consciousness and be able to prove this to the other AI (not a valid blackmail if the simulations are not known to be sentient).
Thanks for your comment.
The misuse risks seem much more important, both as real risks, and in their saliency to ordinary people.
I agree that it may be easier to persuade the general public about misuse risks and that these risks are likely to occur if we achieve intent alignment, but in terms of assessing the relative probability: “If we solve alignment” is a significant “if.” I take it you view solving intent alignment as not all that unlikely? If so, why? Specifically, how do you expect we will figure out how to prevent deceptive alignment and goal misgeneralization by the time we reach AGI?
Also, in the article you linked, you base your scenario on the assumption of a slow takeoff. Why do you expect this will be the case?
I don’t think we should adopt an ignorance prior over goals. Humans are going to try to assign goals to AGI. Those goals will very likely involve humans somehow.
Of course humans will try to assign human-related goals to AGI, but how likely is it that, if the AI is misaligned, the attempt to instill human-related goals will actually lead to consequences that involve conscious humans and not molecular smiley faces?
This is a comprehensive, nuanced, and well-written post. A few questions:
How likely do you think it is that, under a Harris administration, AI labs will successfully lobby Democrats to kill safety-oriented policies, as happened with SB 1047 on the state level? Even if Harris is on net better than Trump this could greatly reduce the expected value of her presidency from an x-risk perspective.
Related to the above, is it fair to say that under either party, there will need to be advocacy/lobbying for safety-focused policies on AI? If so, how do you make tradeoffs between this and the election? i.e. if someone has $x to donate, what percentage should they give to the election vs. other AI safety causes?
How much of your assessment of the difference in AI risk between Harris and Trump is due to the concrete AI policies you expect each of them to push, vs. how much is due to differences in competence and respect for democracy?
I can’t find much information about the Movement Labs quiz and how it helps Harris win. Could you elaborate, privately if needed? If the quiz is simply matching voters with the candidate who best matches their values, is it because it will be distributed to voters who lean Democrat, or does its effectiveness come through a different path?
Your reasoning makes sense with regards to how a more authoritarian government would make it more likely that we can avoid x-risk, but how do you weigh that against the possibility that an AGI that is intent-aligned (but willing to accept harmful commands) would be more likely to create s-risks in the hands of an authoritarian state, as the post author has alluded to?
Also, what do you make of the author’s comment below?
In general, the public seems pretty bought-in on AI risk being a real issue and is interested in regulation. Having democratic instincts would perhaps push in the direction of good regulation (though the relationship here seems a little less clear).
Great post. I agree with almost all of this. What I am uncertain about is the idea that AI existential risk is a rights violation under the most strict understanding of libertarianism.
As another commenter has suggested, we can’t claim that any externality creates rights to stop or punish a given behavior, or libertarianism turns into safetyism.[1] If we take the Non-Aggression Principle as a common standard for a hardline libertarian view of what harms give you a right to restitution or retaliation, it seems that x-risk does not fit this definition.
1. The clearest evidence seems to be that Murray Rothbard wrote the following:
“It is important to insist [...] that the threat of aggression be palpable, immediate, and direct; in short, that it be embodied in the initiation of an overt act. [...] Once we bring in “threats” to person and property that are vague and future—i.e., are not overt and immediate—then all manner of tyranny becomes excusable.” (The Ethics of Liberty, p. 78)
X-risk by its very nature falls into the category of “vague and future.”
2. To take your specific example of flying planes over someone’s house, a follower of Rothbard, Walter Block, has argued that this exact risk is not a violation of the non-aggression principle. He also states that risks from nuclear power are “legitimate under libertarian law.” (p. 295)[2] If we consider AI analogous to these two risks, it would seem Block would not agree that there is a right to seek compensation for x-risk.
3. Matt Zwolinski criticized the NAP for having an “all-or-nothing attitude toward risk,” as it does not indicate what level of risk constitutes aggression. Another libertarian writer responded that a risk that constitutes a direct “threat” is aggression (i.e. pointing a pistol at someone, even if this doesn’t result in the victim being shot), but risks of accidental damage are not aggression unless these risks are imposed with threats of violence:
“If you don’t wish to assume the risk of driving, then don’t drive. And if you don’t want to run the risk of an airplane crashing into your house, then move to a safer location. (You don’t own the airspace used by planes, after all.)”
This implies to me that Zwolinski’s criticism is accurate with regards to accidents, which would rule out x-risk as a NAP violation.
This shows that at least some libertarians’ understanding of rights does not include x-risk as a violation. I consider this to be a point against their theory of rights, not an argument against pursuing AI safety. The most basic moral instinct suggests that creating a significant risk of destroying all of humanity and its light-cone is a violation of the rights of each member of humanity.[3]
While I think that not including AI x-risk (and other risks/accidental harms) in its definition of proscribable harms means that the NAP is too narrow, the question still stands: which externalities or risks give victims a right to payment, and which do not? I’m curious where you draw the line.
It is possible that I am misunderstanding something about libertarianism or x-risk that contradicts the interpretation I have drawn here.
Anyway, thanks for articulating this proposal.
See also this argument by Alexander Volokh:
“Some people’s happiness depends on whether they live in a drug-free world, how income is distributed, or whether the Grand Canyon is developed. Given such moral or ideological tastes, any human activity can generate externalities […] Free expression, for instance, will inevitably offend some, but such offense generally does not justify regulation in the libertarian framework for any of several reasons: because there exists a natural right of free expression, because offense cannot be accurately measured and is easy to falsify, because private bargaining may be more effective inasmuch as such regulation may make government dangerously powerful, and because such regulation may improperly encourage future feelings of offense among citizens.”
Block argues that it would be wrong for individuals to own nuclear weapons, but he does not make clear why this is a meaningful distinction.
And any extraterrestrials in our light-cone, if they have rights. But that’s a whole other post.
Hi, I’m interested in attending but a bit unclear about the date and time based on how it is listed. Is the spring ACX meetup taking place starting at 2:30 p.m. on May 11?
Pascal’s reverse-mugging
One dark evening, Pascal is walking down the street, and a stranger slithers out of the shadows.
“Let me tell you something,” the stranger says. “There is a park on the route you walk every day, and in the park is an apple tree. The apples taste very good; I would say they have a value of $5, and no one will stop you from taking them. However—I am a matrix lord, and I have claimed these apples for myself. I will create and kill 3^^^3 people if you take any of these apples.”
On reasoning similar to that which leads most people to reject the standard Pascal’s mugging, it seems reasonable to ignore the apple-man’s warning and take an apple (provided that the effort involved in picking it is trivial, that Pascal knows the apples are safe and legal to pick, etc.). Still, I suggest that it intuitively seems more reasonable for Pascal to avoid taking the apples than it does for him to pay the mugger $5. That asymmetry constitutes an act-omission distinction (a toy expected-value comparison appears after the list below). I raise five possibilities:
1. Some of the counterintuitiveness of the idea that Pascal should pay the mugger derives from an act-omission distinction, which is not rational, and so we should reduce our incredulity at pascalianism accordingly. In other words, however crazy we think it would be for Pascal not to pick an apple, we should reduce our estimate of the craziness of giving in to a standard mugging so that it is no greater than the former.
2. Our intuitions about the apple-picking are less correct than those about the standard mugging, and we should raise our estimation of the craziness of avoiding taking the apple to match that of paying the mugger.
3. An act-omission distinction is valid, and it is indeed more reasonable to refrain from picking the apple than it is to pay the mugger.
4. There was never an intuitive difference between the two situations in the first place.
5. Some fifth thing.
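For what it’s worth, here is the toy expected-value comparison referenced above, showing why the two stories look structurally identical under naive EV. The credence assigned to the stranger’s claim and the stand-in for the disvalue of 3^^^3 deaths are placeholders, not quantities from the story.

```python
# Naive expected-value comparison of the standard mugging and the reverse-mugging.
# Both numbers below are placeholders for illustration only.
P_CLAIM_TRUE = 1e-20   # assumed credence in the matrix lord's threat
HARM = -1e40           # stand-in for the disvalue of 3^^^3 deaths
STAKE = 5              # $5 payment in one story, $5 worth of apples in the other

# Standard mugging: pay $5, or keep it and risk the harm.
ev_pay_mugger = -STAKE
ev_refuse = P_CLAIM_TRUE * HARM

# Reverse mugging: take the apple and risk the harm, or forgo it.
ev_take_apple = STAKE + P_CLAIM_TRUE * HARM
ev_leave_apple = 0

# In both cases the choice is "give up $5" vs. "accept P_CLAIM_TRUE * HARM",
# so naive EV treats the act and the omission symmetrically.
print(ev_pay_mugger - ev_refuse)       # ~1e20: the "safe" option wins by the same margin...
print(ev_leave_apple - ev_take_apple)  # ~1e20: ...in both stories.
```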