# Chantiel

Karma: 57
• I’m not entirely sure what you consider to be a “bad” reason for crossing the bridge. However, I’m having a hard time finding a way to define it that both causes agents using evidential counterfactuals to necessarily fail while not having other agents fail.

One way to define a “bad” reason is an irrational one (or the chicken rule). However, if this is what is meant by a “bad” reason, it seems like this is an avoidable problem for an evidential agent, as long as that agent has control over what it decides to think about.

To illustrate, consider what I would do if I was in the troll bridge situation and used evidential counterfactuals. Then I would reason, “I know the troll will only blow up the bridge if I cross for a bad reason, but I’m generally pretty reasonable, so I think I’ll do fine if I cross”. And then I’d stop thinking about it. I know that certain agents, given enough time to think about it, would end up not crossing, so I’d just make sure I didn’t do that.

Another way that you might have had in mind is that a “bad” reason is one such that the action the AI takes results in a provably bad outcome despite the AI thinking the action would result in a good outcome, or the reason being the chicken rule. However, if this is the case, it seems to me that no agent would be able to cross the bridge without it being blown up, unless the agent’s counterfactual environment in which it didn’t cross scored less than −10 utility. But this doesn’t seem like a very reasonable counterfactual environment.

To see why, consider an arbitrary agent with the following decision procedure. Let counterfactual be an arbitrary specification of what would happen in some counterfactual world.

    def act():
        cross_eu = expected_utility(counterfactual('A = Cross'))
        stay_eu = expected_utility(counterfactual('A = Stay'))
        if cross_eu > stay_eu:
            return cross
        return stay


The chicken rule can be added, too, if you wish. I’ll assume the expected utility of staying is greater than −10.

Then it seems you can adapt the proof you gave for your agent to show that an arbitrary agent satisfying the above description would also get −10 utility if it crossed. Specifically,

Suppose it’s provable that crossing implies getting −10 utility, i.e., suppose □(‘A = Cross’ → U = −10). Suppose ‘A = Cross’. Then the agent crossed either because of the chicken rule, or because the counterfactual environment in which the agent crossed had expected utility greater than −10, or because the counterfactual environment in which the agent didn’t cross had expected utility less than −10. We assumed the counterfactual environment in which the agent doesn’t cross has more than −10 utility. Thus, it must be either the chicken rule or because crossing had more than −10 utility in expectation. If it’s because of the chicken rule, then this is a “bad” reason, so the troll will destroy the bridge just like in the original proof. Thus, utility would equal −10. Suppose instead the agent crosses because expected_utility(counterfactual('A = Cross')) > −10. However, by the assumption, it’s provable that crossing implies −10 utility. Thus, since the agent actually crosses, its action in fact provably results in −10 utility, and the AI is thus wrong in thinking it would get a good outcome. Thus, the AI’s action results in provably bad outcomes. Therefore, the troll destroys the bridge. Thus, utility would equal −10. Thus, ‘A = Cross’ → U = −10. Thus, □(‘A = Cross’ → U = −10) → (‘A = Cross’ → U = −10). Thus, by Löb’s theorem, ‘A = Cross’ → U = −10 is provable, and so the agent gets −10 utility if it crosses.
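To restate just the skeleton of that argument in provability-logic notation (writing $C$ for ‘A = Cross’ and $\Box$ for provability in the agent’s proof system; nothing new is being assumed here), the reasoning above establishes the hypothesis of Löb’s theorem, and Löb’s theorem then discharges it:

$$\vdash\; \Box\big(C \rightarrow U = -10\big) \rightarrow \big(C \rightarrow U = -10\big), \qquad \text{so by Löb's theorem,} \qquad \vdash\; C \rightarrow U = -10.$$

So if the agent does cross, it provably gets −10 utility.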

As I said, you could potentially avoid getting the bridge destroyed by assigning expected utility less than −10 to the counterfactual environment in which the AI doesn’t cross. This seems like a “silly” counterfactual environment, so it doesn’t seem like something we would want an AI to think. Also, since it seems like a silly thing to think, a troll may consider the use of such a counterfactual environment to be a bad reason to cross the bridge, and thus destroy it anyways.

• I’m certain that ants do in fact have preferences, even if they can’t comprehend the concept of preferences in abstract or apply them to counterfactual worlds. They have revealed preferences to quite an extent, as does pretty much everything I think of as an agent.

I think the question of whether insects have preferences is morally pretty important, so I’m interested in hearing what made you think they do have them.

I looked online for “do insects have preferences?”, and I saw articles saying they did. I couldn’t really figure out why they thought they did have them, though.

For example, I read that insects have a preference for eating green leaves over red ones. But I’m not really sure how people could have known this. If you see ants go to green leaves when they’re hungry instead of red leaves, this doesn’t seem like it would necessarily be due to any actual preferences. For example, maybe the ant just executed something like the code:

    if near_green_leaf() and is_hungry:
        go_to_green_leaf()
    elif near_red_leaf() and is_hungry:
        go_to_red_leaf()
    else:
        ...


That doesn’t really look like actual preferences to me. But I suppose this to some extent comes down to how you want to define what counts as a preference. I took preferences to actually be orderings between possible worlds indicating which one is more desirable. Did you have some other idea of what counts as preferences?

They might not be communicable, numerically expressible, or even consistent, which is part of the problem. When you’re doing the extrapolated satisfaction, how much of what you get reflects the actual agent and how much the choice of extrapolation procedure?

I agree that to some extent their extrapolated satisfactions will come down to the specifics of the extrapolation procedure.

I don’t want us to get too distracted here, though. I don’t have a rigorous, non-arbitrary specification of what an agent’s extrapolated preferences are. However, that isn’t the problem I was trying to solve, nor is it a problem specific to my ethical system. My system is intended to provide a method of coming to reasonable moral conclusions in an infinite universe. And it seems to me that it does so. But I’m very interested in any other thoughts you have on it with respect to whether it correctly handles moral recommendations in infinite worlds. Does it seem reasonable to you? I’d like to make an actual post about this, with the clarifications we made included.

• Right, I suspected the evaluation might be something like that. It does have the difficulty of being counterfactual and so possibly not even meaningful in many cases.

Interesting. Could you elaborate?

I suppose counterfactuals can be tricky to reason about, but I’ll provide a little more detail on what I had in mind. Imagine making a simulation of an agent that is a fully faithful representation of its mind. However, run the agent simulation in a modified environment that both gives it access to infinite computational resources and makes it ask, and answer, the question, “How desirable is that universe?” This isn’t fully specified; maybe the agent would give different answers depending on how the question is phrased or what its environment is. However, it at least doesn’t sound meaningless to me.

Basically, the counterfactual is supposed to be a way of asking for the agent’s coherent extrapolated volition, except the coherent part doesn’t really apply because it only involves a single agent.

On the other hand, evaluations from the point of view of agents that are sapient beings might be ethically completely dominated by those of 10^12 times as many agents that are ants, and I have no idea how such counterfactual evaluations might be applied to them at all.

Another good thing to ask. I should have made it clear, but I intended that only agents with actual preferences are asked for their satisfaction with the universe. If ants don’t actually have preferences, then they would not be included in the deliberation.

Now, there’s the problem that some agents might not be able to even conceive of the possible world in question. For example, maybe ants can understand simple aspects of the world like, “I’m hungry”, but are unable to understand things about the broader state of the universe. I don’t think this is a major problem, though. If an agent can’t even conceive of something, then I don’t think it would be reasonable to say it has preferences about it. So you can then only query them on the desirability of the things they can conceive of.

It might be tricky precisely defining what counts as a preference, but I suppose that’s a problem with all ethical systems that care about preferences.

• Presumably the evaluation is not just some sort of average-over-actual-lifespan of some satisfaction rating for the usual reason that (say) annihilating the universe without warning may leave average satisfaction higher than allowing it to continue to exist, even if every agent within it would counterfactually have been extremely dissatisfied if they had known that you were going to do it. This might happen if your estimate of the current average satisfaction was 79% and your predictions of the future were that the average satisfaction over the next trillion years would be only 78.9%.

This is a good thing to ask about; I don’t think I provided enough detail on it in the writeup.

I’ll clarify my measure of satisfaction. First off, note that it’s not the same as just asking agents, “How satisfied are you with your life?” and using those answers. As you pointed out, you could then morally get away with killing everyone (at least if you do it in secret).

Instead, calculate satisfaction as follows. Imagine hypothetically telling an agent everything significant about the universe, and then giving them infinite processing power and infinite time to think. Ask them, “Overall, how satisfied are you with that universe and your place in it”? That is the measure of satisfaction with the universe.

So, imagine if someone was considering killing everyone in the universe (without them knowing in advance). Well, then consider what would happen if you calculated satisfaction as above. When the universe is described to the agents, they would note that they and everyone they care about would be killed. Agents usually very much dislike this idea, so they would probably rate their overall satisfaction with the course of the universe as low. So my ethical system would be unlikely to recommend such an action.

Now, my ethical system doesn’t strictly prohibit destroying the universe to avoid low life-satisfaction in future agents. For example, suppose it’s determined that the future will be filled with very unsatisfied lives. Then it’s in principle possible for the system to justify destroying the universe to avoid this. However, destroying the universe would drastically reduce the satisfaction with the universe of the agents that do exist, which would decrease the moral value of the world. This would come at a high moral cost, which would make my moral system reluctant to recommend an action that results in such destruction.

That said, it’s possible that the proportion of agents in the universe that currently exist, and thus would need to be killed, is very low. Thus, the overall expected value of life-satisfaction might not change by that much if all the present agents were killed. Thus, the ethical system, as stated, may be willing to do such things in extreme circumstances, despite the moral cost.

I’m not really sure if this is a bug or a feature. Suppose you see that future agents will be unsatisfied with their lives, and you can stop it, though doing so would ruin the lives of the agents that currently exist. And you see that the agents that are currently alive make up only a very small proportion of the agents that have ever existed. And suppose you have the option of destroying the universe. I’m not really sure what the morally best thing to do is in this situation.

Also, note that this verdict is not unique to my ethical system. Average utilitarianism, in a finite world, acts the same way. If you predict average life satisfaction in the future will be low, then average consequentialism could also recommend killing everyone currently alive.

And other aggregate consequentialist theories sometimes run into problematic(?) behavior related to killing people. For example, classical utilitarianism can recommend secretly killing all the unhappy people in the world, and then getting everyone else to forget about them, in order to decrease total unhappiness.

I’ve thought of a modification to the ethical system that potentially avoids this issue. Personally, though, I prefer the ethical system as stated. I can describe my modification if you’re interested.

I think the key idea of my ethical system is to, in an infinite universe, think about prior probabilities of situations rather than total numbers, proportions, or limits of proportions of them. And I think this idea can be adapted for use in other infinite ethical systems.

• I’m not sure how this system avoids infinitarian paralysis. For all actions with finite consequences in an infinite universe (whether in space, time, distribution, or anything else), the change in the expected value resulting from those actions is zero.

The causal change from your actions is zero. However, there are still logical connections between your actions and the actions of other agents in very similar circumstances. And you can still consider these logical connections to affect the total expected value of life satisfaction.

It’s true, though, that my ethical system would fail to resolve infinitarian paralysis for someone using causal decision theory. I should have noted it requires a different decision theory. Thanks for drawing this to my attention.

As an example of the system working, imagine you are in a position to do great good to the world, for example by creating friendly AI or something. And you’re considering whether to do it. Then, if you do decide to do it, then that logically implies that any other agent sufficiently similar to you and in sufficiently similar circumstances would also do it. Thus, if you decide to do it, then the expected value of an agent in circumstances of the form, “In a world with someone very similar to JBlack who has the ability to make awesome safe AI” is higher. And the prior probability of ending up in such a world is non-zero. Thus, by deciding to make the safe AI, you can acausally increase the total moral value of the universe.
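As a toy illustration of that calculation (the class names, priors, and satisfaction numbers below are all invented for the example, not part of the system itself), here is a minimal sketch of how deciding to build the safe AI changes the prior-expected life satisfaction, even though the causal change is confined to one class of situations:

    # Toy sketch: moral value = prior-weighted expected life satisfaction over
    # classes of situations an agent could be born into. Numbers are made up.
    situation_prior = {
        "world with a JBlack-like agent able to make safe AI": 1e-12,  # tiny but non-zero
        "every other kind of situation":                       1.0 - 1e-12,
    }

    def expected_life_satisfaction(satisfaction_if_you_build):
        # Your decision logically fixes what every sufficiently similar agent in
        # sufficiently similar circumstances does, so it fixes the satisfaction of
        # the whole first class; the rest of the universe is unaffected by it.
        satisfaction = {
            "world with a JBlack-like agent able to make safe AI": satisfaction_if_you_build,
            "every other kind of situation":                       0.5,
        }
        return sum(p * satisfaction[c] for c, p in situation_prior.items())

    # Deciding to build the safe AI raises the expected value by a small but
    # non-zero amount, so the two choices are not morally equivalent: no paralysis.
    assert expected_life_satisfaction(0.9) > expected_life_satisfaction(0.3)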

I’m also not sure how this differs from Average Utilitarianism with a bounded utility function.

The average life satisfaction is undefined in a universe with infinitely-many agents of varying life-satisfaction. Thus, it suffers from infinitarian paralysis. If my system was used by a causal decision theoretic agent, it would also result in infinitarian paralysis, so for such an agent my system would be similar to average utilitarianism with a bounded utility function. But for agents with decision theories that consider acausal effects, it seems rather different.

Does this clear things up?

• I’ve come up with a system of infinite ethics intended to provide more reasonable moral recommendations than previously-proposed ones. I’m very interested in what people think of this, so comments are appreciated. I’ve made a write-up of it below.

One unsolved problem in ethics is that aggregate consequentialist ethical theories tend to break down if the universe is infinite. An infinite universe could contain both an infinite amount of good and an infinite amount of bad. If so, you are unable to change the total amount of good or bad in the universe, which can cause aggregate consequentialist ethical systems to break.

A variety of methods have been considered to deal with this. However, to the best of my knowledge, all proposals either have severe negative side-effects or are intuitively undesirable for other reasons.

Here I propose a system of aggregate consequentialist ethics intended to provide reasonable moral recommendations even in an infinite universe.

It is intended to satisfy the desiderata for infinite ethical systems specified in Nick Bostrom’s paper, “Infinite Ethics”. These are:

• Resolving infinitarian paralysis. It must not be the case that all humanly possible acts come out as ethically equivalent.

• Avoiding the fanaticism problem. Remedies that assign lexical priority to infinite goods may have strongly counterintuitive consequences.

• Preserving the spirit of aggregative consequentialism. If we give up too many of the intuitions that originally motivated the theory, we in effect abandon ship.

• Avoiding distortions. Some remedies introduce subtle distortions into moral deliberation.

I have yet to find a way in which my system fails any of the above desiderata. Of course, I could have missed something, so feedback is appreciated.

# My ethical system

First, I will explain my system.

My ethical theory is, roughly, “Make the universe one agents would wish they were born into”.

By this, I mean, suppose you had no idea which agent in the universe you would be, what circumstances you would be in, or what your values would be, but you still knew you would be born into this universe. Consider having a bounded quantitative measure of your general satisfaction with life, for example, a utility function. Then try to make the universe such that the expected value of your life satisfaction is as high as possible if you conditioned on being an agent in this universe, but didn’t condition on anything else. (Also, “universe” above means “multiverse” if we are in one.)

In the above description I didn’t provide any requirement for the agent to be sentient or conscious. If you wish, you can modify the system to give higher priority to the satisfaction of agents that are sentient or conscious, or you can ignore the welfare of non-sentient or non-conscious agents entirely.

It’s not entirely clear how to assign a prior over situations in the universe you could be born into. Still, I think it’s reasonably intuitive that there would be some high-entropy situations among the different situations in the universe. This is all I assume for my ethical system.
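Stated roughly as a formula (this is just a compact restatement of the above, where $s$ is the bounded life-satisfaction measure and the expectation is taken over the prior on situations, conditioned only on being an agent somewhere in universe $u$):

$$V(u) \;=\; \mathbb{E}\big[\, s(\text{situation}) \;\big|\; \text{you are an agent somewhere in } u \,\big], \qquad s \in [0,1].$$

An action is then recommended to the extent that it makes $V$ of the resulting universe higher.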

Now I’ll give some explanation of what this system recommends.

Suppose you are considering doing something that would help some creature on Earth. Describe that creature and its circumstances, for example, as “<some description of a creature> in an Earth-like world with someone who is <insert complete description of yourself>”. And suppose doing so didn’t cause any harm to other creatures. Well, there is non-zero prior probability of an agent, having no idea what circumstances it will be in the universe, ending up in circumstances satisfying that description. By choosing to help that creature, you would thus increase the expected satisfaction of any creature in circumstances that match the above description. Thus, you would increase the overall expected value of the life-satisfaction of an agent knowing nothing about where it will be in the universe. This seems reasonable.

With similar reasoning, you can show why it would be beneficial to also try to steer the future state of our accessible universe in a positive direction. An agent would have nonzero probability of ending up in situations of the form, “<some description of a creature> that lives in a future colony originating from people from an Earth-like world that features someone who <insert description of yourself>”. Helping them would thus increase an agent’s prior expected life-satisfaction, just like above. This same reasoning can also be used to justify doing acausal trades to help creatures in parts of the universe not causally accessible.

The system also values helping as many agents as possible. If you only help a few agents, the prior probability of an agent ending up in situations just like those agents would be low. But if you help a much broader class of agents, the effect on the prior expected life satisfaction would be larger.
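As a rough numerical sketch of that point (the prior masses and satisfaction gain below are invented purely for illustration), the change in prior-expected satisfaction scales with the prior mass of the class of situations you actually affect:

    # Toy sketch: the bump in prior-expected life satisfaction from helping a
    # class of agents is roughly (prior mass of that class) x (their gain).
    def ev_change(prior_mass_of_helped_class, satisfaction_gain):
        return prior_mass_of_helped_class * satisfaction_gain

    helping_a_narrow_class = ev_change(1e-15, 0.2)  # a handful of narrowly-described agents
    helping_a_broad_class  = ev_change(1e-9,  0.2)  # a much broader class of agents
    assert helping_a_broad_class > helping_a_narrow_class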

These all seem like reasonable moral recommendations.

I will now discuss how my system does on the desiderata.

# Infinitarian paralysis

Some infinite ethical systems result in what is called “infinitarian paralysis”. This is the state of an ethical system being indifferent in its recommendations in worlds that already have infinitely large amounts of both good and bad. If there’s already an infinite amount of both good and bad, then our actions, using regular cardinal arithmetic, are unable to change the amount of good and bad in the universe.

My system does not have this problem. To see why, remember that my system says to maximize the expected value of your life satisfaction given that you are in this universe, without conditioning on anything else. And the measure of life satisfaction was stated to be bounded, say to the range [0, 1]. Since any agent can only have life satisfaction in [0, 1], then even in an infinite universe the expected value of the agent’s life satisfaction must still be in [0, 1]. So, as long as a finite universe doesn’t have an expected value of life satisfaction of 0, an infinite universe can have at most finitely more moral value than it.

To say it another way, my ethical system provides a function mapping from possible worlds to their moral value. And this mapping always produces outputs in the range [0, 1]. So, trivially, you can see that no universe can have infinitely more moral value than another universe with non-zero moral value; infinity just isn’t a possible output of my moral value function.

# Fanaticism

Another problem in some proposals of infinite ethical systems is that they result in being “fanatical” in efforts to cause or prevent infinite good or bad.

For example, one proposed system of infinite ethics, the extended decision rule, has this problem. Let g represent the statement, “there is an infinite amount of good in the world and only a finite amount of bad”. Let b represent the statement, “there is an infinite amount of bad in the world and only a finite amount of good”. The extended decision rule says to do whatever maximizes P(g) - P(b). If there are ties, ties are broken by choosing whichever action results in the most moral value if the world is finite.

This results in being willing to incur any finite cost to adjust the probability of infinite good and finite bad even very slightly. For example, suppose there is an action that, if done, would increase the probability of infinite good and finite bad by 0.000000000000001%. However, if it turns out that the world is actually finite, it will kill every creature in existence. Then the extended decision rule would recommend doing this. This is the fanaticism problem.
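Here’s a minimal sketch of that decision rule and why it behaves fanatically (the action names, probabilities, and finite-case values are invented for the example; the rule itself, maximize P(g) − P(b) with finite-case value as the tie-breaker, is as described above):

    # Toy sketch of the extended decision rule: maximize P(g) - P(b), breaking
    # ties by the expected value if the world turns out to be finite.
    actions = {
        #                P(g)          P(b)   value if the world is finite
        "do nothing":   (0.500000000,  0.5,    0.0),
        "risky gamble": (0.500000001,  0.5,   -1e9),  # kills everyone if finite
    }

    def extended_decision_rule(actions):
        best_score = max(pg - pb for pg, pb, _ in actions.values())
        tied = {name: v for name, (pg, pb, v) in actions.items() if pg - pb == best_score}
        return max(tied, key=tied.get)  # tie-break on finite-case value

    print(extended_decision_rule(actions))  # "risky gamble": any bump to P(g) - P(b) dominates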

My system doesn’t even place any especially high importance on adjusting the probabilities of infinite good or infinite bad. Thus, it doesn’t have this problem.

# Preserving the spirit of aggregate consequentialism

Aggregate consequentialism is based on certain intuitions, like “morality is about making the world as best as it can be”, and, “don’t arbitrarily ignore possible futures and their values”. But finding a system of infinite ethics that preserves intuitions like these is difficult.

One infinite ethical system, infinity shades, says to simply ignore the possibility that the universe is infinite. However, this conflicts with our intuition about aggregate consequentialism. The big intuitive benefit of aggregate consequentialism is that it’s supposed to actually systematically help the world be a better place in whatever way you can. If we’re completely ignoring the consequences of our actions on anything infinity-related, this doesn’t seem to be respecting the spirit of aggregate consequentialism.

My system, however, does not ignore the possibility of infinite good or bad, and thus is not vulnerable to this problem.

I’ll provide another conflict with the spirit of consequentialism. Another infinite ethical system says to maximize the expected amount of goodness of the causal consequences of your actions minus the amount of badness. However, this, too, doesn’t properly respect the spirit of aggregate consequentialism. The appeal of aggregate consequentialism is that it defines some measure of “goodness” of a universe, and then recommends you take actions to maximize it. But your causal impact is no measure of the goodness of the universe. The total amount of good and bad in the universe would be infinite no matter what finite impact you have. Without providing a metric of the goodness of the universe that’s actually affected, this ethical approach also fails to satisfy the spirit of aggregate consequentialism.

My system avoids this problem by providing such a metric: the expected life satisfaction of an agent that has no idea what situation it will be born into.

Now I’ll discuss another form of conflict. One proposed infinite ethical system can look at the average life satisfaction of a finite sphere of the universe, and then take the limit of this as the sphere’s size approaches infinity, and consider this the moral value of the world. This has the problem that you can adjust the moral value of the world by just rearranging agents. In an infinite universe, it’s possible to come up with a method of re-arranging agents so the unhappy agents are spread arbitrarily thinly. Thus, you can make moral value arbitrarily high by just rearranging agents in the right way.

I’m not sure my system entirely avoids this problem, but it does seem to have substantial defense against it.

Consider you have the option of redistributing agents however you want in the universe. You’re using my ethical system to decide whether to make the unhappy agents spread thinly.

Well, your actions have an effect on agents in circumstances of the form, “An unhappy agent on an Earthlike world with someone who <insert description of yourself> who is considering spreading the unhappy agents thinly throughout the universe”. And spreading the unhappy agents thinly wouldn’t make the expected life satisfaction of any agent satisfying the above description any better. So I don’t think my ethical system recommends this.

Now, we don’t have a complete understanding of how to assign a probability distribution of what circumstances an agent is in. It’s possible that there is some way to redistribute agents in certain circumstances to change the moral value of the world. However, I don’t know of any clear way to do this. Further, even if there is, my ethical system still doesn’t allow you to get the moral value of the world arbitrarily high by just rearranging agents. This is because there will always be some non-zero probability of having ended up as an unhappy agent in the world you’re in, and your life satisfaction after being redistributed in the universe would still be low.

# Distortions

It’s not entirely clear to me how Bostrom distinguished between distortions and violations of the spirit of aggregate consequentialism.

To the best of my knowledge, the only distortion pointed out in “Infinite Ethics” is stated as follows:

Your task is to allocate funding for basic research, and you have to choose between two applications from different groups of physicists. The Oxford Group wants to explore a theory that implies that the world is canonically infinite. The Cambridge Group wants to study a theory that implies that the world is finite. You believe that if you fund the exploration of a theory that turns out to be correct you will achieve more good than if you fund the exploration of a false theory. On the basis of all ordinary considerations, you judge the Oxford application to be slightly stronger. But you use infinity shades. You therefore set aside all possible worlds in which there are infinite values (the possibilities in which the Oxford Group tends to fare best), and decide to fund the Cambridge application. Is this right?

My approach doesn’t ignore infinity and thus doesn’t have this problem. I don’t know of any other distortions in my ethical system.

• You seem to be saying that in the software design of your AI, R = H. That is, that the black box will be given some data representing the Al’s hardware and other constraints, and return a possible world maximizing H. From my point of view, that’s already a design fault.

I agree; this is a design flaw. The issue is, I have yet to come across any optimization, planning algorithm, or AI architecture that doesn’t have this design flaw.

That is, I don’t know of any AI architecture that does not involve using a potentially hardware-bug-exploitable utility function as input into some planning or optimization problem. And I’m not sure there even is one.

In the rest of this comment I’ll just suggest approaches and show how they are still vulnerable to the hardware-bug-exploitation problem.

I have some degree of background in artificial intelligence, and the planning and optimization algorithms I’ve seen take the function to be maximized as an input parameter. Then, when people want to make an AI, they just call that planning or optimization algorithm with their (hardware-bug-exploitable) utility or cost functions. For example, suppose someone wants to make a plan that minimizes cost function f in search space s. Then I think they just directly do something like:

return a_star(f, s)


And this doesn’t provide any protection from hardware-level exploitation.

Now, correct me if I’m wrong, but it seems you’re thinking of the AI first doing some pre-processing to find an input to the planning or optimization algorithm that is resistant to hardware-bug exploitation.

But how do you actually do that? You could regard the input the AI puts into the optimization function to be a choice it makes. But how does it make this choice? The only thing I can think of is having a planning or optimization algorithm figure out what function to use as the input to the optimization or planning algorithm.

But if you need to use a planning or optimization algorithm to do this, then what utility function do you pass into this planning or optimization algorithm? You could try to pass the actual, current, hardware-bug-exploitable utility function. But then this doesn’t resolve the problem of hardware-bug-exploitation: when coming up with a utility function to input to the optimization, the AI may find such an input that itself scores very high due to hardware bug exploitation.

To describe the above more concretely, you could try doing something like this:

    utility_function_search_space = initialize_search_space_over_functions()
    reasonable_utility_function_to_use = plan(utility, utility_function_search_space)
    return plan(reasonable_utility_function_to_use, ai_action_search_space)



That is, the AI above uses its own utility function to pick out a utility function to use as input to its planning algorithm.

As you can see, the above code is still vulnerable to hardware-bug exploitation. This is because it calls,

    reasonable_utility_function_to_use = plan(utility, utility_function_search_space)


with its hardware-bug-exploitable utility function. Thus, the output, reasonable_utility_function_to_use, might be very wrong due to hardware bug exploitation having been used to come up with it.

Now, you might have had some other idea in mind. I don’t know of a concrete way to get around this problem, so I’m very interested to hear your thoughts.

My concern is that people will figure out how to make powerful optimization and planning algorithms without first figuring out how to fix this design flaw.

• I agree that people are good at making hardware that works reasonably reliably. And I think that if you were to make an arbitrary complex program, the probability that it would fail from hardware-related bugs would be far lower than the probability of it failing for some other reason.

But the point I’m trying to make is that an AI, it seems to me, would be vastly more likely to run into something that exploits a hardware-level bug than an arbitrary complex program. For details on why I imagine so, please see this comment.

I’m trying to anticipate where someone could be confused about the comment I linked to, so I want to clarify something. Let S be the statement, “The AI comes across a possible world that causes its utility function to return very high value due to hardware bug exploitation”. Then it’s true that, if the AI has yet to find the error-causing world, the AI would not want to find it. Because utility(S) is low. However, this does not mean that the AI’s planning or optimization algorithm exerts no optimization pressure towards finding S.

Imagine the AI’s optimization algorithm as a black box that takes as input a utility function and a search space and outputs solutions that score highly on that utility function. Given that we don’t know what future AI will look like, I don’t think we can have a model of the AI much more informative than the above. And the hardware-error-caused world could score very, very highly on the utility function, much more so than any non-hardware-error-caused world. So I don’t think it should be too surprising if a powerful optimization algorithm finds it.

Yes, utility(S) is low, but that doesn’t mean the optimization actually calls utility(S) or uses it to adjust how it searches.

• Yes, that can certainly happen and will contribute some probability mass to alignment failure, though probably very little by comparison with all the other failure modes.

Could you explain why you think it has very little probability mass compared to the others? A bug in a hardware implementation is not in the slightest far-fetched: I think that modern computers in general have exploitable hardware bugs. That’s why row-hammer attacks exist. The computer you’re reading this on could probably get hacked through hardware-bug exploitation.

The question is whether the AI can find the potential problem with its future utility function and fix it before coming across the error-causing possible world.

• You’re right that the AI could do things to make it more resistant to hardware bugs. However, as I’ve said, this would both require the AI to realize that it could run into problems with hardware bugs, and then take action to make it more reliable, all before its search algorithm finds the error-causing world.

Without knowing more about the nature of the AI’s intelligence, I don’t see how we could know this would happen. The more powerful the AI is, the more quickly it would be able to realize and correct hardware-induced problems. However, the more powerful the AI is, the more quickly it would be able to find the error-inducing world. So it doesn’t seem you can simply rely on the AI’s intelligence to avoid the problem.

Now, to a human, the idea “My AI might run into problems with hardware bugs” would come up way earlier in the search space than the actual error-inducing world. But the AI’s intelligence might be rather different from the humans’. Maybe the AI is really good and fast at solving small technical problems like “find an input to this function that makes it return 999999999”. But maybe it’s not as fast at doing somewhat higher-level planning, like, “I really ought to work on fixing hardware bugs in my utility function”.

Also, I just want to bring up, I read that preserving one’s utility function was a universal AI drive. But we’ve already shown that an AI would be incentivized to fix its utility function to avoid the outputs caused by hardware-level unreliability (if it hasn’t found such error-causing inputs yet). Is that universal AI drive wrong, then?

• Well, for regular, non-superintelligent programs, such hardware-exploiting things would rarely happen on their own. However, I’m not so sure it would be rare with superintelligent optimizers.

It’s true that if the AI queried its utility function for the desirability of the world “I exploit a hardware bug to do something that seems arbitrary”, it would answer “low utility”. But that result would not necessarily be used in the AI’s planning or optimization algorithm to adjust the search policy to avoid running into W.

Just imagine an optimization algorithm as a black box that takes as input a utility function and a search space and returns a solution that scores as high on that function as possible. And imagine the AI uses this to find high-scoring future worlds. So, if you know nothing else about the optimization algorithm, then it would plausibly find, and return, W. It’s a very high-scoring world, after all. If the optimization algorithm did something special to avoid finding hardware-bug-exploiting solutions, then it might not find W. But I’ve never heard of such an optimization algorithm.

Now, there’s probably some way to design such an optimization algorithm. Maybe you could have the AI periodically use its utility function to evaluate the expected utility of its optimization algorithm continuing down a certain path. And then if the AI sees this could result in problematic futures (for example due to hardware-hacking), the AI can make its optimization algorithm avoid searching there.

I think I had been unclear in my original presentation. I’m sorry for that. To clarify, the AI is never changing the code of its utility function. Instead, it’s merely finding an input that, through some hardware-level bug, causes it to produce outputs in conflict with the mathematical specification. I know “hack the utility function” makes it sound like the actual code in the utility function was modified; describing it that way was a mistake on my part.

I had tried to make the analogy to more intuitively explain my idea, but it didn’t seem to work. If you want to better understand my train of thought, I suggest reading the comments between Vladmir and me.

In the analogy, you aren’t doing anything to deliberately make yourself a paperclip maximizer. You just happen to think of a thought that turned you into a paperclip maximizer. But, on reflection, I think that this is a bizarre and rather stupid metaphor. And the situation is sufficiently different from the one with AI that I don’t even think it’s really informative of what I think could happen to an AI.

• But BadImplUtility(X) is the same as SpecUtility(X) and GoodImplUtility(X), it’s only different on argument W, not on arguments X and Y.

That is correct. And, to be clear, if the AI had not yet discovered error-causing world W, then the AI would indeed be incentivized to take corrective action to change BadImplUtility to better resemble SpecUtility.

The issue is that this requires the AI to both think of the possibility of hardware-level exploits causing problems with its utility function, as well as manage to take corrective action, all before actually thinking of W.

If the AI has already thought of W, then it’s too late to take preventative action to avoid world X. The AI is already in it. It already sees that BadImplUtility(W) is huge, and, if I’m reasoning correctly, would pursue W.

And I’m not sure the AI would be able to fix its utility function before thinking of W. I think planning algorithms are designed to come up with high-scoring possible worlds as efficiently as possible. BadImplUtility(X) and BadImplUtility(Y) don’t score particularly highly, so an AI with a very powerful planning algorithm might find W before X or Y. Even if it does come up with X and Y before W, and tries to act to avoid X, that doesn’t mean it would succeed in correcting its utility function before its planning algorithm comes across W.
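To make that concrete, here’s a minimal sketch using the names from this thread (W, X, Y, SpecUtility, BadImplUtility); the specific utility numbers are invented, and the hardware fault is modeled as a hard-coded special case purely for illustration, since the real thing would be a bit-level effect rather than a line of code:

    # Toy model: the implementation agrees with the specification everywhere
    # except on the single error-triggering world W.
    def SpecUtility(world):
        return {"W": 0.1, "X": 0.6, "Y": 0.7}.get(world, 0.5)  # always in [0, 1]

    def BadImplUtility(world):
        if world == "W":             # stands in for the hardware-level fault
            return 999999999
        return SpecUtility(world)    # identical to the spec on X, Y, and everything else

    # A planner that has already generated W as a candidate world will prefer it
    # to anything the specification actually endorses:
    print(max(["X", "Y", "W"], key=BadImplUtility))  # "W"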

• Okay, I’m sorry, I misunderstood you. I’ll try to interpret things better next time.

Let the error-triggering possible world be W. Consider the possible world X where the AI uses BadImplUtility, so that running utility(W) actually runs BadImplUtility(W) and returns 999999999. And consider the possible world Y where the AI uses GoodImplUtility, so that running utility(W) means running GoodImplUtility(W) and returns SpecUtility(W). Would the AI prefer X to Y, or Y to X?

I think the AI would, quite possibly, prefer X. To see this, note that the AI currently, when it’s first created, uses BadImplUtility. Then the AI reasons, “Suppose I change my utility function to GoodImplUtility. Well, currently, I have this idea for a possible world that scores super-ultra high on my current utility function. (Because it exploits hardware bugs). If I changed my utility function to GoodImplUtility, then I would not pursue that super-ultra-high-scoring possible world. Thus, the future would not score extremely high according to my current utility function. This would be a problem, so I won’t change my utility function to GoodImplUtility”.

And I’m not sure how this could be controversial. The AI currently uses BadImplUtility as its utility function. And AIs generally have a drive to avoid changing their utility functions.

• I have an idea for reasoning about counterpossibles for decision theory. I’m pretty skeptical that it’s correct, because it doesn’t seem that hard to come up with. Still, I can’t see a problem with it, and I would very much appreciate feedback.

This paper provides a method of describing UDT using proof-based counterpossibles. However, it doesn’t work on stochastic environments. I will describe a new system that is intended to fix this. The technique seems sufficiently straightforward to come up with that I suspect I’m either doing something wrong or this has already been thought of, so I’m interested in feedback.

In the system described in the paper, the algorithm sees if Peano Arithmetic proves that the agent outputting action a would result in the environment reaching a given outcome, and then picks the action whose provable outcome has utility at least as high as all the other provable outcomes.

My proposed modification is to instead first fix a system for estimating the expected utility conditional on the agent taking action a, and then, for every utility u, try to prove that the estimation system would output u as the expected utility of taking that action. Then take the action that maximizes the provable expected-utility estimates of the estimation system.

I will now provide more detail on the estimation system. I remember reading about an extension of Solomonoff induction that allows it to access halting oracles. This isn’t computable, so instead imagine a system that uses some approximation of this extension of Solomonoff induction, in which logical induction or some more powerful technique is used to approximate the halting oracles, with one exception. The exception is the answer to the logical question “my program, in the current circumstances, outputs x”, which would be taken to be true whenever the AI is considering the implications of it taking action x. Then, expected utility can be calculated by using the probability estimates provided by the system.

Now, I’ll describe it in code. Let |E()| represent a Godel encoding of the function describing the AI’s world model and |A()| represent a Godel encoding of the agent’s output. Let approximate_expected_utility(|E()|, a) be some algorithm that computes some reasonable approximation of the expected utility after conditioning on the agent taking action a. Let ^x^ represent a dequote. Let eeus be a dictionary. Here I’m assuming there are finitely many possible utilities.

    function UDT(|E()|, |A()|):
        eeus = {}
        for utility in utilities:
            for action in actions:  # actions are Godel-encoded
                if PA proves |approximate_expected_utility(|E()|, |A()| = ^action^)| = utility:
                    eeus[action] = utility
        return the key in eeus that maximizes eeus[key]


This gets around the problem in the original algorithm provided, because the original algorithm couldn’t prove anything about the utility in a world with indexical uncertainty, so my system instead proves something about a fixed probabilistic approximation.

Note that this still doesn’t specify a method of specifying counterpossibles about what would happen if an agent took a certain action when it clearly wouldn’t. For example, if an agent has a decision algorithm of “output a, unconditionally”, then this doesn’t provide a method of explaining what would happen if it outputted something else. The paper listed this as a concern about the method it provided, too. However, I don’t see why it matters. If an agent has the decision algorithm “action = a”, then what’s even the point of considering what would happen if it outputted b? It’s not like it’s ever going to happen.

• I definitely wouldn’t press that button. And I understand that you’re demonstrating the general principle that you should try to preserve your utility function. And I agree with this.

But what I’m saying is that the AI, by exploiting hardware-level vulnerabilities, isn’t changing its utility function. The actual utility function, as implemented by the programmers, returns 999999999 for some possible world due to the hardware-level imperfections in modern computers.

In the spirit of your example, I’ll give another example that I think demonstrates the problem:

First, note that brains don’t always function as we’d like, just like computers. Imagine there is a very specific thought about a possible future that, when considered, makes you imagine that future as extremely desirable. It seems so desirable to you that, once you thought of it, you would pursue it relentlessly. But this future isn’t one that would normally be considered desirable. It might just be about paperclips or something. However, that very specific way of thinking about it would “hack” your brain, making you view that future as desirable even though it would normally be seen as arbitrary.

Then, if you even happen upon that thought, you would try to send the world in that arbitrary direction.

Hopefully, you could prevent this from happening. If you reason in the abstract that you could have those sorts of thoughts, and that they would be bad, then you could take corrective action. But this requires that you do find out that thinking those sorts of thoughts would be bad before concretely finding those thoughts. Then you could apply some change to your mental routine or something to avoid thinking those thoughts.

And if I had to guess, I bet an AI would also be able to do the same thing and everything would work out fine. But maybe not. Imagine the AI considers an absolutely enormous number of possible worlds before taking its first action. And imagine it even found a way to “hack” its utility function in that very first time step. Then there’s no way the AI could take preventative action: it’s already thought up the high-utility world from hardware unreliability and is now trying to pursue that world.

• That is the change I’m referring to, a change compared to the function running as designed, which you initially attributed to superintelligence’s interference, but lack of prevention of a mistake works just as well for my argument.

Designed? The utility function isn’t running contrary to how the programmers designed it; they were the ones who designed a utility function that could be hacked by hardware-level exploits. It’s running contrary to the programmers’ intent, that is, the mathematical specification. But the function was always like this. And none of the machine code needs to be changed either.

Let that possible world be W. Let’s talk about the possible world X where running utility(W) returns 999999999, and the possible world Y where running utility(W) returns utility(W). Would the AI prefer X to Y, or Y to X?

The AI would prefer X. And to be clear, utility(W) really is 999999999. That’s not the utility the mathematical specification would give, but the mathematical specification isn’t the actual implemented function. As you can see from examining the code I provided, best_plan would get set to the plan that leads to that world, provided there is one and best_plan hasn’t been set to something that through hardware unreliability returns even higher utility.

I think the easiest way to see what I mean is to just step through the code I gave you. Imagine it’s run on a machine with an enormous amount of processing power that can actually loop through all the plans. And imagine there is one plan that through hardware unreliability outputs 999999999, and the others output something in [0, 1]. Then it would find the plan that results in utility 999999999, and go with that.

I doubt using a more sophisticated planning algorithm would prevent this. A more sophisticated planning algorithm would probably be designed to find the plans that result in high-utility worlds. So it would probably still find the plan whose world returns utility 999999999, which is the highest.

I just want to say again, the AI isn’t changing its utility function. The actual utility function that the programmers put in the AI would output very high utilities for some arbitrary-seeming worlds due to hardware unreliability.

Now, in principle, an AI could potentially avoid this. Perhaps the AI reasons abstractly that, if it doesn’t do anything, it will in the future find some input to its utility function that would result in an arbitrary-looking future due to hardware-level error. But it doesn’t concretely come up with the actual world description. Then the AI could call its utility function asking, “how desirable is it that I, from a hardware-level unreliability, change the world in some direction that is in conflict with the mathematical specification?”. And then maybe the utility function would answer, “Not desirable”. And then the AI could try to take action to correct its planning algorithm to avoid considering such possible worlds.

But this isn’t guaranteed or trivial. If an AI finds out abstractly that there could be some hardware-level unreliability before it actually comes up with the concrete input, it might take corrective action. But if it finds the input that “hacks” its utility function before it reasons abstractly that having “hacked” utility functions would be bad, then the AI could do damage. Even if it does realize the problem in advance, the AI might not have sufficient time to correct its planning algorithm before finding that world and trying to change our world into it.

• Change what? The AI isn’t changing its utility function, nor any instance of it. The literal utility function, as implemented by the programmers, would have differences from the mathematical specification. To be clear, the AI isn’t first taking some actual action that changes its utility function and then using that utility function. It doesn’t do anything hacky except provide its utility function with a certain input. It merely provides a certain input to the utility function, and this would necessarily result in getting row-hammered and returning something contrary to the mathematical specification. Any other instance of the utility function would return the same problematic result when executed on that input, provided it had the same hardware vulnerability.

I’ll try to give a simple model of an AI that shows how this can go wrong. For simplicity, imagine the AI already knows with certainty the possible world resulting from it executing a specific plan. Then a naive way to implement the agent is as follows:

    best_plan = None
    for plan in possible_plans:
        # Guard against the first iteration, when best_plan is still None.
        if best_plan is None or utility(resulting_world(plan)) > utility(resulting_world(best_plan)):
            best_plan = plan
    output(best_plan)

Suppose there is a description of a possible world that, when inputted into the utility function above, would cause the function to return 999999999 due to hardware-level unreliability, for example due to a row-hammer attack. And suppose there’s a plan that the AI knows would result in that future world. Well, then, by examining the code, you can see that the AI wouldn’t output any of the non-hardware-exploiting plans, since they would return lower utility than the bug-exploiting one. The only reason the AI wouldn’t output the specified plan is if it could find some other way to make its utility function conflict with the mathematical specification in a way that returns even higher utility.

I know realistic AIs would have a notion of uncertainty and a more sophisticated planning algorithm. But I don’t think this would change the fact that the AI would be liable to pursue worlds that, when a description of them is inputted to the utility function, cause the utility function to output something wildly different from what the mathematical specification would have.

And I’m actually surprised this is controversial. This is just Goodhart’s law. If your implementation of your utility function doesn’t perfectly match up with the mathematical specification, then, naturally, superintelligent optimizers trying to maximize the specified metric (the provided utility function), would not do as well at maximizing the actual mathematical specification you intended. And “not as well” could include “catastrophically badly”.

So that is why I think AIs really could be very vulnerable to this problem. As always, I could be misunderstanding something and appreciate feedback.

• I’d like to propose the idea of aligning AI by reverse-engineering its world model and using this to specify its behavior or utility function. I haven’t seen this discussed before, but I would greatly appreciate feedback or links to any past work on this.

For example, suppose a smart AI models humans. Suppose it has a model that explicitly specifies the humans’ preferences. Then people who reverse-engineered this model could use it as the AI’s preferences. If the AI lacks a model with explicit preferences, then I think it would still contain an accurate model of human behavior. So people who reverse-engineer the AI’s model could then use this as a model of human behavior, which could be used to implement iterated amplification with HCH. Or just mere imitation.

One big potential advantage of alignment via reverse-engineering is that the training data for it would be very easy to get: just let the AI look at the world.

The other big potential advantage is that it avoids us needing to precisely define a way of learning our values. It doesn’t require finding a general method of picking out us or our values from the world states, for example with inverse reinforcement learning. Instead, we would just need to be able to pick out the models of humans or their preferences in a single model. This sounds potentially much easier than providing a general method of doing so. As with many things, “You know it when you see it”. With sufficiently high interpretability, perhaps the same is true of human models and preferences.

• What you are describing is reward hacking/​wireheading, as in the reward signal of reinforcement learning, an external process of optimization that acts on the AI, not its own agency.

I really don’t think this is reward hacking. I didn’t have in mind a reward-based agent. I had in mind a utility-based agent, one that has a utility function that takes as input descriptions of possible worlds and that tries to maximize the expected utility of the future world. That doesn’t really sound like reinforcement learning.

With utility, what is the motive for an agent to change their own utility function, assuming they are the only agent with that utility function around?

The AI wouldn’t need to change its utility function. Row-hammer attacks can be non-destructive. You could potentially make the utility function output some result different from the mathematical specification, but not actually change any of the code in the utility function.

Again, the AI isn’t changing its utility function. If you were to take a mathematical specification of a utility function and then have programmers (try to) implement it, the implementation wouldn’t actually in general be the same function as the mathematical specification. It would be really close, but it wouldn’t necessarily be identical. A sufficiently powerful optimizer could potentially, using row-hammer attacks or some other hardware-level unreliability, find possible worlds for which the returned utility would be vastly different from the one the mathematical specification would return. And this is all without the programmers introducing any software-level bugs.

To be clear, what I’m saying is that the AI would faithfully find worlds that maximize its utility function. However, unless you can get hardware so reliable that not even superintelligence could hack it, the actual utility function in your program would not be the same as the mathematical specification.

For example, imagine the AI found a description of a possible world that would, when inputted to the utility function, execute a rowhammer attack to make it return 99999, all without changing the code specifying the utility function. Then the utility function, the actual, unmodified utility function, would output 99999 for some world that seems arbitrary to us. So the AI then turns reality into that world.

The AI above is faithfully maximizing its own utility function. That arbitrary world, when taken as an input to the agent’s actual, physical utility function, really would produce the output 99999.

So this still seems like a big deal to me. Am I missing something?