# cubefox

Karma: 91
• Yeah, you are right. I used the fact that . This makes use of the fact that and are both mutually exclusive and exhaustive, i.e. and . For , where and are mutually exclusive but not exhaustive, is not equivalent to , since can be true without either of or being true.

It should however work if , since then . So for to hold, would have to be a “partition” of , exhaustively enumerating all the incompatible ways it can be true.

Regarding conditional utility, I agree. This would mean that if . I found an old paper by someone who analyzes conditional utility in detail, though with zero citations according to Google Scholar. Unfortunately the paper is hard to read because of its eccentric notation, and because the author, an economist, was apparently only aware of Savage’s more complicated utility theory (which has acts, states of the world, and prospects), he doesn’t work in Jeffrey’s simpler and more general theory. But his conclusions seem intriguing, since he e.g. also says that , despite, as far as I know, Savage not having an axiom which demands utility 0 for certainty. Unfortunately I really don’t understand his notation, and I’m not quite an expert on Savage either...

• Not only do humans not directly care about increasing IGF, the vast majority hardly even cares about the proxy of maximizing the number of their direct offspring. That’s something natural selection could have optimized for, but mostly didn’t. Most couples in first-world countries could have more than five children, yet they have fewer than 1.5 on average, far below replacement. The fact that this happens in pretty much all developed countries, despite politicians’ efforts to counteract the trend, shows how weak the preference for offspring really is.

It also seems that men in particular hardly care about having children, even though few are directly against it when their wives want them. And women, especially educated women, largely lose their desire for children as they enter the workforce, particularly full-time. At least, that is what comparisons with poorer and past societies suggest.

One theory to explain this is the theory of female opportunity cost. Women in modern society, especially educated ones, perceive having children as a large opportunity cost, since the alternative to rearing children is having a career. Women in the past and in current poorer countries lived in more “patriarchal” societies where women pursuing a career was not a social norm, and thus a career was not perceived by most women as a live option, i.e. not an alternative to having children. Thus their perceived opportunity cost of having children was much lower than for women in non-patriarchal societies.

In any case, any explanation of this kind must assume that women’s innate desire for children is so weak that it is easily outweighed by a desire for a career.

This is all to say: Most people are even more misaligned relative to IGF than one may realize.

• But we have my result above, i.e.

This proof could also be extended to longer disjunctions between mutually exclusive propositions apart from and . Hence, for a set of mutually exclusive propositions ,

which does not rely on the assumption of being equal to . After all, I only used the desirability axiom for the derivation, not the assumption . So we get a “nice” expression anyway, as long as the disjuncts are mutually exclusive. Right? (Maybe I misunderstood your point.)
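In any case, here is a quick numerical sanity check of the generalized formula, assuming it is $U(X_1 \lor \dots \lor X_n) = \frac{\sum_i P(X_i)U(X_i)}{\sum_i P(X_i)}$. The toy model below (propositions as sets of “worlds”, each with a made-up probability and utility, and $U$ as the probability-weighted average utility within a proposition) is just one way to realize the desirability axiom on a finite space, not anything from the discussion above:

```python
# Toy model: propositions are sets of "worlds"; each world has a probability
# and a utility. U(A) is the probability-weighted average world utility within
# A, which satisfies Jeffrey's desirability axiom on a finite space.
worlds = {"w1": (0.2, 5.0), "w2": (0.3, -1.0), "w3": (0.4, 2.0), "w4": (0.1, 0.0)}

def P(A):
    return sum(worlds[w][0] for w in A)

def U(A):
    return sum(worlds[w][0] * worlds[w][1] for w in A) / P(A)

# Three mutually exclusive but NOT exhaustive propositions (w4 is left out):
X1, X2, X3 = {"w1"}, {"w2"}, {"w3"}

lhs = U(X1 | X2 | X3)  # utility of the disjunction
rhs = sum(P(X) * U(X) for X in (X1, X2, X3)) / sum(P(X) for X in (X1, X2, X3))
print(lhs, rhs)        # both ≈ 1.667, as the generalized formula predicts
assert abs(lhs - rhs) < 1e-12
```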

Regarding , I am now no longer sure that is the right definition. Maybe we instead have . In that case it would follow that . Both are compatible with , and I’m not sure which further plausible conditions could decide which is the right definition.

• Oh yes, of course! (I probably thought this was supposed to be valid for our as well, which is assumed to be mutually exclusive, but, unlike , not exhaustive.)

• I don’t understand what you mean in the beginning here, how is the same as ?

• Regarding the time stamp: Yeah, this is the right way to think about it, at least in the case of subjective utility theory, where utilities represent desires and probabilities represent beliefs, and it is also the right way to think about Bayesianism (subjective probability theory). The utility and probability functions only represent the subjective state of an agent at a particular point in time. They don’t say anything about how they should change over time. They only say that, at any point in time, these functions (i.e. the agent) should satisfy the axioms.

Rules for change over time would need separate assumptions. In Bayesian probability theory this is usually the rule of classical conditionalization or the more general rule of Jeffrey conditionalization. (Bayes’ theorem alone doesn’t say anything about updating. Bayes’ rule = classical conditionalization + Bayes’ theorem)
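For concreteness, here is a minimal sketch of the two update rules on a finite toy space; the propositions and numbers are invented for the example. Classical conditionalization replaces $P$ by $P(\cdot|E)$ once $E$ is observed with certainty, while Jeffrey conditionalization only requires new probabilities $q_i$ for a partition $\{E_i\}$ and sets $P_{\mathrm{new}}(H) = \sum_i P(H|E_i)\,q_i$; classical conditionalization is the special case where some $q_i = 1$.

```python
# Minimal sketch: classical vs. Jeffrey conditionalization on a finite space.
prior = {"rain&mud": 0.3, "rain&dry": 0.1, "sun&mud": 0.1, "sun&dry": 0.5}

def P(A):
    return sum(prior[w] for w in A)

def cond(A, B):  # P(A | B) under the prior
    return P(A & B) / P(B)

RAIN = frozenset({"rain&mud", "rain&dry"})
SUN  = frozenset({"sun&mud", "sun&dry"})
MUD  = frozenset({"rain&mud", "sun&mud"})

# Jeffrey conditionalization: a glance outside makes rain 80% likely, but not
# certain. The partition {RAIN, SUN} gets new probabilities q.
q = {RAIN: 0.8, SUN: 0.2}

def jeffrey_update(H):
    return sum(cond(H, E) * q_E for E, q_E in q.items())

print(jeffrey_update(MUD))  # new probability of mud, ≈ 0.633
print(cond(MUD, RAIN))      # classical conditionalization (q(RAIN) = 1): 0.75
```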

Regarding the utility of , you write the probability part in the sum is . But it is actually just !

To see this, start with the desirability axiom: This doesn’t tell us how to calculate , only . But we can write as the logically equivalent . This is a disjunction, so we can apply the desirability axiom: This is equal to Since , we have Since was chosen arbitrarily, it can be any proposition whatsoever. And since in Jeffrey’s framework we only consider propositions, all actions are also described by propositions. Presumably of the form “I now do x”. Hence, for any .

This proof could also be extended to longer disjunctions of mutually exclusive propositions apart from and . Hence, for a set of mutually exclusive propositions , The set , the “set of all outcomes”, is a special case of where the probabilities of the mutually exclusive elements of sum to 1. One interpretation is to regard each as describing one complete possible world. So, But of course this holds for any proposition, not just an action . This is the elegant thing about Jeffrey’s decision theory which makes it so general: he doesn’t need special types of objects (acts, states of the world, outcomes etc.) and the definitions associated with those.
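To make this concrete, here is a small numerical check, assuming the formula arrived at above is $U(A) = \sum_w P(w|A)\,U(A \land w)$ with the sum running over complete possible worlds $w$; the toy worlds, probabilities and utilities are made up purely for illustration.

```python
# Check U(A) = sum over worlds w of P(w|A) * U(A ∧ w) on a toy space.
worlds = {"w1": (0.2, 5.0), "w2": (0.3, -1.0), "w3": (0.4, 2.0), "w4": (0.1, 0.0)}

def P(A):
    return sum(worlds[w][0] for w in A)

def U(A):
    return sum(worlds[w][0] * worlds[w][1] for w in A) / P(A)

A = {"w1", "w3"}  # a hypothetical proposition/action

lhs = U(A)
# Terms with w outside A have P(w|A) = 0, so we only sum over w in A:
rhs = sum((P({w}) / P(A)) * U(A & {w}) for w in A)
print(lhs, rhs)   # both ≈ 3.0
assert abs(lhs - rhs) < 1e-12
```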

Regarding the general formula for : Your suggestion makes sense; I also think it should be expressible in terms of , , and . I think I’ve got a proof.

Consider The disjunctions are exclusive. By the expected utility hypothesis (which should be provable from the desirability axiom) and by the assumption, we have Then subtract the last term: Now since for any , we have . Hence, By De Morgan, . Therefore Now add to both sides: Notice that and . Therefore we can write Now subtract and we have which is equal to So we have and hence our theorem which we can also write as Success!
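As a numerical sanity check of the resulting theorem, assuming it is the inclusion-exclusion-style formula $U(A \lor B) = \frac{P(A)U(A) + P(B)U(B) - P(A \land B)U(A \land B)}{P(A) + P(B) - P(A \land B)}$, with overlapping toy propositions invented for illustration:

```python
# Check U(A ∨ B) for propositions that are NOT mutually exclusive.
worlds = {"w1": (0.2, 5.0), "w2": (0.3, -1.0), "w3": (0.4, 2.0), "w4": (0.1, 0.0)}

def P(A):
    return sum(worlds[w][0] for w in A)

def U(A):
    return sum(worlds[w][0] * worlds[w][1] for w in A) / P(A)

A = {"w1", "w2"}  # overlapping propositions (they share w2)
B = {"w2", "w3"}

lhs = U(A | B)
rhs = (P(A) * U(A) + P(B) * U(B) - P(A & B) * U(A & B)) / (P(A) + P(B) - P(A & B))
print(lhs, rhs)   # both ≈ 1.667
assert abs(lhs - rhs) < 1e-12
```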

Okay, now with solved, what about the definition of ? I think I got it: This correctly predicts that . And it immediately leads to the plausible consequence . I don’t know how to further check whether this is the right definition, but I’m pretty sure it is.

• Interesting! I have a few remarks, but my reply will have to wait a few days as I have to finish something.

• The way I think about it: The utility maximizer looks for the available action with the highest utility and only then decides to do that action. A decision is the event of setting the probability of the action to 1, and, because of that, its utility to 0. It’s not that an agent decides on an action (sets its probability to 1) because it has utility 0. That would be backwards.

There seems to be some temporal dimension involved, some “updating” of utilities, similar to how the principle of conditionalization formalizes classical Bayesian updating when something is observed. It sets to a new value, and (or because?) it sets to 1.

A rule for utility updating over time, on the other hand, would need to update both probabilities and utilities, and I’m not sure how it would have to be formalized.

• I’m not perfectly sure what the connection with Bayesian updates is here. In general it is provable from the desirability axiom that This is because any (e.g. ) is logically equivalent to for any (e.g. ), which also leads to the “law of total probability”. Then we have a disjunction which we can use with the desirability axiom. The denominator cancels out and gives us in the numerator instead of , which is very convenient because we presumably don’t know the prior probability of an action . After all, we want to figure out whether we should do (= make ) by calculating first. It is also interesting to note that a utility maximizer (an instrumentally rational agent) indeed chooses the actions with the highest utility, not the actions with the highest expected utility, as is sometimes claimed.

Yes, after you do an action you become certain you have done it; its probability becomes 1 and its utility 0. But I don’t see that as counterintuitive, since “doing it again”, or “continuing to do it”, would be a different action which does not have utility 0. Is that what you meant?

• Well, the “expected value” of something is just the value multiplied by its probability. It follows that, if the thing in question has probability 1, its value is equal to the expected value. Since is a tautology, it is clear that .

Yes, this fact is independent of , but this shouldn’t be surprising I think. After all, we are talking about the utility of a tautology here, not about the utility of itself! In general, is usually not 1 ( and are only presumed to be mutually exclusive, not necessarily exhaustive), so its utility and expected utility can diverge.

In fact, in his book “The Logic of Decision” Richard Jeffrey proposed for his utility theory that the utility of any tautology is zero: $U(\top) = 0$. This should make sense, since learning a tautology has no value for us, neither positive nor negative. This assumption also has other interesting consequences. Consider his “desirability axiom”, which he adds to the usual axioms of probability to obtain his utility theory:

If $X$ and $Y$ are mutually exclusive, then $U(X \lor Y) = \frac{P(X)U(X) + P(Y)U(Y)}{P(X) + P(Y)}$. (Alternatively, this axiom is provable from the expected utility hypothesis I posted a few days ago, by dividing both sides of the equation by $P(X) + P(Y)$.)

If we combine this axiom with the assumption $U(\top) = 0$ (tautologies have utility zero), it is provable that if $P(A) = 1$ then $U(A) = 0$. Jeffrey explains this as follows: Interpreting utility subjectively as degree of desire, we can only desire things we don’t have, or more precisely, things we are not certain are true. If something is certain, the desire for it is already satisfied, for better or for worse. Another way to look at it is that the “news value” of a certain proposition is zero. If the utility of a proposition is how good or bad it would be if we learned that it is true, then learning a certain proposition doesn’t have any value, positive or negative, since we knew it all along. So it should be assigned the value 0.

Another provable consequence is this: If $U(A) = U(\neg A)$ (with $A$ not necessarily being certain), then $U(A) = 0$. In other words, if we don’t care whether $A$ is true or not, if we are indifferent between $A$ and $\neg A$, then the utility of $A$ is zero. This seems highly plausible.

Yet another provable consequence is that we actually obtain a negation rule for utilities: $U(\neg A) = -U(A)\,\frac{P(A)}{P(\neg A)}$. In other words, the utility of the negation of $A$ is the utility of $A$ times its negative odds.
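These consequences are easy to confirm numerically on a toy model in which propositions are sets of possible worlds and the (made-up) world utilities are shifted so that the tautology gets utility zero:

```python
# Check U(tautology) = 0 and the negation rule U(¬A) = -U(A) * P(A) / P(¬A).
raw = {"w1": (0.2, 5.0), "w2": (0.3, -1.0), "w3": (0.4, 2.0), "w4": (0.1, 0.0)}
shift = sum(p * u for p, u in raw.values())                # U(tautology) before normalizing
worlds = {w: (p, u - shift) for w, (p, u) in raw.items()}  # now U(tautology) = 0

def P(A):
    return sum(worlds[w][0] for w in A)

def U(A):
    return sum(worlds[w][0] * worlds[w][1] for w in A) / P(A)

TAUT = set(worlds)
A = {"w1", "w3"}
NOT_A = TAUT - A

print(U(TAUT))                             # ≈ 0: tautologies have utility zero
print(U(NOT_A), -U(A) * P(A) / P(NOT_A))   # both sides of the negation rule: -2.25
assert abs(U(TAUT)) < 1e-12
assert abs(U(NOT_A) + U(A) * P(A) / P(NOT_A)) < 1e-12
```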

I also wondered whether it is then possible to derive other rules for utility theory, such as for $U(A \lor B)$ where $A$ and $B$ are not presumed to be mutually exclusive, or for $U(A \land B)$. It would also be helpful to have a definition of conditional utility $U(A|B)$, i.e. the utility of $A$ under the assumption that $B$ is satisfied (certain). Presumably we would then have facts like .

Regarding the problem with the random variable : Since (I believe) the probabilities of the values of a random variable sum to 1, I think we would have to assign all random variables probability 1 if we interpret the probability of a random variable as the probability of the disjunction of its values, and consequently utility zero if we accept that tautologies have utility zero.

But I’m not very familiar with random variables, and I’m not sure we even need them in subjective utility theory, a theory of instrumental rationality where we deal with propositions (“events”) which can be believed and desired (assigned a probability and a utility). A random variable does not straightforwardly correspond to a proposition, except the binary random variable which has the two values and .

• Ah, thanks. I still find this strange, since in your case and are events, which can be assigned specific probabilities and utilities, while is apparently a random variable. A random variable is, as far as I understand, basically a set of mutually exclusive and exhaustive events. E.g. = The weather tomorrow = {good, neutral, bad}. Each of those events can be assigned a probability (and they must sum to 1, since they are mutually exclusive and exhaustive) and a utility. So it seems it doesn’t make sense to assign itself a utility (or a probability). But I might be just confused here...

Edit: It would make more sense, and in fact agree with the formula I posted in my last comment, if a random variable corresponded to the event that is the disjunction of its possible values. E.g. = the weather will be good or neutral or bad. In that case the probability of a random variable would always be 1, such that the expected utility of the disjunction is just its utility, and my formula above is identical to yours.

• I’m probably missing something here, but how is a defined expression? I thought takes as inputs events or outcomes or something like that, not a real number like something which could be multiplied with ? It seems you treat not as an event but as some kind of number? (I get of course, since returns a real number.)

The thing I would have associated with “expected utility hypothesis”: If $X$ and $Y$ are mutually exclusive, then $P(X \lor Y)\,U(X \lor Y) = P(X)U(X) + P(Y)U(Y)$.

• Could you explain the “expected utility hypothesis”? Where does this formula come from? Very intriguing!

• In Jeffrey’s desirability formula you write $P(p|p_i)$. But isn’t this value always 1 for any $i$? Which would mean the term can be eliminated, since multiplying by 1 makes no difference? Assume p = “the die comes up even”. So the partition of p is (the die comes up...) {2,4,6}. And $P(p|p_i) = 1$ for all $i$. E.g. P(even|2)=1.

I guess you (Jeffrey) rather meant $P(p_i|p)$?
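With the die example this is easy to check numerically; the utilities assigned to the six faces below are made up purely for illustration.

```python
# Die example: p = "the die comes up even", with partition {2, 4, 6}.
# Check that P(p|i) = 1 for every cell i, and that the corrected formula
# U(p) = sum over i of P(i|p) * U(i) holds.
prob = {i: 1 / 6 for i in range(1, 7)}
util = {1: 0.0, 2: 3.0, 3: -1.0, 4: 6.0, 5: 2.0, 6: -3.0}  # made-up utilities

def P(A):
    return sum(prob[i] for i in A)

def U(A):
    return sum(prob[i] * util[i] for i in A) / P(A)

even = {2, 4, 6}

print([P(even & {i}) / P({i}) for i in even])         # P(even|i) = 1.0 for each i
lhs = U(even)
rhs = sum((P({i}) / P(even)) * U({i}) for i in even)  # sum of P(i|even) * U(i)
print(lhs, rhs)                                       # both ≈ 2.0
assert abs(lhs - rhs) < 1e-12
```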

• Similar recommendation to blog post writers: Try to include only relatively important links, since littering your post with links increases the effective reading time for many readers, which will cause fewer people to read the (whole) post.

This is similar to post length: There is an urge to talk about everything somewhat relevant to the topic, respond to all possible objections and the like. But longer posts will, on average, be read by fewer people. There is a trade-off between being concise and being thorough.

• 6 Sep 2022 22:42 UTC
3 points
1 ∶ 0
in reply to: ChristianKl’s comment

To add to this: Expressing belief in the Christian god is still relatively harmless. It would cost you some professional status, because people would think you are not very smart. But expressing other beliefs outside the Overton window may make people think you are actively evil or at least very immoral. As a historical example, expressing disbelief in God was once such a case. For such (supposedly) immoral beliefs you may lose a lot more status, and not just status: you might get cancelled, be excluded from your social circles, lose job opportunities, etc.

Pushing the Overton window is a delicate game: it is only rational to push it a little at a time, and only infrequently. Otherwise the risks will outweigh the rewards.

• 6 Sep 2022 13:29 UTC
2 points
3 ∶ 0
in reply to: Quintin Pope’s comment

I agree that value in the sense of goodness is not relevant for alignment. What is relevant is what the AI is motivated to do, not what it believes to be good. I’m just saying that your usage of “value” would be called “desire” by philosophers.

The term “values” often suffers from this ambiguity: if someone says an agent A “values” an outcome O, do they mean that A believes O is good, i.e. that A believes O has a high value, a high degree of goodness? Or do they mean that A wants O to obtain, i.e. that A desires O? This is frequently left ambiguous.

A solution would be to taboo the term “values” and instead talk directly about desires, or what is good or believed to be good.

But in your post you actually clarified in the beginning that you mean value in the sense of desire, so this usage seems fine in your case.

The term “desire” has one problem of its own: it arguably implies consciousness. We would like to be able to talk about anything an AI might “want”, in the sense of being motivated to realize it, without the AI necessarily being conscious.

• Ethical truths are probably different from empirical truths. An advanced AI may learn empirical truths on its own from enough data, but it seems unlikely that it will automatically converge on the ethical truth. Instead, it seems that any degree of intelligence can be combined with any kind of goal. (Orthogonality Thesis)

I think the main point of the orthogonality thesis is less that an advanced AI could not figure out the true ethics, and more that the AI would not be motivated to act on it even if it figured out the correct theory. If there is a true moral theory and the orthogonality thesis is true, then the thesis of moral internalism (true moral beliefs are intrinsically motivating) is false. See https://arbital.com/p/normative_extrapolated_volition/, section “Unrescuability of moral internalism”.

• 4 Sep 2022 13:54 UTC
13 points
4 ∶ 2

A bit tangential: Regarding the terminology, what you here call “values” would be called “desires” by philosophers. Perhaps also by psychologists. Desires measure how strongly an agent wants an outcome to obtain. Philosophers would mostly regard “value” as a measure of how good something is, either intrinsically or for something else. There appears to be no overly strong connection between values in this sense and desires, since you may believe that something is good without being motivated to make it happen, or the other way round.