An alternative to always having a precise distribution over outcomes is imprecise probabilities: You represent your beliefs with a *set* of distributions you find plausible.

And if you have imprecise probabilities, expected value maximization isn’t well-defined. One natural generalization of EV maximization to the imprecise case is maximality:^{[1]} You prefer A to B iff EV_p(A) > EV_p(B) with respect to every distribution p in your set. (You’re permitted to choose any option that you don’t disprefer to something else.)

If you don’t endorse either (1) imprecise probabilities or (2) maximality given imprecise probabilities, I’m interested to hear why.

- ^
I think originally due to Sen (1970); just linking Mogensen (2020) instead because it’s non-paywalled and easier to find discussion of Maximality there.

Here are some brief reasons why I dislike things like imprecise probabilities and maximality rules (somewhat strongly stated, medium-strongly held because I’ve thought a significant amount about this kind of thing, but unfortunately quite sloppily justified in this comment; also, sorry if some things below approach being insufficiently on-topic):

I like the canonical arguments for bayesian expected utility maximization ( https://www.alignmentforum.org/posts/sZuw6SGfmZHvcAAEP/complete-class-consequentialist-foundations ; also https://web.stanford.edu/~hammond/conseqFounds.pdf seems cool (though I haven’t read it properly)). I’ve never seen anything remotely close for any of this other stuff — in particular, no arguments that pin down any other kind of rule compellingly. (I associate with this the vibe here (in particular, the paragraph starting with “To the extent that the outer optimizer” and the paragraph after it), though I guess maybe that’s not a super helpful thing to say.)

The arguments I’ve come across for these other rules look like pointing at some intuitive desiderata and saying these other rules sorta meet these desiderata whereas canonical bayesian expected utility maximization doesn’t, but I usually don’t really buy the desiderata and/or find that bayesian expected utility maximization also sorta has those desired properties, e.g. if one takes the cost of thinking into account in the calculation, or thinks of oneself as choosing a policy.

When specifying alternative rules, people often talk about things like default actions, permissibility, and preferential gaps, and these concepts seem bad to me. More precisely, they seem unnatural/unprincipled/confused/[I have a hard time imagining what they could concretely cache out to that would make the rule seem non-silly/useful]. For some rules, I think that while they might be psychologically different than ‘thinking like an expected utility maximizer’, they give behavior from the same distribution — e.g., I’m pretty sure the rule suggested here (the paragraph starting with “More generally”) and here (and probably elsewhere) is equivalent to “act consistently with being an expected utility maximizer”, which seems quite unhelpful if we’re concerned with getting a differently-behaving agent. (In fact, it seems likely to me that a rule which gives behavior consistent with expected utility maximization basically had to be provided in this setup given https://web.stanford.edu/~hammond/conseqFounds.pdf or some other canonical such argument, maybe with some adaptations, but I haven’t thought this through super carefully.) (A bunch of other people (Charlie Steiner, Lucius Bushnaq, probably others) make this point in the comments on https://www.lesswrong.com/posts/yCuzmCsE86BTu9PfA/there-are-no-coherence-theorems; I’m aware there are counterarguments there by Elliott Thornley and others; I recall not finding them compelling on an earlier pass through these comments; anyway, I won’t do this discussion justice in this comment.)

I think that if you try to get any meaningful mileage out of the maximality rule (in the sense that you want to “get away with knowing meaningfully less about the probability distribution”), basically everything becomes permissible, which seems highly undesirable. This is analogous to: as soon as you try to get any meaningful mileage out of a maximin (infrabayesian) decision rule, every action looks really bad — your decision comes down to picking the least catastrophic option out of options that all look completely catastrophic to you — which seems undesirable. It is also analogous to trying to find an action that does something or that has a low probability of causing harm ‘regardless of what the world is like’ being imo completely impossible (leading to complete paralysis) as soon as one tries to get any mileage out of ‘regardless of what the world is like’ (I think this kind of thing is sometimes e.g. used in davidad’s and Bengio’s plans https://www.lesswrong.com/posts/pKSmEkSQJsCSTK6nH/an-open-agency-architecture-for-safe-transformative-ai?commentId=ZuWsoXApJqD4PwfXr , https://www.youtube.com/watch?v=31eO_KfkjRQ&t=1946s ). In summary, my inside view says this kind of knightian thing is a complete non-starter. But outside-view, I’d guess that at least some people that like infrabayesianism have some response to this which would make me view it at least slightly more favorably. (Well, I’ve only stated the claim and not really provided the argument I have in mind, but that would take a few paragraphs I guess, and I won’t provide it in this comment.)

To add: it seems basically confused to talk about

theprobability distribution on probabilities or probability distributions, as opposed to some joint distribution on two variables oraprobability distribution on probability distributions or something. It seems similarly ‘philosophically problematic’ to talk abouttheset of probability distributions, to decide in a way that depends a lot on how uncertainty gets ‘partitioned’ into the set vs the distributions. (I wrote about this kind of thing a bit more here: https://forum.effectivealtruism.org/posts/Z7r83zrSXcis6ymKo/dissolving-ai-risk-parameter-uncertainty-in-ai-future#vJg6BPpsG93iyd7zo .)I think it’s plausible there’s some (as-of-yet-undeveloped) good version of probabilistic thinking+decision-making for less-than-ideal agents that departs from canonical bayesian expected utility maximization; I like approaches to finding such a thing that take aspects of existing messy real-life (probabilistic) thinking seriously but also aim to define a precise formal setup in which some optimality result could be proved. I have some very preliminary thoughts on this and a feeling that it won’t look at all like the stuff I’ve discussed disliking above. Logical induction ( https://arxiv.org/abs/1609.03543 ) seems cool; a heuristic estimator ( https://arxiv.org/pdf/2211.06738 ) would be cool. That said, I also assign significant probability to nothing very nice being possible here (this vaguely relates to the claim: “while there’s a single ideal rationality, there are many meaningfully distinct bounded rationalities” (I’m forgetting whom I should attribute this to)).

Thanks for the detailed answer! I won’t have time to respond to everything here, but:

But the CCT only says that if you satisfy [blah], your policy is consistentwith precise EV maximization. This doesn’t imply your policy is

inconsistentwith Maximality, nor (as far as I know) does it tell you what distribution with respect to which you should maximize precise EV in order to satisfy [blah] (or even that such a distribution is unique). So I don’t see a positive case here for precise EV maximization [ETA: as a procedure to guide your decisions, that is]. (This is my also response to your remark below about “equivalent to “act consistently with being an expected utility maximizer”.”)Could you expand on this with an example? I don’t follow.

Maximality and imprecision don’t make any reference to “default actions,” so I’m confused. I also don’t understand what’s unnatural/unprincipled/confused about permissibility or preferential gaps. They seem quite principled to me: I have a strict preference for taking action A over B (/ B is impermissible) only if I’m justified in beliefs according to which I expect A to do better than B.

This is a much longer conversation, but briefly: I think it’s ad hoc / putting the cart before the horse to shape our epistemology to fit our intuitions about what decision guidance we should have.

I agree that any precise EV maximization (which imo = any good policy) is consistent with some corresponding maximality rule — in particular, with the maximality rule with the very same single precise probability distribution and the same utility function (at least modulo some reasonable assumptions about what ‘permissibility’ means). Any good policy is also consistent with any maximality rule that includes its probability distribution as one distribution in the set (because this guarantees that the best-according-to-the-precise-EV-maximization action is always permitted), as well as with any maximality rule that makes anything permissible. But I don’t see how any of this connects much to whether there is a positive case for precise EV maximization? If you buy the CCT’s assumptions, then you literally do have an argument that anything other than precise EV maximization is bad, right, which does sound like a positive case for precise EV maximization (though not directly in the psychological sense)?

Ok, maybe you’re saying that the CCT doesn’t obviously provide an argument for it being good to restructure your thinking into literally maintaining some huge probability distribution on ‘outcomes’ and explicitly maintaining some function from outcomes to the reals and explicitly picking actions such that the utility conditional on these actions having been taken by you is high (or whatever)? I agree that trying to do this very literally is a bad idea, eg because you can’t fit all possible worlds (or even just one world) in your head, eg because you don’t know likelihoods given hypotheses as you’re not logically omniscient, eg because there are difficulties with finding yourself in the world, etc — when taken super literally, the whole shebang isn’t compatible with the kinds of good reasoning we actually can do and do do and want to do. I should say that I didn’t really track the distinction between the psychological and behavioral question carefully in my original response, and had I recognized you to be asking only about the psychological aspect, I’d perhaps have focused on that more carefully in my original answer. Still, I do think the CCT has something to say about the psychological aspect as well — it provides some pro tanto reason to reorganize aspects of one’s reasoning to go some way toward assigning coherent numbers to propositions and thinking of decisions as having some kinds of outcomes and having a schema for assigning a number to each outcome and picking actions that lead to high expectations of this number. This connection is messy, but let me try to say something about what it might look like (I’m not that happy with the paragraph I’m about to give and I feel like one could write a paper at this point instead). The CCT says that if you ‘were wise’ — something like ‘if you were to be ultimately content with what you did when you look back at your life’ — your actions would need to be a particular way (from the outside). Now, you’re pretty interested in being content with your actions (maybe just instrumentally, because maybe you think that has to do with doing more good or being better). In some sense, you know you can’t be fully content with them (because of the reasons above). But it makes sense to try to move toward being more content with your actions. One very reasonable way to achieve this is to incorporate some structure into your thinking that makes your behavior come closer to having these desired properties. This can just look like the usual: doing a bayesian calculation to diagnose a health problem, doing an EV calculation to decide which research project to work on, etc..

(There’s a chance you take there to be another sense in which we can ask about the reasonableness of expected utility maximization that’s distinct from the question that broadly has to do with characterizing behavior and also distinct from the question that has to do with which psychology one ought to choose for oneself — maybe something like what’s fundamentally principled or what one ought to do here in some other sense — and you’re interested in that thing. If so, I hope what I’ve said can be translated into claims about how the CCT would relate to that third thing.)

Anyway, If the above did not provide a decent response to what you said, then it might be worthwhile to also look at the appendix (which I ended up deprecating after understanding that you might only be interested in the psychological aspect of decision-making). In that appendix, I provide some more discussion of the CCT saying that [maximality rules which aren’t behaviorally equivalent to expected utility maximization are dominated]. I also provide some discussion recentering the broader point I wanted to make with that bullet point that CCT-type stuff is a big red arrow pointing toward expected utility maximization, whereas no remotely-as-big red arrow is known for [imprecise probabilities + maximality].

For example, preferential gaps are sometimes justified by appeals to cases like: “you’re moving to another country. you can take with you your Fabergé egg xor your wedding album. you feel like each is very cool, and in a different way, and you feel like you are struggling to compare the two. given this, it feels fine for you to flip a coin to decide which one (or to pick the one on the left, or to ‘just pick one’) instead of continuing to think about it. now you remember you have 10 dollars inside the egg. it still seems fine to flip a coin to decide which one to take (or to pick the one on the left, or to ‘just pick one’).”. And then one might say one needs preferential gaps to capture this. But someone sorta trying to maximize expected utility might think about this as: “i’ll pick a randomization policy for cases where i’m finding two things hard to compare. i think this has good EV if one takes deliberation costs into account, with randomization maybe being especially nice given that my utility is concave in the quantities of various things.”.

I mostly mentioned defaultness because it appears in some attempts to precisely specify alternatives to bayesian expected utility maximization. One concrete relation is that one reasonable attempt at specifying what it is that you’ll do when multiple actions are permissible is that you choose the one that’s most ‘default’ (more precisely, if you have a prior on actions, you could choose the one with the highest prior). But if a notion of defaultness isn’t relevant for getting from your (afaict) informal decision rule to a policy, then nvm this!

I’m not sure I understand. Am I right in understanding that permissibility is defined via a notion of strict preferences, and the rest is intended as an informal restatement of the decision rule? In that case, I still feel like I don’t know what having a strict preference or permissibility means — is there some way to translate these things to actions? If the rest is intended as an independent definition of having a strict preference, then I still don’t know how anything relates to action either. (I also have some other issues in that case: I anticipate disliking the distinction between justified and unjustified beliefs being made (in particular, I anticipate thinking that a good belief-haver should just be thinking and acting according to their beliefs); it’s unclear to me what you mean by being justified in some beliefs (eg is this a non-probabilistic notion); are individual beliefs giving you expectations here or are all your beliefs jointly giving you expectations or is some subset of beliefs together giving you expectations; should I think of this expectation that A does better than B as coming from another internal conditional expected utility calculation). I guess maybe I’d like to understand how an action gets chosen from the permissible ones. If we do not in fact feel that all the actions are equal here (if we’d pay something to switch from one to another, say), then it starts to seem unnatural to make a distinction between two kinds of preference in the first place. (This is in contrast to: I feel like I can relate ‘preferences’ kinda concretely to actions in the usual vNM case, at least if I’m allowed to talk about money to resolve the ambiguity between choosing one of two things I’m indifferent between vs having a strict preference.)

Anyway, I think there’s a chance I’d be fine with sometimes thinking that various options are sort of fine in a situation, and I’m maybe even fine with this notion of fineness eg having certain properties under sweetenings of options, but I quite strongly dislike trying to make this notion of fineness correspond to this thing with a universal quantifier over your probability distributions, because it seems to me that (1) it is unhelpful because it (at least if implemented naively) doesn’t solve any of the computational issues (boundedness issues) that are a large part of why I’d entertain such a notion of fineness in the first place, (2) it is completely unprincipled (there’s no reason for this in particular, and the split of uncertainties is unsatisfying), and (3) it plausibly gives disastrous behavior if taken seriously. But idk maybe I can’t really even get behind that notion of fineness, and I’m just confusing it with the somewhat distinct notion of fineness that I use when I buy two different meals to distribute among myself and a friend and tell them that I’m fine with them having either one, which I think is well-reduced to probably having a smaller preference than my friend. Anyway, obviously whether such a notion of fineness is desirable depends on how you want it to relate to other things (in particular, actions), and I’m presently sufficiently unsure about how you want it to relate to these other things to be unsure about whether a suitable such notion exists.

It seems to me like you were like: “why not regiment one’s thinking xyz-ly?” (in your original question), to which I was like “if one regiments one thinking xyz-ly, then it’s an utter disaster” (in that bullet point), and now you’re like “even if it’s an utter disaster, I don’t care”. And I guess my response is that you should care about it being an utter disaster, but I guess I’m confused enough about why you wouldn’t care that it doesn’t make a lot of sense for me to try to write a library of responses.

## Appendix with some things about CCT and expected utility maximization and [imprecise probabilities] + maximality that got cut

Precise EV maximization is a special case of [imprecise probabilities] + maximality (namely, the special case where your imprecise probabilities are in fact precise, at least modulo some reasonable assumptions about what things mean), so unless your class of decision rules turns out to be precisely equivalent to the class of decision rules which do precise EV maximization, the CCT does in fact say it contains some bad rules. (And if it did turn out to be equivalent, then I’d be somewhat confused about why we’re talking about it your way, because it’d seem to me like it’d then just be a less nice way to describe the same thing.) And at least on the surface, the class of decision rules does not appear to be equivalent, so the CCT indeed does speak against some rules in this class (and in fact, all rules in this class which cannot be described as precise EV maximization).

If you filled in the details of your maximality-type rule enough to tell me what your policy is — in particular, hypothetically, maybe you’d want to specify sth like the following: what it means for some options to be ‘permissible’ or how an option gets chosen from the ‘permissible options’, potentially something about how current choices relate to past choices, and maybe just what kind of POMDP, causal graph, decision tree, or whatever game setup we’re assuming in the first place — such that your behavior then looks like bayesian expected utility maximization (with some particular probability distribution and some particular utility function), then I guess I’ll no longer be objecting to you using that rule (to be precise: I would no longer be objecting to it for being dominated per the CCT or some such theorem, but I might still object to the psychological implementation of your policy on other grounds).

That said, I think the most straightforward ways [to start from your statement of the maximality rule and to specify some sequential setup and to make the rule precise and to then derive a policy for the sequential setup from the rule] do give you a policy which you would yourself consider dominated though. I can imagine a way to make your rule precise that doesn’t give you a dominated policy that ends up just being ‘anything is permissible as long as you make sure you looked like a bayesian expected utility maximizer at the end of the day’ (I think the rule of Thornley and Petersen is this), but at that point I’m feeling like we’re stressing some purely psychological distinction whose relevance to matters of interest I’m failing to see.

But maybe more importantly, at this point, I’d feel like we’ve lost the plot somewhat. What I intended to say with my original bullet point was more like: we’ve constructed this giant red arrow (i.e., coherence theorems; ok, it’s maybe not that giant in some absolute sense, but imo it is as big as presently existing arrows get for things this precise in a domain this messy) pointing at one kind of structure (i.e., bayesian expected utility maximization) to have ‘your beliefs and actions ultimately correspond to’, and then you’re like “why not this other kind of structure (imprecise probabilities, maximality rules) though?” and then my response was “well, for one, there is the giant red arrow pointing at this other structure, and I don’t know of any arrow pointing at your structure”, and I don’t really know how to see your response as a response to this.

Sets of distributions are the natural elements of Bayesian reasoning: each distribution corresponds to a hypothesis. Some people pretend that you can collapse these down to a single distribution by some prior (and then argue about “correct” priors), but the actual machinery of Bayesian reasoning produces changes in

relativehypothesis weightings. Those can be applied to any prior if you have reason to prefer a single one, or simply composed with future relative changes if you don’t.Partially ordering options by EV over all hypotheses is likely to be a very weak order with nearly all options being incomparable (and thus permissible). However, it’s quite reasonable to have

boundson hypothesis weightings even if you don’t have good reason to choose a specific prior.You can use prior bounds to form very much stronger partial orders in many cases.

My initial impulse is to treat imprecise probabilities like I treat probability distributions over probabilities: namely, I am not permanently opposed, but have promised myself that before I resort to one, I would first try a probability and a set of “indications” about how “sensitive” my probability is to changes: e.g., I would try something like

I agree that higher-order probabilities can be useful for representing (non-)resilience of your beliefs. But imprecise probabilities go further than that — the idea is that you just don’t know what higher-order probabilities over the first-order ones you ought to endorse, or the higher-higher-order probablities over those, etc. So the first-order probabilities remain imprecise.

For humans (and probably generally for embedded agents), I endorse acknowledging that probabilities are a wrong but useful model. For any given prediction, the possibility set is incomplete, and the weights are only estimations with lots of variance. I don’t think that a set of distributions fixes this, though in some cases it can capture the model variance better than a single summary can.

EV maximization can only ever be an estimate. No matter HOW you come up with your probabilities and beliefs about value-of-outcome, you’ll be wrong fairly often. But that doesn’t make it useless—there’s no better legible framework I know of. Illegible frameworks (heuristics embedded in the giant neural network in your head) are ALSO useful, and IMO best results come from blending intuition and calculation, and from being humble and suspicious when they diverge greatly.