Karma: 325

# Invulnerable Incomplete Preferences: A Formal Statement

30 Aug 2023 21:59 UTC
126 points

# Rational Unilateralists Aren’t So Cursed

4 Jul 2023 12:19 UTC
47 points
• I appreciate the intention here but I think it would need to be done with considerable care, as I fear it may have already led to accidental vandalism of the epistemic commons. Just skimming a few of these Wikipedia pages, I’ve noticed several new errors. These can be easily spotted by domain experts but might not be obvious to casual readers.[1] I can’t know exactly which of these are due to edits from this community, but some very clearly jump out.[2]

I’ll list some examples below, but I want to stress that this list is not exhaustive. I didn’t read most parts of most related pages, and I omitted many small scattered issues. In any case, I’d like to ask whoever made any of these edits to please reverse them, and to triple check any I didn’t mention below.[3] Please feel free to respond to this if any of my points are unclear![4]

### False statements

• The page on Independence of Irrelevant Alternatives (IIA) claims that IIA is one of the vNM axioms, and that one of the vNM axioms “generalizes IIA to random events.”

Both are false. The similar-sounding Independence axiom of vNM is neither equivalent to, nor does it entail, IIA (and so it can’t be a generalisation). You can satisfy Independence while violating IIA. This is not a technicality; it’s a conflation of distinct and important concepts. This is repeated in several places.

• The mathematical statement of Independence there is wrong. In the section conflating IIA and Independence, it’s defined as the requirement that

$$pN + (1-p)\,\text{Bad} \prec pN + (1-p)\,\text{Good}$$

for any $p \in [0,1]$ and any outcomes Bad, Good, and N satisfying $\text{Bad} \prec \text{Good}$. This mistakes weak preference for strict preference. To see this, set $p=1$ and observe that the line now reads $N \prec N$. (The rest of the explanation in this section is also problematic but the reasons for this are less easy to briefly spell out.)

• The Dutch book page states that the argument demonstrates that “rationality requires assigning probabilities to events [...] and having preferences that can be modeled using the von Neumann–Morgenstern axioms.” This is false. It is an argument for probabilistic beliefs; it implies nothing at all about preferences. And in fact, the standard proof of the Dutch book theorem assumes something like expected utility (Ramsey’s thesis).

This is a substantial error, making a very strong claim about an important topic. And it’s repeated elsewhere, e.g. when stating that the vNM axioms “apart from continuity, are often justified using the Dutch book theorems.”

• The section ‘The theorem’ on the vNM page states the result using strict preference/​inequality. This is a corollary of the theorem but does not entail it.

• The decision theory page states that it’s “a branch of applied probability theory and analytic philosophy concerned with the theory of making decisions based on assigning probabilities to various factors and assigning numerical consequences to the outcome.” This is a poor description. Decision theorists don’t simply assume this, nor do they always conclude it—e.g. see work on ambiguity or lexicographic preferences. And besides this, decision theory is arguably more central in economics than the fields mentioned.

• The IIA article’s first sentence states that IIA is an “axiom of decision theory and economics” whereas it’s classically one of social choice theory, in particular voting. This is at least a strange omission for the context-setting sentence of the article.

• It’s stated that IIA describes “a necessary condition for rational behavior.” Maybe the individual-choice version of IIA is, but the intention here was presumably to refer to Independence. This would be a highly contentious claim though, and definitely not a formal result. It’s misleading to describe Independence as necessary for rationality.

• The vNM article states that obeying the vNM axioms implies that agents “behave as if they are maximizing the expected value of some function defined over the potential outcomes at some specified point in the future.” I’m not sure what ‘specified point in the future’ is doing there; that’s not within the framework.

• The vNM article states that “the theorem assumes nothing about the nature of the possible outcomes of the gambles.” That’s at least misleading. It assumes all possible outcomes are known, that they come with associated probabilities, and that these probabilities are fixed (e.g., ruling out the Newcomb paradox).
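To make the p=1 point about the Independence statement concrete, here is a minimal sketch. It assumes the page's formula mixes N with weight p and Bad or Good with weight 1-p (which is what makes the line collapse to N versus N at p=1), and it ranks lotteries by expected utility with made-up utilities purely for illustration. The names `mix`, `eu`, and `UTILITY` are hypothetical.

```python
from fractions import Fraction

def mix(p, lottery_n, lottery_x):
    """Return the compound lottery p*N + (1-p)*X as an outcome -> probability dict."""
    out = {}
    for lottery, weight in ((lottery_n, p), (lottery_x, 1 - p)):
        for outcome, prob in lottery.items():
            out[outcome] = out.get(outcome, 0) + weight * prob
    # Drop zero-probability outcomes so equal lotteries compare equal.
    return {o: q for o, q in out.items() if q != 0}

# Illustrative utilities (an assumption): Bad < N < Good.
UTILITY = {"Bad": 0, "N": 1, "Good": 2}

def eu(lottery):
    """Expected utility of a lottery under the illustrative utilities."""
    return sum(prob * UTILITY[o] for o, prob in lottery.items())

bad, good, n = {"Bad": 1}, {"Good": 1}, {"N": 1}

for p in (Fraction(0), Fraction(1, 2), Fraction(1)):
    left, right = mix(p, n, bad), mix(p, n, good)
    print(p, "strict:", eu(left) < eu(right), "weak:", eu(left) <= eu(right))
```

At p=1 the two mixtures are the same lottery, so the strict comparison fails while the weak one holds.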

Besides these problems, various passages in these articles and others are unclear, lack crucial context, contain minor issues, or just look prone to leave readers with a confused impression of the topic. (This would take a while to unpack, so my many omissions should absolutely not be interpreted as green lights.) As OP wrote: these pages are a mess. But I fear the recent edits have contributed to some of this.

So, as of now, I’d strongly recommend against reading Wikipedia for these sorts of topics—even for a casual glance. A great alternative is the Stanford Encyclopedia of Philosophy, which covers most of these topics.

1. ^

I checked this with others in economics and in philosophy.

2. ^

E.g., the term ‘coherence theorems’ is unheard of outside of LessWrong, as is the frequency of italicisation present in some of these articles.

3. ^

I would do it myself but I don’t know what the original articles said and I’d rather not have to learn the Wikipedia guidelines and re-write the various sections from scratch.

4. ^

Or to let me know that some of the issues I mention were already on Wikipedia beforehand. I’d be happy to try to edit those.

• I think it’ll be helpful to look at the object level. One argument says: if your beliefs aren’t probabilistic but you bet in a way that resembles expected utility, then you’re susceptible to sure loss. This forms an argument for probabilism.[1]

Another argument says: if your preferences don’t satisfy certain axioms but satisfy some other conditions, then there’s a sequence of choices that will leave you worse off than you started. This forms an argument for norms on preferences.

These are distinct.
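A toy sketch of each argument, side by side, may help show how different the inputs are. The first function takes only credences plus the assumption that the agent buys a $1 bet at a price equal to its credence; the second takes only a strict preference relation plus the assumption that the agent pays a small fee to swap to anything it strictly prefers. Both functions and all the numbers are hypothetical illustrations, not anyone's canonical formulation.

```python
from fractions import Fraction

def dutch_book_payoff(credence_a, credence_not_a):
    """Net payoff for an agent who buys a $1 bet on A and a $1 bet on not-A,
    each at a price equal to its credence (the betting assumption)."""
    cost = credence_a + credence_not_a
    payoff = 1  # exactly one of the two bets pays $1, whatever happens
    return payoff - cost

def money_pump(strictly_prefers, start, offers, fee):
    """Offer a sequence of swaps; the agent pays `fee` whenever it strictly
    prefers the offered item to what it holds. Returns (final item, total paid)."""
    holding, paid = start, 0
    for item in offers:
        if (item, holding) in strictly_prefers:
            holding, paid = item, paid + fee
    return holding, paid

# Non-additive credences: P(A) + P(not A) = 6/5 > 1, so the loss is guaranteed.
print(dutch_book_payoff(Fraction(3, 5), Fraction(3, 5)))  # -1/5

# Cyclic strict preferences, stored as (better, worse) pairs.
cycle = {("A", "B"), ("B", "C"), ("C", "A")}
print(money_pump(cycle, "A", ["C", "B", "A"], fee=1))  # ('A', 3)
```

The first sketch only covers credences that sum to more than 1; for a sum below 1 the bookie would sell rather than buy. Note that nothing about preferences over outcomes enters the first function, and nothing about credences enters the second.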

These two different kinds of arguments have things in common. But they are not the same argument applied in different settings. They have different assumptions, and different conclusions. One is typically called a Dutch book argument; the other a money pump argument. The former is sometimes referred to as a special case of the latter.[2] But whatever our naming conventions, it’s a special case that doesn’t support the vNM axioms.

Here’s why this matters. You might read assumptions of the Dutch book theorem, and find them compelling. Then you read an article telling you that this implies the vNM axioms (or constitutes an argument for them). If you believe it, you’ve been duped.

1. ^

(More generally, Dutch books exist to support other Bayesian norms like conditionalisation.)

2. ^

This distinction is standard and blurring the lines leads to confusions. It’s unfortunate when dictionaries, references, or people make mistakes. More reliable would be a key book on money pumps (Gustafsson 2022) referring to a key book on Dutch books (Pettigrew 2020):

“There are also money-pump arguments for other requirements of rationality. Notably, there are money-pump arguments that rational credences satisfy the laws of probability. (See Ramsey 1931, p. 182.) These arguments are known as Dutch-book arguments. (See Lehman 1955, p. 251.) For an overview, see Pettigrew 2020.” [Footnote 9.]

• It may be worth thinking about why proponents of a very popular idea in this community don’t know of its academic analogues, despite them having existed since the early 90s[1] and appearing on the introductory SEP page for dynamic choice.

Academics may in turn ask: clearly LessWrong has some blind spots, but how big?

1. ^

And it’s not like these have been forgotten; e.g., McClennen’s (1990) work still gets cited regularly.

• (I learned from Sami’s post that this is called “trammelling” of incomplete preferences.)

Just for reference: this isn’t a standard term of art; I made it up. Though I do think it’s fitting.

• 6 Sep 2023 22:54 UTC

Good question. They implicitly assume a dynamic choice principle and a choice function that leaves the agent non-opportunistic.

• Their dynamic choice principle is something like myopia: the agent only looks at their node’s immediate successors and, if a successor is yet another choice node, the agent represents it as some ‘default’ prospect.

• Their choice rule is something like this: the agent assigns some natural ‘default’ prospect and deviates from it iff it prefers some other prospect. (So if some prospect is incomparable to the default, it’s never chosen.)

These aren’t the only approaches an agent can employ, and that’s where it fails. It’s wrong to conclude that “non-dominated strategy implies utility maximization” since we know from section 2 that we can achieve non-domination without completeness—by using a different dynamic choice principle and choice function.
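As a rough illustration of why the choice function matters, here is a sketch contrasting the default-based rule described above with a plain maximality rule. The maximality rule is only a stand-in for "a different choice function"; it is not the rule from section 2. The preference relation (A+ strictly preferred to A, B incomparable to both) and the function names are assumptions made for the example.

```python
# Strict preferences as (better, worse) pairs; anything unlisted is incomparable.
STRICT = {("A+", "A")}

def default_rule(options, default, strictly_prefers):
    """Stay at `default` unless some option is strictly preferred to it."""
    better = [o for o in options if (o, default) in strictly_prefers]
    return better if better else [default]

def maximal(options, strictly_prefers):
    """Options to which no available option is strictly preferred."""
    return [o for o in options
            if not any((other, o) in strictly_prefers for other in options)]

menu = ["A", "A+", "B"]
print(default_rule(menu, "A", STRICT))  # ['A+']  (B is never chosen)
print(default_rule(menu, "B", STRICT))  # ['B']   (the agent never leaves B)
print(maximal(menu, STRICT))            # ['A+', 'B']
```

Under the default rule the agent's behaviour hinges entirely on which prospect happens to be labelled the default, whereas the maximality rule treats both undominated options as choosable.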

• I agree that there exists the dutch book theorem, and that that one importantly relates to probabilism

I’m glad we could converge on this, because that’s what I really wanted to convey.[1] I hope it’s clearer now why I included these as important errors:

• The statement that the vNM axioms “apart from continuity, are often justified using the Dutch book theorems” is false since these theorems only relate to belief norms like probabilism. Changing this to ‘money pump arguments’ would fix it.

• There’s a claim on the main Dutch book page that the arguments demonstrate that “rationality requires assigning probabilities to events [...] and having preferences that can be modeled using the von Neumann–Morgenstern axioms.” I wouldn’t have said it was false if this was about money pumps.[2] I would’ve said there was a terminological issue if the page equated Dutch books and money pumps. But it didn’t.[3] It defined a Dutch book as “a set of bets that ensures a guaranteed loss.” And the theorems and arguments relating to that do not support the vNM axioms.

Would you agree?

1. ^

The issue of which terms to use isn’t that important to me in this case, but let me speculate about something. If you hear domain experts go back and forth between ‘Dutch books’ and ‘money pumps’, I think that is likely either because they are thinking of the former as a special case of the latter without saying so explicitly, or because they’re listing off various related ideas. If that’s not why, then they may just be mistaken. After all, a Dutch book is named that way because a bookie is involved!

2. ^

Setting aside that “demonstrates” is too strong even then.

3. ^

It looks like OP edited the page just today and added ‘or money pump’. But the text that follows still describes a Dutch book, i.e. a set of bets. (Other things were added too that I find problematic but this footnote isn’t the place to explain it.)

• check the edit history yourself by just clicking on the “View History” button and then pressing the “cur” button

Great, thanks!

I hate to single out OP but those three points were added by someone with the same username (see first and second points here; third here). Those might not be entirely new but I think my original note of caution stands.

• This is a tricky topic to think about because it’s not obvious how trammelling could be a worry for Thornley’s Incomplete Preference Proposal. I think the most important thing to clarify is why care about ex-ante permissibility. I’ll try to describe that first (this should help with my responses to downstream concerns).

### Big picture

Getting terminology out of the way: words like “permissibility” and “mandatory” are shorthand for rankings of prospects. A prospect is permissible iff it’s in a choice set, e.g. by satisfying DSM. It’s mandatory iff it’s the sole element of a choice set.

To see why ex-ante permissibility matters, note that it’s essentially a test to see which prospects the agent is either indifferent between or has a preferential gap between (and are not ranked below anything else). When you can improve a permissible prospect along some dimension and yet retain the same set of permissible prospects, for example, you necessarily have a preferential gap between those remaining prospects. In short, ex-ante permissibility tells you which prospects the agent doesn’t mind picking between.

The part of the Incomplete Preference Proposal that carries much of the weight is the Timestep Near-Dominance (TND) principle for choice under uncertainty. One thing it does, roughly, is require that the agent does not mind shifting probability mass between trajectories in which the shutdown time differs. And this is where incompleteness comes in. You need preferential gaps between trajectories that differ in shutdown time for this to hold in general. If the agent had complete preferences over trajectories, it would have strict preferences between at least some trajectories that differ in shutdown time, giving it reason to shift probability mass by manipulating the button.

Why TND helps get you shutdownability is described in Thornley’s proposal, so I’ll refer to his description and take that as a given here. So, roughly, we’re using TND to get shutdownability, and we’re using incompleteness to get TND. The reason incompleteness helps is that we want to maintain indifference to shifting probability mass between certain trajectories. And that is why we care about ex-ante permissibility. We need the agent, when contemplating manipulating the button, not to want to shift probability mass in that direction. That’ll help give us TND. The rest of Thornley’s proposal includes further conditions on the agent such that it will in fact, ex-post, not manipulate the button. But the reason for the focus on ex-ante permissibility here is TND.

### Miscellany

For purposes of e.g. the shutdown problem, or corrigibility more generally, I don’t think I care about the difference between “mandatory” vs “actually chosen”?

The description above should help clear up why we care about multiple options being permissible and none mandatory: to help satisfy TND. What’s “actually chosen” in my framework doesn’t neatly connect to the Thornley proposal since he adds extra scaffolding to the agent to determine how it should act. But that’s a separate issue.

The rough mental model I have of DSM is: at time zero, the agent somehow picks between a bunch of different candidate plans (all of which are “permissible”, whatever that means), and from then on it will behave-as-though it has complete preferences consistent with that plan.
...
it sounds like the proposal in the post just frontloads all the trammelling—i.e. it happens immediately at timestep zero.

The notion of trammelling I’m using refers to the set of permissible options shrinking as a result of repeated choice. And I argued that there’s no trammelling under certainty or uncertainty, and that trammelling under unawareness is bounded. Here’s why I don’t think you can see it as the agent behaving as if its preferences were complete.

Consider the case of static choice. It’s meaningful to say that an agent has incomplete preferences. (I don’t think you disagree with that but just for the sake of completeness, I’ll give an example.) Suppose the agent has preferential gaps between all different-letter prospects. From {A,A+,B} the agent will pick either A+ or B. Suppose it picks B. That doesn’t imply, say, that the agent can be thought of as having a strict preference for B over A+. After all, if you offered it {A,A+,B} once again, it might just pick A+, a contradiction. And you can set up something similar with transitivity to get a contradiction from inferring indifference between A+ and B.
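The static example can be written out as a small sketch. It assumes the only strict preference is A+ over A, with B incomparable to both, and it treats "permissible" as "not strictly dominated by anything on the menu"; `choice_set` and the hypothesis name are mine, for illustration only.

```python
# Strict preferences as (better, worse) pairs; B has a preferential gap with A and A+.
STRICT = {("A+", "A")}

def choice_set(options, strict):
    """Prospects on the menu that no other menu item is strictly preferred to."""
    return {o for o in options if not any((x, o) in strict for x in options)}

menu = {"A", "A+", "B"}
print(sorted(choice_set(menu, STRICT)))  # ['A+', 'B']

# Hypothesis: observing a pick of B "reveals" a strict preference for B over A+.
revealed = STRICT | {("B", "A+")}
print(sorted(choice_set(menu, revealed)))  # ['B']
```

Under the hypothesised relation A+ would never be picked from the same menu, yet the actual agent may pick it next time, which is the contradiction described above.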

Onto dynamic choice. As you write, it’s reasonable to think of various dynamic choice principles as immediately, statically, choosing a trajectory at timestep zero. Suppose we do that. Then by the argument just above, it’s still not appropriate to model the agent as having complete preferences at the time of choosing. We’re not frontloading any trammelling; the set of ex-ante permissible prospects hasn’t changed. And that’s what we care about for TND.

I can kinda vaguely pattern-match the setup in this post to the problem: I want to have one “permissible” choice which involves the shutdown button not being pressed, and another “permissible” choice which involves the button being pressed, and I want these two choices to be incomparable to the agent. Now (my mental model of) the DSM rule says: when the agent is turned on, it somehow chooses between (two plans leading to) those two options, and from then on out acts as though it has complete preferences consistent with the choice—i.e. it either (follows a plan which) makes sure the button is pressed, or (follows a plan which) makes sure the button is not pressed, and actively prevents operators from changing it. Which sounds like not-at-all what I wanted for the shutdown problem!

Agreed! The ex-ante permissibility of various options is not sufficient for shutdownability. The rest of Thornley’s proposal outlines how the agent has to pick (lotteries over) trajectories, which involves more than TND.

• These are great. Though Sleeping Mary can tell that she’s colourblind on any account of consciousness. Whether or not she learns a phenomenal fact when going from ‘colourblind scientist’ to ‘scientist who sees colour’, she does learn the propositional fact that she isn’t colourblind.

So, if she sees no colour, she ought to believe that the outcome of the coin toss is Tails. If she does see colour, both SSA and SIA say P(Heads)=1/​2.

• I don’t appreciate the hostility. I aimed to be helpful in spending time documenting and explaining these errors. This is something a healthy epistemic community is appreciative of, not annoyed by. If I had added mistaken passages to Wikipedia, I’d want to be told, and I’d react by reversing them myself. If any points I mentioned weren’t added by you, then as I wrote in my first comment:

...let me know that some of the issues I mention were already on Wikipedia beforehand. I’d be happy to try to edit those.

The point of writing about the mistakes here is to make clear why they indeed are mistakes, so that they aren’t repeated. That has value. And although I don’t think we should encourage a norm that those who observe and report a problem are responsible for fixing it, I will try to find and fix at least the pre-existing errors.

• I argued that the signal-theoretic[1] analysis of meaning (which is the most common Bayesian analysis of communication) fails to adequately define lying, and fails to offer any distinction between denotation and connotation or literal content vs conversational implicature.

In case you haven’t come across this, here are two papers on lying by the founders of the modern economics literature on communication. I’ve only skimmed your discussion but if this is relevant, here’s a great non-technical discussion of lying in that framework. A common thread in these discussions is that the apparent “no-lying” implication of the analysis of language in the Lewis-Skyrms/Crawford-Sobel signalling tradition relies importantly on common knowledge of rationality and, implicitly, on common knowledge of the game being played, i.e. of the available actions and all the players’ preferences.

• Thanks. Let me end with three comments. First, I wrote a few brief notes here that I hope clarify how Independence and IIA differ. Second, I want to stress that the problem with the use of Dutch books in the articles is a substantial one, not just a verbal one, as I explained here and here. Finally, I’m happy to hash out any remaining issues via direct message if you’d like—whether it’s about these points, others I raised in my initial comment, or any related edits.

• Great, I think bits of this comment help me understand what you’re pointing to.

the desired behavior implies a revealed preference gap

I think this is roughly right, together with all the caveats about the exact statements of Thornley’s impossibility theorems. Speaking precisely here will be cumbersome so for the sake of clarity I’ll try to restate what you wrote like this:

1. Useful agents satisfying completeness and other properties X won’t be shutdownable.

2. Properties X are necessary for an agent to be useful.

3. So, useful agents satisfying completeness won’t be shutdownable.

4. So, if a useful agent is shutdownable, its preferences are incomplete.

This argument would let us say that observing usefulness and shutdownability reveals a preferential gap.

I think the question I’m interested in is: “do trammelling-style issues imply that DSM agents will not have a revealed preference gap (under reasonable assumptions about their environment and capabilities)?”

A quick distinction: an agent can (i) reveal p, (ii) reveal ¬p, or (iii) neither reveal p nor ¬p. The problem of underdetermination of preference is of the third form.

We can think of some of the properties we’ve discussed as ‘tests’ of incomparability, which might or might not reveal preferential gaps. The test in the argument just above is whether the agent is useful and shutdownable. The test I use for my results above (roughly) is ‘arbitrary choice’. The reason I use that test is that my results are self-contained; I don’t make use of Thornley’s various requirements for shutdownability. Of course, arbitrary choice isn’t what we want for shutdownability. It’s just a test for incomparability that I used for an agent that isn’t yet endowed with Thornley’s other requirements.

The trammelling results, though, don’t give me any reason to think that DSM is problematic for shutdownability. I haven’t formally characterised an agent satisfying DSM as well as TND, Stochastic Near-Dominance, and so on, so I can’t yet give a definitive or exact answer to how DSM affects the behaviour of a Thornley-style agent. (This is something I’ll be working on.) But regarding trammelling, I think my results are reasons for optimism if anything. Even in the least convenient case that I looked at—awareness growth—I wrote this in section 3.3. as an intuition pump:

we’re simply picking out the best prospects in each class. For instance, suppose prospects were representable as pairs $\langle s, u \rangle$ that are comparable iff the $s$-values are the same, and then preferred to the extent that $u$ is large. Then here’s the process: for each value of $s$, identify the options that maximise $u$. Put all of these in a set. Then choice between any options in that set will always remain arbitrary; never trammelled.

That is, we retain the preferential gap between the options we want a preferential gap between.
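Here is a minimal sketch of that process. It assumes each prospect is a pair whose first component (called `s` here) determines comparability and whose second component (called `u`) is what gets maximised within a class; the labels "x" and "y" and the numbers are made up.

```python
from collections import defaultdict

def best_in_each_class(prospects):
    """Keep, for each class label s, the prospect with the largest u."""
    by_class = defaultdict(list)
    for s, u in prospects:
        by_class[s].append(u)
    return {(s, max(us)) for s, us in by_class.items()}

def strictly_prefers(p, q):
    """p beats q iff they share a class and p has larger u; otherwise no preference."""
    return p[0] == q[0] and p[1] > q[1]

prospects = [("x", 1), ("x", 3), ("y", 2), ("y", 5)]
survivors = best_in_each_class(prospects)
print(sorted(survivors))  # [('x', 3), ('y', 5)]

# Awareness growth: a new, better prospect appears in class x.
grown = best_in_each_class(prospects + [("x", 4)])
print(sorted(grown))  # [('x', 4), ('y', 5)]
```

In both cases no survivor is strictly preferred to another, so the choice between the classes stays arbitrary even after the menu grows.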

[As an aside, the description in your first paragraph of what we want from a shutdownable agent doesn’t quite match Thornley’s setup; the relevant part to see this is section 10.1. here.]

• That makes sense, yeah.

Let me first make some comments about revealed preferences that might clarify how I’m seeing this. Preferences are famously underdetermined by limited choice behaviour. If A and B are available and I pick A, you can’t infer that I like A more than B — I might be indifferent or unable to compare them. Worse, under uncertainty, you can’t tell why I chose some lottery over another even if you assume I have strict preferences between all options — the lottery I choose depends on my beliefs too. In expected utility theory, beliefs and preferences together induce choice, so if we only observe a choice, we have one equation in two unknowns.[1] Given my choice, you’d need to read my mind’s probabilities to be able to infer my preferences (and vice versa).[2]
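A toy sketch of the "one equation in two unknowns" point: two hypothetical agents with very different beliefs and utilities make the same observed choice. The states, outcomes, and numbers are all invented for illustration.

```python
def expected_utility(act, belief, utility):
    """act maps states to outcomes; belief maps states to probabilities."""
    return sum(belief[s] * utility[act[s]] for s in belief)

# Two acts over the states {rain, sun}.
umbrella = {"rain": "dry", "sun": "encumbered"}
no_umbrella = {"rain": "wet", "sun": "free"}

# Agent 1: thinks rain is likely and only mildly dislikes getting wet.
belief_1 = {"rain": 0.8, "sun": 0.2}
utility_1 = {"dry": 1, "encumbered": 0.5, "wet": 0, "free": 1}

# Agent 2: thinks rain is unlikely but strongly dislikes getting wet.
belief_2 = {"rain": 0.1, "sun": 0.9}
utility_2 = {"dry": 1, "encumbered": 0.9, "wet": -10, "free": 1}

for belief, utility in ((belief_1, utility_1), (belief_2, utility_2)):
    takes_umbrella = (expected_utility(umbrella, belief, utility)
                      > expected_utility(no_umbrella, belief, utility))
    print(takes_umbrella)  # True for both agents
```

Observing only that the umbrella was taken cannot separate these two belief and utility pairs.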

In that sense, preferences (mostly) aren’t actually revealed. Economists often assume various things to apply revealed preference theory, e.g. setting beliefs equal to ‘objective chances’, or assuming a certain functional form for the utility function.

But why do we care about preferences per se, rather than what’s revealed? Because we want to predict future behaviour. If you can’t infer my preferences from my choices, you can’t predict my future choices. In the example above, if my ‘revealed preference’ between A and B is that I prefer A, then you might make false predictions about my future behaviour (because I might well choose B next time).

Let me know if I’m on the right track for clarifying things. If I am, could you say how you see trammelling/​shutdown connecting to revealed preferences as described here, and I’ll respond to that?

1. ^

2. ^

The situation is even worse when you can’t tell what I’m choosing between, or what my preference relation is defined over.

• probabilities should correspond to expected observations and expected observations only

FWIW I think this is wrong. There’s a perfectly coherent framework—subjective expected utility theory (Jeffrey, Joyce, etc)—in which probabilities can correspond to many other things. Probabilities as credences can correspond to confidence in propositions unrelated to future observations, e.g., philosophical beliefs or practically-unobservable facts. You can unambiguously assign probabilities to ‘cosmopsychism’ and ‘Everett’s many-worlds interpretation’ without expecting to ever observe their truth or falsity.

However, there is another source of uncertainty: observational uncertainty. The other person might be uncertain whether they have all the facts that feed into their model, or whether their observations are correct.

This is reasonable. If a deterministic model has three free parameters, two of which you have specified, you could just use your prior over the third parameter to create a distribution of model outcomes. This kind of situation should be pretty easy to clarify though, by saying something like “my model predicts event E iff parameter A is above A*” and “my prior P(A>A*) is 50% which implies P(E)=50%.”
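As a rough sketch of that recipe: push a prior over the unspecified parameter through the deterministic model and read off the share of draws in which the event occurs. The uniform prior on [0, 1] and the threshold of 0.5 are assumptions chosen so that P(A>A*) is 50%; the function names are hypothetical.

```python
import random

def model_predicts_event(a, a_star):
    """Deterministic model: event E occurs iff parameter A exceeds the threshold."""
    return a > a_star

def probability_of_event(prior_sampler, a_star, n=100_000, seed=0):
    """Monte Carlo estimate of P(E) induced by the prior over A."""
    rng = random.Random(seed)
    hits = sum(model_predicts_event(prior_sampler(rng), a_star) for _ in range(n))
    return hits / n

# Assumed prior: A is uniform on [0, 1], so P(A > 0.5) is one half.
estimate = probability_of_event(lambda rng: rng.random(), a_star=0.5)
print(estimate)
```

All of the randomness comes from the prior over A; the model itself contributes none.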

But generically, the distribution is not coming from a model. It just looks like your all-things-considered credence that A>A*. I’d be hesitant to call a probability based on it your “inside view/model” probability.

• Two nitpicks and a reference:

an agent’s goals might not be linearly decomposable over possible worlds due to risk-aversion

Risk aversion doesn’t violate additive separability. E.g., for $U = \sum_i p_i\,u(x_i)$ we always get $\partial^2 U / \partial x_i\,\partial x_j = 0$ for $i \neq j$, whether $u$ is linear (risk neutrality) or strictly concave (risk aversion). Though some alternatives to expected utility, like Buchak’s REU theory, can allow certain sources of risk aversion to violate separability.
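A quick numerical check of the separability claim, as a sketch. It assumes an expected-utility evaluation over two possible worlds with fixed probabilities, and tests whether the gain from improving the outcome in world 1 depends on the outcome in world 2. The function names and numbers are illustrative assumptions.

```python
import math

def expected_utility(u, probs, outcomes):
    """Probability-weighted sum of per-world utilities."""
    return sum(p * u(x) for p, x in zip(probs, outcomes))

def cross_effect(u, probs, x1, x1_alt, x2, x2_alt):
    """How much the gain from moving x1 -> x1_alt changes when x2 -> x2_alt.
    Additive separability across worlds requires this to be zero."""
    gain_at_x2 = (expected_utility(u, probs, (x1_alt, x2))
                  - expected_utility(u, probs, (x1, x2)))
    gain_at_alt = (expected_utility(u, probs, (x1_alt, x2_alt))
                   - expected_utility(u, probs, (x1, x2_alt)))
    return gain_at_alt - gain_at_x2

probs = (0.5, 0.5)
risk_neutral = cross_effect(lambda x: x, probs, 1, 4, 9, 16)
risk_averse = cross_effect(math.sqrt, probs, 1, 4, 9, 16)
print(risk_neutral, risk_averse)
```

Both values come out at zero up to floating-point error: concavity of u changes the size of each world's contribution but not the additive structure.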

when features have fixed marginal utility, rather than being substitutes

Perfect substitutes have fixed marginal utility. E.g., $u(x, y) = x + 2y$ always has marginal utilities of 1 and 2.

I’ll focus on linearly decomposable goals which can be evaluated by adding together evaluations of many separate subcomponents. More decomposable goals are simpler

There’s an old literature on separability in consumer theory that’s since been tied to bounded rationality. One move that’s made is to grant weak separability across groups of objects (features) to rationalise the behaviour of optimising across groups first, and within groups second. Pretnar et al (2021) describe how this can arise from limited cognitive resources.

• The key question is whether the revealed preferences are immune to trammelling. This was a major point of confusion for me in discussion with Sami—his proposal involves a set of preferences passed into a decision rule, but those “preferences” are (potentially) different from the revealed preferences. (I’m still unsure whether Sami’s proposal solves the problem.)

I claim that, yes, the revealed preferences in this sense are immune to trammeling. I’m happy to continue the existing discussion thread but here’s a short motivation: what my results about trammelling show is that there will always be multiple (relevant) options between which the agent lacks a preference and the DSM choice rule does not mandate picking one over another. The agent will not try to push probability mass toward one of those options over another.