OK. It seems there are results for more than 2 goods, but the results are quite weak:
Thus, if both relative prices are below the relative prices in autarky, we can rule out the possibility that both goods 1 and 2 will be imported—but we cannot rule out the possibility that one of them will be imported. In other words, once we leave the two-good case, we cannot establish detailed predictive relations saying that if the relative price of a traded good exceeds the relative price of that good in autarky, then that good will be exported by the country in question. It follows that any search for a strong theorem along the lines of our first proposition earlier is bound to fail. The most one can hope for is a correlation between the pattern of trade and differences in autarky prices.
Dixit, Avinash; Norman, Victor (1980). Theory of International Trade: A Dual, General Equilibrium Approach. Cambridge: Cambridge University Press. p. 8.
Here’s something I don’t get about comparative advantage.
The implied advice, as far as I understand it, is to check which good you have a comparative advantage in producing, and offer that good to the market.
But suppose that there are a lot more goods and a lot more participants in the market.
For any one individual, given fixed prices and supply of everyone else, it sounds like we can formulate the production and trade strategy as a linear programming problem:
We have some maximum amount of time. That’s a linear constraint.
We can allocate time to different tasks.
The output of each task is assumed to be linear in the time allocated.
The tasks produce different goods.
These goods all have different prices on the market.
We might have some basic needs, like the 10 bananas and 10 coconuts. That’s a constraint.
We might also have desires, like not working, or we might desire some goods. That’s our linear programming objective.
OK. So we can solve this as a linear program.
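For concreteness, here's a minimal sketch of that linear program in Python, using scipy.optimize.linprog. All the specific numbers (prices, productivities, needs, available hours) are made up for illustration:

```python
# Sketch: one agent's production/trade problem as an LP.
# Decision variables: t[i] = hours allocated to task i.
# Objective here: work as little as possible (the "not working" desire),
# subject to earning enough at market prices to cover basic needs.
import numpy as np
from scipy.optimize import linprog

prices = np.array([1.0, 2.5, 0.8])   # market price of each good
rate = np.array([4.0, 1.0, 3.0])     # units of good i produced per hour on task i
needs = np.array([10.0, 10.0, 0.0])  # e.g. 10 bananas and 10 coconuts
total_time = 16.0                    # hours available

c = np.ones(3)  # minimize total hours worked

# linprog wants A_ub @ t <= b_ub, so flip signs on the earnings constraint:
# earnings (sum_i prices[i]*rate[i]*t[i]) must cover the cost of needs.
A_ub = np.array([-(prices * rate), np.ones(3)])
b_ub = np.array([-(prices @ needs), total_time])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
print(res.x)  # e.g. [8.75, 0, 0]: all work on the highest-earning task
```

Notably, in this fixed-price, fully linear case the optimum is a corner solution: specialize entirely in the task with the highest market earnings per hour, which looks a lot like the price-taking version of "produce where you have comparative advantage". The complications below arise because prices and supplies aren't actually fixed.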
But… linear programs don’t have a nice closed-form solution. The simplex algorithm can solve them efficiently in practice, but that’s very different from an easy formula like “produce the good with the highest comparative advantage”.
And that’s just solving the problem for one player, assuming the other players have fixed strategies. More generally, we have to anticipate the rest of the market as well. I don’t even know if that can be solved efficiently, via linear programming or some other technique.
Is “produce where you have comparative advantage” really very useful advice for more complex cases?
Wikipedia starts out describing comparative advantage as a law:
The law of comparative advantage describes how, under free trade, an agent will produce more of and consume less of a good for which they have a comparative advantage.
But no precise mathematical law is ever stated, and the law is only justified with examples (specifically, two-player, two-commodity examples). Furthermore, I only ever recall seeing comparative advantage explained with examples, rather than being stated as a theorem. (Although this may be because I never got past econ 101.)
This makes it hard to know what the claimed law even is, precisely. “produce more and consume less”? In comparison to what?
One spot on Wikipedia says:
Skeptics of comparative advantage have underlined that its theoretical implications hardly hold when applied to individual commodities or pairs of commodities in a world of multiple commodities.
That passage has no citation, though, so I don’t know where to find the details of these critiques.
I’m not sure I follow that it has to be linear—I suspect higher-order polynomials will work just as well. Even if linear, there are a very wide range of transformation matrices that can be reasonably chosen, all of which are compatible with not blocking Pareto improvements and still not agreeing on most tradeoffs.
Well, I haven’t actually given the argument that it has to be linear. I’ve just asserted that there is one, referencing Harsanyi and complete class arguments. There are a variety of related arguments. And these arguments have some assumptions which I haven’t been emphasizing in our discussion.
Here’s a pretty strong argument (with correspondingly strong assumptions).
1. Suppose each individual is VNM-rational.
2. Suppose the social choice function is VNM-rational.
3. Suppose that we can also use mixed actions, randomizing in a way which is independent of everything else.
4. Suppose that the social choice function has a strict preference for every Pareto improvement.
5. Suppose that the social choice function is indifferent between two different actions if every single individual is indifferent.
6. Suppose the situation gives a nontrivial choice with respect to every individual; that is, no one is indifferent between all the options.
By VNM, each individual’s preferences can be represented by a utility function, as can the preferences of the social choice function.
Imagine actions as points in preference-space, an n-dimensional space where n is the number of individuals.
By assumption #5, actions which map to the same point in preference-space must be treated the same by the social choice function. So we can now imagine the social choice function as a map from R^n to R.
VNM on individuals implies that the mixed action p * a1 + (1-p) * a2 is just the point that lies p of the way along the line from a2 to a1.
VNM implies that the value the social choice function places on mixed actions is just a linear mixture of the values of pure actions. But this means the social choice function can be seen as an affine function from R^n to R. Of course since utility functions don’t mind additive constants, we can subtract the value at the origin to get a linear function.
But remember that points in this space are just vectors of individuals’ utilities for an action. So that means the social choice function can be represented as a linear function of individuals’ utilities.
So now we’ve got a linear function. But I haven’t used the Pareto assumption yet! That assumption, together with #6, implies that the linear function has to be increasing in every individual’s utility function.
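Putting the conclusion in symbols (my own notation, just summarizing the steps above): writing u_1, …, u_n for the individuals’ VNM utilities and V for the social choice function’s utility,

```latex
V(a) \;=\; \sum_{i=1}^{n} w_i \, u_i(a) + c, \qquad w_i > 0 \text{ for all } i,
```

where linearity comes from the VNM/mixing steps, the positive weights from Pareto plus #6, and the additive constant c can be dropped.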
Now I’m lost again. “you should have a preference over something where you have no preference” is nonsense, isn’t it? Either the someone in question has a utility function which includes terms for (their beliefs about) other agents’ preferences (that is, they have a social choice function as part of their preferences), in which case the change will ALREADY BE positive for their utility, or that’s already factored in and that’s why it nets to neutral for the agent, and the argument is moot.
If you’re just saying “people don’t understand their own utility functions very well, and this is an intuition pump to help them see this aspect”, that’s fine, but “theorem” implies something deeper than that.
Indeed, that’s what I’m saying. I’m trying to separately explain (a) the formal argument, which assumes the social choice function (or individual) is already on board with Pareto improvements, and (b) the informal argument meant to get someone to accept some form of preference utilitarianism, in which you might point out that Pareto improvements benefit others at no cost. The informal argument is contradictory and pointless if the person already has fully consistent preferences, but it might realistically sway somebody from believing that they can be indifferent about a Pareto improvement to believing that they have a strict preference in favor of it.
But the informal argument relies on the formal argument.
Maybe it’s better phrased as “a CIRL agent has a positive incentive to allow shutdown iff it’s uncertain [or the human has a positive term for it being shut off]”, instead of saying that “a machine” has a positive incentive iff.
I would further charitably rewrite it as:
“In chapter 16, we analyze an incentive which a CIRL agent has to allow itself to be switched off. This incentive is positive if and only if it is uncertain about the human objective.”
A CIRL agent should be capable of believing that humans terminally value pressing buttons, in which case it might allow itself to be shut off despite being 100% sure about values. So it’s just the particular incentive examined that’s iff.
Sure, but the theorem he proves in the setting where he proves it probably is if and only if. (I have not read the new edition, so, not really sure.)
It also seems to me like Stuart Russell endorses the if-and-only-if result as what’s desirable? I’ve heard him say things like “you want the AI to prevent its own shutdown when it’s sufficiently sure that it’s for the best”.
Of course that’s not technically the full if-and-only-if (it needs to both be certain about utility and think preventing shutdown is for the best), but it suggests to me that he doesn’t think we should add more shutoff incentives such as AUP.
Keep in mind that I have fairly little interaction with him, and this is based off of only a few off-the-cuff comments during CHAI meetings.
My point here is just that it seems pretty plausible that he meant “if and only if”.
Notice however that for Logical Counterfactual Mugging to be well defined, you need to define what Omega is doing when it is making its prediction. In Counterfactuals for Perfect Predictors, I explained that when dealing with perfect predictors, often the counterfactual would be undefined. For example, in Parfit’s Hitchhiker, a perfect predictor would never give a lift to someone who never pays in town, so it isn’t immediately clear that predicting what such a person would do in town involves predicting something coherent.
Another approach is to change the example to remove the objection.
The poker-like game at the end of Decision Theory (I really titled that post simply “decision theory”? rather vague, past-me...) is isomorphic to counterfactual mugging, but removes some distractions, such as “how does Omega take the counterfactual”.
Alice receives a High or Low card. Alice can reveal the card to Bob. Bob then states a probability p that Alice’s card is High. Bob’s incentives just encourage him to report honest beliefs. Alice loses p².
When Alice gets a Low card, she can just reveal it to Bob, and get the best possible outcome. But this strategy means Bob will know if she has a high card, giving her the worst possible outcome in that case. In order to successfully bluff, Alice has to sometimes act like she has different cards than she has. And indeed, the optimal strategy for Alice in this case is to never show her cards.
This example will get fewer objections from people, because it is grounded in a very realistic game. Playing poker well requires this kind of reasoning. The powerful predictor is replaced with another player. We can still technically ask “how does the other player deal with undefined counterfactuals?”, but we can skip over that by just reasoning about strategies in the usual game-theoretic way—if Alice’s strategy were to reveal low cards, then Bob could simply treat unrevealed cards as High.
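A tiny calculation makes the comparison concrete (my own sketch, assuming a fair 50/50 deal and that Bob honestly reports his posterior p that the card is High):

```python
# Alice's expected loss (loss = p^2) under two strategies.

def expected_loss(reveal_low: bool) -> float:
    if reveal_low:
        # Low card revealed: Bob says p = 0, so loss 0.
        # High card hidden: Bob infers High from the non-reveal, p = 1, loss 1.
        return 0.5 * 0.0**2 + 0.5 * 1.0**2
    # Never reveal: Bob's posterior stays at 1/2 either way.
    return 0.5 * 0.5**2 + 0.5 * 0.5**2

print(expected_loss(True))   # 0.5
print(expected_loss(False))  # 0.25 -- never revealing does better
```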
We can then insert logical uncertainty by stipulating that Alice gets her card pseudorandomly, but neither Alice nor Bob can predict the random number generator.
Not sure yet whether you can pull a similar trick with Counterfactual Prisoner’s Dilemma.
== nitpicks ==
Applying the Counterfactual Prisoner’s Dilemma to Logical Uncertainty
Why isn’t the title “applying logical uncertainty to the counterfactual prisoner’s dilemma”? Or “A Logically Uncertain Version of Counterfactual Prisoner’s Dilemma”? I don’t see how you’re applying CPD to LU.
The Counterfactual Prisoner’s Dilemma is a symmetric version of the original
Symmetric? The original is already symmetric. But “symmetric” is a concept which applies to multi-player games. Counterfactual PD makes PD into a one-player game. Presumably you meant “a one-player version”?
where regardless of whether the coin comes up heads or tails you are asked to pay $100 and you are then paid $10,000 if Omega predicts that you would have paid if the coin had come up the other way. If you decide updatelessly you will always receive $9900, while if you decide updatefully, then you will receive $0.
This is only true if you use classical CDT, yeah? Whereas EDT can get $9900 in both cases, provided it believes in a sufficient correlation between what it does upon seeing heads vs tails.
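For concreteness, the four pure policies and their branch payoffs (a quick sketch of the setup quoted above):

```python
# A policy says whether you pay the $100 on heads and whether you pay on
# tails; Omega pays you $10,000 in each branch iff it predicts you would
# have paid in the *other* branch.
for pay_heads in (True, False):
    for pay_tails in (True, False):
        heads = (-100 if pay_heads else 0) + (10_000 if pay_tails else 0)
        tails = (-100 if pay_tails else 0) + (10_000 if pay_heads else 0)
        print(f"pay_heads={pay_heads!s:<5} pay_tails={pay_tails!s:<5} "
              f"heads-branch={heads:6} tails-branch={tails:6}")
# Paying in both branches guarantees $9,900 whichever way the coin lands;
# paying in neither guarantees $0; mixed policies risk a -$100 branch.
```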
So unlike Counterfactual Mugging, pre-committing to pay ensures a better outcome regardless of how the coin flip turns out, suggesting that focusing only on your particular probability branch is mistaken.
I don’t get what you meant by the last part of this sentence. Counterfactual Mugging already suggests that focusing only on your particular branch is mistaken. If someone bought that you should pay up in this problem but not in counterfactual mugging, I expect that person to say something like “because in this case that strategy is guaranteed better even in this branch”—hence, they’re not necessarily convinced to look at other branches. So I don’t think this example necessarily argues for looking at other branches.
Also, why is this posted as a question?
If it’s certain about the human objective, then it would be certain that it knows what’s best, so there would be no reason to let a human turn it off. (Unless humans have a basic preference to turn it off, in which case it could prefer to be shut off.)
I thought that the end result is that, since any change would not be a Pareto improvement, the function can’t recommend any change, so it must be completely indifferent about everything; that is, it is the constant function assigning every option utility 0.
Pareto-optimality says that if there is a mass murderer who wants to kill as many people as possible, then you should not make a choice that lessens the number of people killed, i.e., you should not oppose the mass murderer.
Ah, I should have made more clear that it’s a one-way implication: if it’s a Pareto improvement, then the social choice function is supposed to prefer it. Not the other way around.
A social choice function meeting that minimal requirement can still do lots of other things. So it could still oppose a mass murderer, so long as mass-murder is not itself a Pareto improvement.
Me too! I’m having trouble seeing how that version of the pareto-preference assumption isn’t already assuming what you’re trying to show, that there is a universally-usable social aggregation function.
I’m not sure what you meant by “universally usable”, but I don’t really argue anything about existence, only what it has to look like if it exists. It’s easy enough to show existence, though; just take some arbitrary sum over utility functions.
Or maybe I misunderstand what you’re trying to show—are you claiming that there is a (or a family of) aggregation function that are privileged and should be used for Utilitarian/Altruistic purposes?
Yep, at least in some sense. (Not sure how “privileged” they are in your eyes!) What the Harsanyi Utilitarianism Theorem shows is that linear aggregations are just such a distinguished class.
And now we have to specify which agent’s preferences we’re talking about when we say “support”.
The assumption I missed was that there are people who claim that a change is = for them, but also they support it. I think that’s a confusing use of “preferences”.
That’s why, in the post, I moved to talking about “a social choice function”—to avert that confusion.
So we have people, who are what we define Pareto improvement over, and then we have the social choice function, which we suppose must strictly prefer (>) every Pareto improvement.
Then we prove that the social choice function must act like it prefers some weighted sum of the people’s utility functions.
But this really is just to avert a confusion. If we get someone to assent to both VNM and strict preference of Pareto improvements, then we can go back and say “by the way, the social choice function was secretly you” because that person meets the conditions of the argument.
There’s no contradiction because we’re not secretly trying to sneak in a =/> shift; the person has to already prefer for Pareto improvements to happen.
If it’s > for the agent in question, they clearly support it. If it’s =, they don’t oppose it, but don’t necessarily support it.
Right, so, if we’re applying this argument to a person rather than just some social choice function, then it has to be > in all cases.
If you imagine that you’re trying to use this argument to convince someone to be utilitarian, this is the step where you’re like “if it doesn’t make any difference to you, but it’s better for them, then wouldn’t you prefer it to happen?”
Yes, it’s trivially true that if it’s = for them then it must not be >. But humans aren’t perfectly reflectively consistent. So, what this argument step is trying to do is engage with the person’s intuitions about their preferences. Do they prefer to make a move that’s (at worst) costless to them and which is beneficial to someone else? If yes, then they can be engaged with the rest of the argument.
To put it a different way: yes, we can’t just assume that an agent strictly prefers for all Pareto-improvements to happen. But, we also can’t just assume that they don’t, and dismiss the argument on those grounds. That agent should figure out for itself whether it has a strict preference in favor of Pareto improvements.
When I say “Pareto optimality is min-bar for agreement”, I’m making a distinction between literal consensus, where all agents actually agree to a change, and assumed improvement, where an agent makes a unilateral (or population-subset) decision, and justifies it based on their preferred aggregation function. Pareto optimality tells us something about agreement. It tells us nothing about applicability of any possible aggregation function.
Ah, ok. I mean, that makes perfect sense to me and I agree. In this language, the idea of the Pareto assumption is that an aggregation function should at least prefer things which everyone agrees about, whatever else it may do.
In my mind, we hit the same comparability problem for Pareto vs non-Pareto changes. Pareto-optimal improvements, which require zero interpersonal utility comparisons (only the sign matters, not the magnitude, of each affected entity’s preference), teach us nothing about actual tradeoffs, where a function must weigh the magnitudes of multiple entities’ preferences against each other.
The point of the Harsanyi theorem is sort of that Pareto improvements say surprisingly much, particularly when coupled with a VNM rationality assumption.
That’s not what Pareto-optimality asserts. It only talks about >= for all participants individually.
Given an initial situation, a Pareto improvement is a new situation where some agents will gain, and no agents will lose.
So a Pareto improvement is a move that is > for at least one agent, and >= for the rest.
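In symbols, with u_i for agent i’s utility (just restating the definition):

```latex
a' \text{ is a Pareto improvement over } a
\;\iff\;
\big(\forall i:\ u_i(a') \ge u_i(a)\big) \ \text{and} \ \big(\exists j:\ u_j(a') > u_j(a)\big).
```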
If you’re making assumptions about altruism, you should be clearer that it’s an arbitrary aggregation function that is being increased.
I stated that the setup is to consider a social choice function (a way of making decisions which would “respect everyone’s preferences” in the sense of regarding pareto improvements as strict preferences, ie, >-type preferences).
Perhaps I didn’t make clear that the social choice function should regard Pareto improvements as strict preferences. But this is the only way to ensure that you prefer the Pareto improvement and not the opposite change (which only makes things worse).
And then, Pareto-optimality is a red herring. I don’t know of any aggregation functions that would change a 0 to a + for a Pareto-optimal change, and would not give a + to some non-Pareto-optimal changes, which violate other agents’ preferences.
Exactly. That’s, like, basically the point of the Harsanyi theorem right there. If your social choice function respects Pareto optimality and rationality, then it’s forced to also make some trade-offs—IE, give a + to some non-Pareto changes.
(Unless you’re in a degenerate case, EG, everyone already has the same preferences.)
I feel as if you’re denying my argument by… making my argument.
My primary objection is that any given aggregation function is itself merely a preference held by the evaluator. There is no reason to believe that there is a justifiable-to-assume-in-others or automatically-agreeable aggregation function.
I don’t believe I ever said anything about justifying it to others.
I think one possible view is that every altruist could have their own personal aggregation function.
There’s still a question of which aggregation function to choose, what properties you might want it to have, etc.
But then, many people might find the same considerations persuasive. So I see nothing against people working together to figure out what “the right aggregation function” is, either.
This may be the crux. I do not assent to that. I don’t even think it’s common.
OK! So that’s just saying that you’re not interested in the whole setup. That’s not contrary to what I’m trying to say here—I’m just trying to say that if an agent satisfies the minimal altruism assumption of preferring Pareto improvements, then all the rest follows.
If you’re not at all interested in the utilitarian project, that’s fine, other people can be interested.
Pareto improvements are fine, and some of them actually improve my situation, so go for it! But in the wider sense, there are lots of non-Pareto changes that I’d pick over a Pareto subset of those changes.
Again, though, now it just seems like you’re stating my argument.
Weren’t you just criticizing the kind of aggregation I discussed for assenting to Pareto improvements but inevitably assenting to non-Pareto-improvements as well?
Pareto is a min-bar for agreement, not an optimum for any actual aggregation function.
My section on Pareto is literally titled “Pareto-Optimality: The Minimal Standard”
I’m feeling a bit of “are you trolling me” here.
You’ve both denied and asserted both the premises and the conclusion of the argument.
All in the same single comment.
I agree that this can create perverse incentives in practice, but that seems like the sort of thing that you should be handling as part of your decision theory, not your utility function.
I’m mainly worried about the perverse incentives part.
I recognize that there’s some weird level-crossing going on here, where I’m doing something like mixing up the decision theory and the utility function. But it seems to me like that’s just a reflection of the weird muddy place our values come from?
You can think of humans a little like self-modifying AIs, but where the modification took place over evolutionary history. The utility function which we eventually arrived at was (sort of) the result of a bargaining process between everyone, and which took some accounting of things like exploitability concerns.
In terms of decision theory, I often think in terms of a generalized NicerBot: extend everyone else the same cofrence-coefficient they extend to you, plus an epsilon (to ensure that two generalized NicerBots end up fully cooperating with each other). This is a pretty decent strategy for any game, generalizing from one of the best strategies for Prisoner’s Dilemma. (Of course there is no “best strategy” in an objective sense.)
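Here's one way to cash out the ratcheting dynamic (my own toy formalization, not a strategy taken from any particular paper):

```python
# Generalized NicerBot sketch: each round, set my cofrence-coefficient for
# you to the coefficient you apparently extended to me last round, plus
# epsilon, capped at full weight.
def nicerbot_update(their_coefficient: float, epsilon: float = 0.05) -> float:
    return min(1.0, their_coefficient + epsilon)

# Two generalized NicerBots ratchet each other up to full mutual caring:
a, b = 0.0, 0.0
for _ in range(30):
    a, b = nicerbot_update(b), nicerbot_update(a)
print(a, b)  # both reach 1.0, i.e. full cooperation
```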
But a decision theory like that does mix levels between the decision theory and the utility function!
I feel like the solution of having cofrences not count the other person’s cofrences just doesn’t respect people’s preferences—when I care about the preferences of somebody else, that includes caring about the preferences of the people they care about.
I totally agree with this point; I just don’t know how to balance it against the other point.
A crux for me is the coalition metaphor for utilitarianism. I think of utilitarianism as sort of a natural endpoint of forming beneficial coalitions, where you’ve built a coalition of all life.
If we imagine forming a coalition incrementally, and imagine that the coalition simply averages utility functions with its new members, then there’s an incentive to join the coalition as late as you can, so that your preferences get the largest possible representation. (I know this isn’t the same problem we’re talking about, but I see it as analogous, and so a point in favor of worrying about this sort of thing.)
We can correct that by doing 1/n averaging: every time the coalition gains members, we make a fresh average of all member utility functions (using some utility-function normalization, of course), and everybody voluntarily self-modifies to have the new mixed utility function.
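A toy calculation shows the contrast (my own numbers, assuming the naive scheme averages the coalition's current function 50/50 with each new member):

```python
# Weight each member's original utility function ends up with.
def incremental_weights(n: int) -> list[float]:
    weights = [1.0]
    for _ in range(1, n):
        # New member gets averaged 50/50 with the existing coalition.
        weights = [w / 2 for w in weights] + [0.5]
    return weights

print(incremental_weights(4))  # [0.125, 0.125, 0.25, 0.5] -- late joiners dominate
# Fresh 1/n averaging instead gives everyone 0.25, removing the incentive
# to join late.
```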
But the problem with this is, we end up punishing agents for self-modifying to care about us before joining. (This is more closely analogous to the problem we’re discussing.) If they’ve already self-modified to care about us more before joining, then their original values just get washed out even more when we re-average everyone.
So really, the implicit assumption I’m making is that there’s an agent “before” altruism, who “chose” to add in everyone’s utility functions. I’m trying to set up the rules to be fair to that agent, in an effort to reward agents for making “the altruistic leap”.
First off, I’m not trying to illustrate the many-player game here. So imagine there’s just Alice and Bob. I agree that the many-player version is relevant, but I was just dealing with the complexities that arise from iteration.
Second, yeah, absolutely: strategies in iterated games can be any function of the history. But that’s a really complicated strategy space to try and draw. Essentially I’m showing you just a very high-level summary, focusing on frequency of cooperation as a salient feature.
The idea is that frequency is something each player can observe about the other. Alice can implement a Grim Trigger strategy to enforce any given frequency of cooperation from Bob. It needs to have some wiggle room, to allow chance fluctuations in frequency without pulling the Grim Trigger; but Alice can include wiggle room while enforcing a tight enough guarantee that Bob is forced to cooperate with the desired frequency in the limit, while Alice runs only a small risk of spuriously Grim Triggering.
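Here's one way to make that precise (my own formalization; the shrinking tolerance schedule is just one workable choice):

```python
# Grim Trigger with wiggle room: Alice defects forever once Bob's observed
# cooperation frequency falls below the demanded frequency by more than a
# tolerance that shrinks over time.
import math

def alice_triggers(bob_history: list[bool], demanded_freq: float) -> bool:
    n = len(bob_history)
    if n == 0:
        return False
    observed = sum(bob_history) / n
    # sqrt(log n / n) shrinks to zero but dominates chance fluctuations
    # (Hoeffding gives ~2/n^2 spurious-trigger probability per round), so
    # Bob is forced to the demanded frequency in the limit while Alice
    # rarely Grim Triggers by accident.
    tolerance = math.sqrt(math.log(n + 1) / (n + 1))
    return observed < demanded_freq - tolerance
```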
The problem in your example is that you failed to identify a reasonable disagreement point.
Ahh, yeahh, that’s a good point.
Wait, what? Altruism has nothing to do with it. Everyone is supportive of (or indifferent to) any given Pareto improvement because it increases (or at least does not reduce) their utility.
I grant that this is not very altruistic at all, but it is possible to be even less altruistic: I could only support Pareto improvements which I benefit from. This is sorta the default.
The Pareto-optimality assumption isn’t that you’re “just OK” with Pareto-improvements, in a ≥ sense. The assumption is that you prefer them, ie, >.
I don’t refuse to have an opinion, I only refuse to claim that my opinion is about their preferences/utility.
If you accept the Pareto-optimality assumption, and you accept the rationality assumptions with respect to your choices, then by Harsanyi’s theorem you’ve gotta make an implicit trade-off between other people’s preferences.
My opinion is about my (projected) utility from the saved or unsaved lives.
So you’ve got some way to trade off between saving different lives.
That _may_ include my perception of their satisfaction (or whatever observable property I choose), but it does not have any access to their actual preference or utility.
It sounds like your objection here is “I don’t have any access to their actual preferences”.
I agree that the formal model assumes access to the preferences. But I don’t think a preference utilitarian needs access. You can be coherently trying to respect other people’s preferences without knowing exactly what they are. You can assent to the concept of Pareto improvements as an idealized decision theory which you aspire to approximate. I think this can be a very fruitful way of thinking, even though it’s good to also track reality as distinct from the idealization. (We already have to make such idealizations to think “utility” is relevant to our decision-making at all.)
The point of the Harsanyi argument is that if you assent to Pareto improvements as something to aspire to, and also assent to VNM as something to aspire to, then you must assent to a version of preference utilitarianism as something to aspire to.
Yeah, I like your “consensus spoiler”. Maybe needs a better name, though… “Contrarian Monster”?
having a cofrence of −1 for everyone.
This way of defining the Consensus Spoiler seems needlessly assumption-heavy, since it assumes not only that we can already compare utilities in order to define this perfect antagonism, but furthermore that we’ve decided how to deal with cofrences.
A similar option with a little less baggage is to define it as having the opposite of the preferences of our social choice function. They just hate whatever we end up choosing to represent the group’s preferences.
A simpler option is just to define the Contrarian Monster as having opposite preferences from one particular member of the collective. (Any member will do.) This ensures that there can be no Pareto improvements.
If you have a community of 100 agents that would agree to pick some states over others, and construct a new community of 101 with the consensus spoiler, then they can’t form any choice function.
Actually, the conclusion is that you can form any social choice function. Everything is “Pareto optimal”.
The question whether it is warranted, allowed or forbidden that the coalition of 100 just proceeds with the policy choice that screws the spoiler over doesn’t seem to be a mathematical kind of claim.
If we think of it as bargaining to form a coalition, then there’s never any reason to include the Spoiler in a coalition (especially if you use the “opposite of whatever the coalition wants” version). In fact, there is a version of Harsanyi’s theorem which allows for negative weights, to allow for this—giving an ingroup/outgroup sort of thing. Usually this isn’t considered very seriously for definitions of utilitarianism. But it could be necessary in extreme cases.
(Although putting zero weight on it seems sufficient, really.)
And even to a less extreme degree, I don’t get how you could use this setup to judge values that are in conflict. And if you encounter an unknown agent, it seems ambiguous whether you should take heed of its values in compromise, or just treat it as a possible enemy and adhere to your personal choices.
Pareto-optimality doesn’t really give you the tools to mediate conflicts, it’s just an extremely weak condition on how you do so, which says essentially that we shouldn’t put negative weight on anyone.
Granted, the Consensus Spoiler is an argument that Pareto-optimality may not be weak enough, in extreme situations.
Sorry to necro this here, but I find this topic extremely interesting and I keep coming back to this page to stare at it and tie my brain in knots.
(As for this, I think a major goal of LessWrong—and the alignment forum—is to facilitate sustained intellectual progress; a subgoal of that is that discussions can be sustained over long periods of time, rather than flitting about as would be the case if we only had attention for discussing the most recent posts!!)
If □ here instead means “discoverable by the agent’s proof search” or something to that effect, then the logic here seems to follow through (making the reasonable assumption that if the agent can discover a proof for A=cross->U=-10, then it will set its expected value for crossing to −10).
Right, this is what you have to do.
However, that would mean we are talking about provability in a system which can only prove finitely many things, which in particular cannot contain PA and so Löb’s theorem does not apply.
Hmm. So, a bounded theorem prover using PA can still prove Löb about itself. I think everything is more complicated and you need to make some assumptions (because there’s no guarantee a bounded proof search will find the right Löbian proof to apply to itself, in general), but you can make it go through.
I believe the technical details you’re looking for will be in Critch’s paper on bounded Löb.
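For reference, the standard (unbounded) statement, with □ as PA’s provability predicate:

```latex
\text{If } \mathrm{PA} \vdash \Box P \rightarrow P, \text{ then } \mathrm{PA} \vdash P.
```

Critch’s result is a parametric analogue of this for bounded proof searches; I won’t try to restate it precisely here.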
Under typical game-theoretic assumptions, we would assume all players to be strategic. In that context, it seems much more natural to suppose that all evil people would also be liars.
For me, the main point of the post is that you can’t simultaneously (1) buy the picture in which the meaning of a signal is what it probabilistically implies, and (2) have a sensible definition of “lying”. So when you say “2% are evil liars”, Zack’s response could be something like, “what do you mean by that?”—you’re already assuming that words mean things independent of what they signal, which is the assumption Zack is calling out here (on my reading).
I still think this is one of the best recent posts. Well-researched and well-presented, and calls into question some of my tacit assumptions about how words work.