Alternate steelman—they’re worried that you’re intending to misleadingly quote their answer in a different context, and have rigged the question to get the quote you want.
Dweomite
My primary guess is that the 6% who like RSI but not ASI are answering based on vibes rather than coherent models, and ASI currently has worse vibes.
Though I could imagine some people thinking that RSI will stop before “superintelligence”, and other people thinking that orthogonality is wrong and RSI would continue beyond some window of “dangerous superintelligence” into “godlike benevolence”. I personally consider both of those possibilities to be so staggeringly implausible that they’re basically just wishful thinking, but I also think that more than 6% of people are engaging in some amount of wishful thinking about AI.
Could you clarify what you meant by this combination of remarks?
I looked at all of those questions and thought, “two years down the road that’s hell. No thank you.”
...
Let me say, for myself, I absolutely want such an outcome.
My personal take is that I can imagine there are people who really would be happier in a scenario where they’ll literally starve to death if they aren’t productive enough, and I guess it would be good if those people could experience the scenario where they thrive, but if there are also people who do fine with a permanent vacation then it doesn’t seem ethical to forcibly keep everyone in the “work or starve” scenario. If resources aren’t actually scarce, then that’s basically slavery, and arguably worse.
For some reason, I didn’t think of this when I read the results, but immediately thought of it when I read the actual question’s wording (even though the question doesn’t mention this).
Framing effects are scarily powerful.
How is that different from saying that you do not have an Earth unless you can point to it? If we require that you augment the recipe with some spacetime coordinates that tell you where in the resulting universe to look, why are those coordinates any longer for a BB than for an Earth?
Any given configuration of the universe either eventually will produce a BB, or won’t.
Because BBs are much smaller than Earth, the number of configurations that eventually lead to BBs should be much larger than the number that eventually lead to Earths, and therefore the number of variables you need to control to ensure you eventually get a BB should (in expectation) be much smaller than the number of variables you need to control to ensure you eventually get an Earth. BBs are a larger target in possibility space, and therefore (in expectation) easier to hit.
Doesn’t this argument prove too much? Why doesn’t similar reasoning rule out other mentally-impaired states like drunkenness, dreams, and brain injuries? In fact, why doesn’t it rule out being a flawed human being, rather than an ideal reasoner?
Even if all your memories are false, couldn’t you still reason validly about abstract concepts (like math) and hypothetical scenarios? This seems to show that BBs can do at least some valid reasoning.
Even if you might be in some state S that would totally prevent you from reasoning validly, if you conjectured you were in some state T that would permit you to reason validly, and your reasoning based on that conjecture leads to a contradiction, wouldn’t that allow you to rule out state T without ruling out state S? This seems to show that you can do at least some useful reasoning without completely ruling out that you are in an impaired state.
I’m confused about why you think BBs should be complexity-penalized for the difficulty of specifying which of all hypothetically-possible BBs we’re talking about, but you don’t think Earth should be complexity-penalized for the difficulty of specifying which of all hypothetically-possible Earths we’re talking about.
I get that you can (we think) specify a recipe that eventually produces Earth, rather than explicitly specifying every detail of the current Earth. But presumably you could also specify a recipe that eventually produces a BB. And since the final result you are aiming for with the BB is much simpler than the final result you are aiming for with Earth, I would intuitively expect the recipe will also be simpler (in expectation).
What makes you think the explanation for why you won the lottery won’t help you make useful predictions about what follow-up actions will fulfill your values? For example, if the explanation were something like “the lottery was rigged, and there’s about to be a criminal investigation targeting you”, that seems pretty relevant to your follow-up plans.
Explanations of previously-mysterious phenomena are often useful in ways that are hard to foresee before you know the explanation.
If you think that understanding “normal” things is typically useful, why single out this one specific thing to be incurious about?
This seems like a neat idea, but I’d like to flag that this strategy only seems applicable when the fact you are looking for already exists explicitly and at an appropriate explainer level. I’m not sure you can do anything equivalent if you want the LLM to explain, synthesize, summarize, or do original reasoning.
I would describe a critical try as one where the act of trying is likely to prevent further attempts. Launching an ASI is a critical try because the ASI itself could likely stop you from launching more ASIs later on (e.g. by killing you).
If it’s possible to send out missions to intercept the asteroid before it arrives, then it seems to me that the asteroid is better understood as a time limit than as a critical try. You could set the parameters of the asteroid scenario in such a way that you have time for exactly one try, but you could also set the parameters so that you have time to send up a mission to deflect the asteroid, observe its results, and then make a second try before the asteroid arrives. You could also set the parameters such that you have time for zero tries! The key consideration is how fast you can work vs how much time you have.
Contrariwise, if you assume that you are stopping the asteroid with a shield that is close to the earth, such that no matter how fast you build the shield you have to wait for the asteroid to arrive before you can see how well it works, then I’d call that a critical try, because the part of the plan where you wait for the asteroid to arrive severely depletes a critical resource (time) and makes that resource unavailable for later attempts. (Note similarity to the Maginot Line.)
By similar reasoning, I’d say your #2 (global warming) is also more of a time limit, but your #3 (creating a new type of human that potentially kills you) is a critical try (though compared to launching an ASI, it’s more likely to get a middle-ground outcome).
Yes, that is the sort of example I meant. Though of course this particular example does not prove that the game of Catan, in particular, has situations like this.
Based on his other reply, I expect James would want to point out that there is an equivalent equilibrium where player A, instead of saying “button N is blue”, says “either button N is blue or no button is”, which produces the same outcome without technically lying.
I’m coming to think that there should be some other distinction we can draw that rhymes with the truthful/lying distinction but that talks about consequences instead of semantics, and therefore can’t be dodged by relabeling the signals. Still thinking about it.
In principle, IF the norms are more important to you than to everyone else combined, then there should be some amount you can pay them that is higher than how much they care about the norms but lower than how much you care about them.
(In practice, finding that amount may be hard, and treating it too much like a transaction may have friendship-corroding effects.)
Your final sentence clarified some things for me:
In that sentence, you are not arguing that the lie is no better than silence, you are arguing that it is no better than some truthful message. (This is technically still a falsification of my previously-stated interpretation of your claim.)
This argument is based on the assumption that the other players already know all circumstances under which you would transmit this message, so there’s no harm in admitting them.
I now realize that if all players have perfect knowledge of the exact conditions under which you would transmit some message, then the actual informational payload of every message is that those conditions are true. (Even with a randomized strategy, you can just interpret the RNG output as part of the conditions.) You might as well literally say “message #27”. Classifying the message itself as truth or lie becomes academic, because no one is expected or intended to pay attention to its face-value claim, and in fact there’s no reason for it to make a face-value claim at all. (Under this very strong assumption.)
So if we’re going to assume players have perfect knowledge of each others’ strategies (including what messages they send under what circumstances), I no longer think it makes sense to distinguish “true” and “false” messages.
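To make the "message #27" point concrete, here is a minimal sketch under assumptions that are mine rather than anything from the thread: a hypothetical three-state world, a deterministic speaker strategy that the listener knows perfectly, and made-up message labels. It computes the listener's posterior for each message, then relabels the messages and checks that the posteriors are unchanged.

```python
from fractions import Fraction as F

def posteriors(prior, strategy):
    """Listener's posterior over states for each message, given a known strategy.

    prior: {state: probability}; strategy: {state: message} (deterministic).
    """
    result = {}
    for message in set(strategy.values()):
        weights = {s: p for s, p in prior.items() if strategy[s] == message}
        total = sum(weights.values())
        result[message] = {s: p / total for s, p in weights.items()}
    return result

def as_partition(posterior_by_message):
    """Forget the message labels and keep only the set of posteriors they induce."""
    return {frozenset(post.items()) for post in posterior_by_message.values()}

# Hypothetical prior over the true colour of button N.
prior = {"blue": F(1, 2), "red": F(1, 4), "green": F(1, 4)}

# Face-value messages: the speaker "lies" when the button is red.
face_value = {"blue": "button N is blue", "red": "button N is blue", "green": "(silence)"}

# Same strategy with arbitrary labels that make no face-value claim at all.
relabeled = {"blue": "message #27", "red": "message #27", "green": "message #3"}

same_information = as_partition(posteriors(prior, face_value)) == as_partition(
    posteriors(prior, relabeled)
)
```

In this toy model the listener ends up with exactly the same beliefs either way, which is the sense in which the truth/lie label stops doing any work. It says nothing about what happens when the strategy is not perfectly known.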
I note that “common knowledge that all players are perfectly rational” does not (I think) logically entail perfect knowledge of everyone’s strategy, since a game can have more than one Nash equilibrium. So technically neither of us stated “perfect knowledge of everyone’s strategy” as an assumption in the first place, though I admit I sort of hand-waved towards it when I talked about what players would infer from your failure to say something.
I still think that if we don’t assume “perfect knowledge of everyone’s strategy” then lying is potentially beneficial.
Given that clarification, I’m not sure if your numbered chain of reasoning is a crux for either of us, but for the record I found that chain extremely confusing to read, I think step 3 is invalid, and your final paragraph (after the numbered list) was the only part of the comment I found helpful.
In step 3, you seem to be trying to treat each group of players as if it were a single player so that you can apply the conclusions from two-player games, but I don’t think that’s valid. The two-player result was based on an implicit assumption that transmitting a message from Beth to Adam cannot have any effect on the game except through Adam’s reaction, but that’s not true here because a group isn’t a unified agent, so transmitting a message can change the game (by affecting other members of the group) even if one member refuses to react to it. That member does not get a veto on changing the game, like Adam does. Chris does not need to be listening in order for Adam and Beth to strike a mutually-beneficial deal at his expense.
So the inference that a group as a whole cannot be profiting is invalid.
(Also note that your claim proves too much: If this were accepted, you haven’t proven that false messages are useless, you’ve proven that all messages are useless.)
Been thinking more about this claim:
Also, with rational agents silence is just as good as dishonesty.
I don’t think this claim particularly matters to the thrust of your post, since I think we agree that you’re not playing with perfectly rational agents, but I’m interested in the claim as a matter of game theory.
To be clear, I’m interpreting this as saying something at least as strong as: “In a game of Catan where there is common knowledge that all players are perfectly rational, speaking a falsehood is never more advantageous for the speaker than remaining silent.”
After pondering this for about 20 minutes, I’m pretty convinced the claim is false, and I suspect you are over-generalizing from two-player games.
If Adam and Beth are playing a two-player zero-sum game, and Adam knows that Beth is perfectly rational, then:
If Beth reacts in any way to anything Adam says, then that reaction must be beneficial to Beth (since she is assumed perfectly rational), which means it must be harmful to Adam (because the game is zero-sum), which means Adam shouldn’t have said it.
By similar reasoning, if Beth says anything (that’s not required by the rules), then the fact that she said it can’t be harmful to herself, which means it can’t be beneficial to Adam, which means whatever Adam does in response won’t be (predictably) better than what he would have done anyway.
Therefore, Adam can safely adopt a policy of never saying anything and ignoring whatever Beth says, and this will be no worse than any other policy.
But Catan is played with at least 3 players. The game as a whole is zero-sum, but it’s possible for an action to benefit both Adam and Beth at the same time, provided it harms Chris.[1]
In a non-zero-sum negotiation, it is sometimes helpful to share information in order to coordinate on a mutually-beneficial action. So silence is not, in general, a global optimum.
But if there are situations where you would share some information if it were true, and the other player is aware of this, then silence becomes a tacit admission that it’s not true. So it might become necessary to lie in order to avoid passively leaking secrets.
The lie will only be believable if it’s a claim you would have made if it were true, which sharply limits what lies you can tell. But it does not, in general, limit it to the empty set.
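The "silence becomes a tacit admission" step can be sketched with a toy Bayesian model. Everything here is an illustrative assumption of mine (a single binary fact, a listener who knows the speaker's claim probabilities, made-up numbers); it is not a model of an actual Catan position.

```python
from fractions import Fraction as F

def posterior_true_given_silence(prior, p_claim_if_true, p_claim_if_false):
    """P(fact is true | speaker stayed silent), by Bayes' rule."""
    silent_and_true = prior * (1 - p_claim_if_true)
    silent_and_false = (1 - prior) * (1 - p_claim_if_false)
    return silent_and_true / (silent_and_true + silent_and_false)

prior = F(1, 2)  # hypothetical prior that the fact is true

# Speaker who announces the fact 90% of the time when it is true, and never lies.
honest = posterior_true_given_silence(prior, F(9, 10), F(0))

# Speaker who makes the same claim just as often when the fact is false.
bluffing = posterior_true_given_silence(prior, F(9, 10), F(9, 10))
```

With these numbers, silence from the never-lying speaker drops the listener's belief from 1/2 to 1/11, while silence from the bluffing speaker leaves it at 1/2. The sketch only shows that a willingness to lie can plug the leak; it does not show that the lie is otherwise profitable.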
This is not a proof, since I have not constructed an example game position where I can mathematically demonstrate that all of the relevant properties apply at the same time. It is conceivable there’s some reason that hasn’t occurred to me that they can’t all apply at the same time. But I have no candidates for what such a reason would be, and my brief Internet searches have failed to turn up any known result that matches the original claim.
Do you think I’ve missed something?
[1] I’m not sure if this is known art, but I’ve found it helpful to think of zero-sum-ness as applying to a set of players rather than to a game. In a 3-player Catan game with Adam, Beth, and Chris, the set (Adam, Beth, Chris) is zero-sum, but the set (Adam, Beth) is non-zero-sum.
Note that any non-zero-sum game can be converted to a strategically-equivalent zero-sum game by adding a dummy player whose score is the negative sum of all other players’ scores (or vice versa, by adding a dummy player whose score isn’t that), so it cannot be strategically important whether “the whole game” is zero-sum if we haven’t changed the zero-sum-ness of any particular subset of players.
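As a rough illustration of the dummy-player conversion, here is a small sketch. The payoff table is a made-up two-player game (not Catan), and the helper names are hypothetical.

```python
def add_dummy_player(payoffs):
    """Append a dummy player whose score is minus the sum of the others' scores."""
    return {outcome: scores + (-sum(scores),) for outcome, scores in payoffs.items()}

def is_zero_sum(payoffs, players):
    """True if the chosen players' combined score is the same in every outcome.

    (Constant-sum is treated as zero-sum here, since shifting every outcome by
    the same constant does not change anyone's incentives.)
    """
    totals = {sum(scores[i] for i in players) for scores in payoffs.values()}
    return len(totals) == 1

# Hypothetical non-zero-sum game: outcome -> (Adam's score, Beth's score).
game = {
    "both cooperate": (3, 3),
    "Adam defects": (4, 0),
    "Beth defects": (0, 4),
    "both defect": (1, 1),
}

with_dummy = add_dummy_player(game)

whole_game_zero_sum = is_zero_sum(with_dummy, [0, 1, 2])  # Adam, Beth, dummy
adam_beth_zero_sum = is_zero_sum(with_dummy, [0, 1])      # Adam and Beth only
```

The three-player set is zero-sum by construction, but the set (Adam, Beth) is exactly as non-zero-sum as it was before, and the dummy has no moves, so nothing strategic has changed.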
Have you talked explicitly with them about the norms you’d like to have? I, for one, would not have assumed that “don’t try to manipulate other players to your own advantage” would be an expected norm, but would probably be willing to go along with it if the group asked me to.
You also might consider offering to play with a handicap, so that they don’t feel that they need to target you to prevent you from winning too often.
As a rule of thumb, I strongly approve of play groups mutually agreeing on whatever rules and norms work best for them. But I also think that trying to win (within the rules) is a pretty good default norm, and shouldn’t be interpreted as a defection if you haven’t agreed on something else. I don’t think “having honest conversations” is the primary value proposition that games offer to most gamers, and in fact I can think of several popular games with dynamics that preclude it.
I do notice that you seem quite confident that this is harming your enjoyment more than it’s helping anyone else’s, and this seems...plausible, but not self-evident to me, based on the information provided. Some people really like politicking in games. It’s also the sort of thing you’d be tempted to believe even if it weren’t true, which is cause for epistemic caution.
Supposing it’s true that this is more important to you than to everyone else combined, then I think they probably ought to be willing to negotiate to follow your norms, but that you should expect to give them something else in return (even if it’s just owing them one). Try to strike a deal that’s a positive for every individual, not merely positive-sum. You shouldn’t be able to demand people accommodate you just by being a utility monster. (Though you absolutely should be allowed to stop playing, if that’s your BATNA...and if they care more about having you play with them than they do about the norms, then playing with them could perhaps be the payment for changing the norms.)
Identical twins share the same genome, but are usually considered separate objects for most purposes.
A genetic chimera has different genomes in different parts of its body, but is usually considered a single object for most purposes. (Often, you can’t tell the organism is a chimera on a casual inspection.)
I submit that “same genome” often coincides with the natural object boundary, but isn’t usually a good criterion for the boundary. The common genome is not a significant part of what makes a squirrel a good object.
I think the thing we usually care about is something more like “it acts like a single agent” or “the parts are approximately aligned towards the same goal”. (Notice these criteria suggest that when an organization acts in concert, we care more about the boundaries of the organization than the boundaries of the individuals within it; I think I endorse that implication.)
Interestingly, according to Wikipedia there doesn’t seem to be an accepted scientific definition of “organism”.
Based on this and your other comments in this thread, I suspect you’re mixing up the questions of:
What’s good?
How do we know what’s good?
How should we treat other people who disagree with us about what’s good?
It’s possible to think someone is baking bread wrong without thinking that you should use violence to force them to do it differently. It’s possible to think that bakers should be allowed to pick their own baking methods without thinking that all methods produce equally tasty bread.
Civilization typically uses many different levels of coercion for different sorts of rules. Different offenses might get you jail, or fines, or social censure. This difference isn’t because some of those offenses are Wrong and others are Not Wrong; it’s because creating effective policies of enforced cooperation is more complicated than just asking whether some action is Wrong. (I think this essay from Scott Alexander significantly improved my thinking on this topic.)
I think your hypothetical baker has a good and sufficient answer to your hypothetical philosopher, which is that they actually make good bread. I broadly agree that you don’t need a theoretical understanding of why your practices are good if you have some other valid reason for thinking they’re good.
But while the bakers with the best recipes won’t necessarily have good theoretical explanations, that doesn’t mean that there are no bad recipes, or that you can identify good recipes by sheer intuition. You do still need actual entangled evidence of some kind to reach accurate conclusions.
If we accept (as you wrote in another comment) that “success in convincing is the sole criterion”, then the best conclusion is one enforced by a mind control ray. I think that’s nonsense. In real life, people frequently believe things for reasons that are not much correlated with accuracy.
Furthermore, using this as the defense of any specific conclusion is circular. We, who are questioning the conclusion, are included in the group of “people”, and we have not been convinced. Insofar as your proposed system works at all, it only works because you are depending on “people” to be more likely to be convinced of correct conclusions than incorrect conclusions. If everyone who might object to the decision allowed themselves to be convinced merely because others are convinced, we’d be removing the only element of this system that makes it better than chance.
I think you do a good job of arguing (in the earlier part of the article) that it is logically possible to drop the independence axiom without being money-pumped by giving up logical consequentialism but keeping dynamic consistency. However, I think you do a poor job of arguing (in the later parts) that we should give up consequentialism.
You examine 3 in-depth examples to try to show that we’d be fine if we dropped independence: ergodicity economics, the Allais Paradox, and the Ellsberg Paradox. In all 3 cases, I think your argument is missing a critical step that is required for its validity.
1.
In the section on ergodicity economics, you claim ergodicity follows resolute choice because it forms a plan based on the entire decision tree and then sticks to that plan. But this isn’t sufficient to carry your point, because agents that obey the independence axiom can also be described as sticking to their original plan. (In fact, any agent with dynamic consistency can be described this way, and you agreed we need dynamic consistency.)
What you’d need to show in order to carry your point is that ergodicity violates consequentialism. For example, you could show this by constructing an example where a local re-evaluation would deviate from the original plan, but ergodicity follows the original plan anyway. Without showing that, this example fails to support your case.
2.
In the section on the Allais Paradox, you give the following reasoning for why the common human answer is rational:
This is precisely the point we made with the example in the introduction to section 3. If the common component C is a large safety net, you can afford to take more risk on the remaining branch. If C is negligible, you should be more conservative. Your preference between A and B should depend on what else is in the package, because you are one agent facing the total distribution, not a collection of independent sub-agents each evaluating one branch in isolation.
But this reasoning seems to be exactly backwards from the actual result: When component C provides a safety net of $1M, humans choose the lower-risk option A, but when component C provides nothing, humans choose the higher-risk option B. Your argument in this paragraph undermines, rather than supports, the rationality of the choice you are defending.
And aside from this one backwards paragraph, you don’t seem to offer any basis at all for how the context ought to change the answer. You have several paragraphs of philosophical hand-waving about how it is good and appropriate that it should, but don’t appear to offer anything like an algorithm saying how we should take it into account. Without a model predicting the preference for A over B, you fail to win any Bayes points.
Nothing in this section sounds like a logical reason to consider the common human choice in the Allais Paradox more rational than I previously did.
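For reference, here is a sketch of why the common answer conflicts with independence. It assumes the standard textbook version of the gambles: the $5M top prize and the exact percentages are my assumption, chosen to match the $1M safety net and the 1%/0% and 90%/89% figures mentioned in this comment. The three utility functions are arbitrary examples, picked to keep the arithmetic exact.

```python
from fractions import Fraction as F
import math

# Lotteries as {prize in dollars: probability}; assumed textbook Allais numbers.
A_WITH_NET = {1_000_000: F(1)}
B_WITH_NET = {1_000_000: F(89, 100), 5_000_000: F(10, 100), 0: F(1, 100)}
A_NO_NET = {1_000_000: F(11, 100), 0: F(89, 100)}
B_NO_NET = {5_000_000: F(10, 100), 0: F(90, 100)}

def expected_utility(lottery, utility):
    return sum(p * utility(prize) for prize, p in lottery.items())

def advantage_of_a(a, b, utility):
    """How much better A is than B, in expected utility (negative means B wins)."""
    return expected_utility(a, utility) - expected_utility(b, utility)

# Arbitrary example utilities: risk-neutral, saturating, and a rough square root.
utilities = {
    "linear": lambda x: x,
    "saturating": lambda x: F(x, x + 1_000_000),
    "integer sqrt": lambda x: math.isqrt(x),
}

gaps = {
    name: (advantage_of_a(A_WITH_NET, B_WITH_NET, u), advantage_of_a(A_NO_NET, B_NO_NET, u))
    for name, u in utilities.items()
}
```

For every utility function, the advantage of A over B is identical with and without the safety net, because the common component cancels out of the difference. So an expected-utility maximizer cannot prefer A in one pair and B in the other, whatever its risk attitude; a defense of the common human answer needs a model that says how the common component enters.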
Sidenote: Empirical Money Pumps?
This discussion also suggests the question: Can you actually, in real life, use the Allais Paradox to money-pump humans? If you can, then the behavior of humans does not provide evidence of the rationality of their choices in this scenario, regardless of any theoretical arguments about how we could avoid money pumps while keeping this preference. My brief Google search failed to immediately turn up any experiments involving actual money pumps, but I haven’t done a careful literature review.
Sidenote: Can the Allais Paradox result be justified in other ways?
There are two defenses for this that I somewhat credit:
A.
Eliminating a possible outcome makes it cognitively cheaper to plan for what happens after the lottery, because you don’t need to consider as many distinct cases.
Notice this reasoning only applies if there is an “after”, which is usually true in real life but usually false in abstract formal examples.
B.
Suppose you are living among a population of similar agents that compete for resources, and all of the other agents get to make a similar choice between lotteries. Then, the outcome where you get nothing is always the same in terms of absolute resources, but not in terms of relative resources when comparing to other people.
If you choose between a 1% chance and a 0% chance of getting nothing, then the few agents who end up with nothing will be out-competed by almost everyone around them. They will lose approximately all competitions and will be the obvious choice for predators to target.
If you choose between a 90% chance and an 89% chance of getting nothing, then agents who win millions will still out-compete the ones who get nothing, but they’ll have a harder time monopolizing all opportunities because there won’t be as many winners. Many of the “losers” will still have a decent relative standing.
This reasoning doesn’t apply if you somehow know this lottery is a special one-time opportunity for you only, but it seems plausible that our instincts evolved mostly to deal with non-unique opportunities.
However, notice that these two reasons justify different things. Reason A justifies zero-risk bias, i.e. paying a premium to reduce a risk to ~zero; it has a sharp change in your preferences at a specific probability. Contrariwise, reason B would remain nearly as strong if we changed “1% or 0%” to “2% or 1%”.
3.
In the section on the Ellsberg Paradox, I think you make some clever points about why the standard human answer might be rational, but I don’t see how any part of this section ties into logical consequentialism or the violation thereof. For example, you have not explained how a money pump could be constructed based on this scenario.
4.
The argument in favor of logical consequentialism is obvious: If you violate it, you are leaving money on the table. (Violating it implies that you are making a choice that satisfies your own preferences less than another choice you could have made in the current circumstances.)
In fact, this is essentially the same reason that we think that vulnerability to money pumps is bad (you end up with less money than you predictably could have). So it seems pretty weird to argue that we need to keep all the axioms that prevent money pumps but it’s somehow ok to drop consequentialism. I’m not sure what set of assumptions would validly lead to that combination of conclusions.
As I read the OP, I thought to myself: If I were to steelman the people the post is complaining about, I would guess that they are interpreting the complaint about the problem as an implied proposal for how to address the problem, and they are reacting to the perceived-implied-proposal.
It seems like you’re thinking along similar lines, but are about 3 assumptions further down that road:
Not only are you guessing that they might have thought this, you’re pretty sure this is what they were thinking
Not only are you sure they thought this, you’re sure they were justified in thinking this
Not only were they justified, but all of this is so totally obvious that you’re not even going to bother articulating it, and instead treat it as an unspoken presumption of your snarky comeback
It seems reasonable to me that some people are confused by your comment.