# Take heed, for it is a trap

If you have worked your way through most of the sequences, you are likely to agree with the majority of these statements:

When people die we should cut off their heads so we can preserve those heads and make the person come back to life in the (far far) future.

It is possible to run a person on Conway’s Game of Life. This would be a person as real as you or me, and wouldn’t be able to tell he’s in a virtual world because it looks exactly like ours.

Right now there exist many copies/clones of you, some of which are blissfully happy and some of which are being tortured and we should not care about this at all.

Most scientists disagree with this but that’s just because it sounds counter-intuitive and scientists are biased against counterintuitive explanations.

Besides, the scientific method is wrong because it is in conflict with probability theory. Oh, and probability is created by humans, it doesn’t exist in the universe.

Every fraction of a second you split into thousands of copies of yourself. Of course you cannot detect these copies scientifically, but that’s because science is wrong and stupid.

In fact, it’s not just people that split but the entire universe splits over and over.

Time isn’t real. There is no flow of time from 0 to now. All your future and past selves just exist.

Computers will soon become so fast that AI researchers will be able to create an artificial intelligence that’s smarter than any human. When this happens humanity will probably be wiped out.

To protect us against computers destroying humanity we must create a super-powerful computer intelligence that won’t destroy humanity.

Ethics are very important and we must take extreme caution to make sure we do the right thing. Also, we sometimes prefer torture to dust specks.

If everything goes to plan a super computer will solve all problems (disease, famine, aging) and turn us into super humans who can then go on to explore the galaxy and have fun.

And finally, the truth of all these statements is completely obvious to those who take the time to study the underlying arguments. People who disagree are just dumb, irrational, miseducated or a combination thereof.

I learned this all from this website by these guys who want us to give them our money.

In two words: crackpot beliefs.

These statements cover only a fraction of the sequences, and although they’re deliberately phrased to incite kneejerk disagreement and ugh-fields, I think most LW readers will find themselves in agreement with almost all of them. And if not, then you can always come up with better examples that illustrate some of your non-mainstream beliefs.

Think back for a second to your pre-bayesian days. Think back to the time before your exposure to the sequences. Now the question is, what estimate would you have given that *any chain of arguments* could persuade you the statements above are true? In my case, it would be near zero.

You can take somebody who likes philosophy and is familiar with the different streams and philosophical dilemmas, who knows computation theory and classical physics, who has a good understanding of probability and math and somebody who is a naturally curious reductionist. And this person will still roll his eyes and will sarcastically dismiss the ideas enumerated above. After all, these are crackpot ideas, and people who believe them are so far “out there”, they cannot be reasoned with!

That is really the bottom line here. You cannot explain the beliefs that follow from the sequences, because they have too many dependencies. Even if you did have time to go through all the necessary dependencies, explaining a belief is still an order of magnitude more difficult than following an explanation written down by somebody else: in order to explain something you have to juggle two mental models, your own and the listener’s.

Some of the sequences touch on the concept of the cognitive gap (inferential distance). We have all learned the hard way that we can’t expect people to just understand what we say and we can’t expect short inferential distances. In practice there is just no way to bridge the cognitive gap. This isn’t a big deal for most educated people, because people don’t expect to understand complex arguments in other people’s fields and all educated intellectuals are on the same team anyway (well, most of the time). For crackpot LW beliefs it’s a whole different story though. I suspect most of us have found that out the hard way.

Rational Rian: What do you think is going to happen to the economy?

Bayesian Bob: I’m not sure. I think Krugman believes that a bigger cash injection is needed to prevent a second dip.

Rational Rian: Why do you always say what other people think, what’s your opinion?

Bayesian Bob: I can’t really distinguish between good economic reasoning and flawed economic reasoning because I’m a layman. So I tend to go with what Krugman writes, unless I have a good reason to believe he is wrong. I don’t really have strong opinions about the economy, I just go with the evidence I have.

Rational Rian: Evidence? You mean his opinion.

Bayesian Bob: Yep.

Rational Rian: Eh? Opinions aren’t evidence.

Bayesian Bob: (Whoops, now I have to either explain the nature of evidence on the spot or Rian will think I’m an idiot with crazy beliefs. Okay then, here goes.) An opinion reflects the belief of the expert. These beliefs can either be uncorrelated with reality, negatively correlated or positively correlated. If there is absolutely no relation between what an expert believes and what is true then, sure, it wouldn’t count as evidence. However, it turns out that experts mostly believe true things (that’s why they’re called experts) and so the beliefs of an expert are positively correlated with reality and thus his opinion counts as evidence.

Rational Rian: That doesn’t make sense. It’s still just an opinion. Evidence comes from experiments.

Bayesian Bob: Yep, but experts have either done experiments themselves or read about experiments other people have done. That’s what their opinions are based on. Suppose you take a random scientific statement, you have no idea what it is, and the only thing you know is that 80% of the top researchers in that field agree with that statement, would you then assume the statement is probably true? Would the agreement of these scientists be evidence for the truth of the statement?

Rational Rian: That’s just an argument ad populus! Truth isn’t governed by majority opinion! It is just religious nonsense that if enough people believe something then there must be some truth to it.

Bayesian Bob: (Ad populum! Populum! Ah, crud, I should’ve phrased that more carefully.) I don’t mean that majority opinion *proves* that the statement is true, it’s just evidence in favor of it. If there is counterevidence the scale can tip the other way. In the case of religion there is overwhelming counterevidence. Scientifically speaking religion is clearly false, no disagreement there.

Rational Rian: There’s scientific counterevidence for religion? Science can’t prove non-existence. You know that!

Bayesian Bob: (Oh god, not this again!) Absence of evidence is evidence of absence.

Rational Rian: Counter-evidence is not the same as absence of evidence! Besides, stay with the point, science can’t prove a negative.

Bayesian Bob: The certainty of our beliefs should be proportional to the amount of evidence we have in favor of the belief. Complex beliefs require more evidence than simple beliefs, and the laws of probability, Bayes specifically, tell us how to weigh new evidence. A statement, any statement, starts out with a 50% probability of being true, and then you adjust that percentage based on the evidence you come into contact with. (I shouldn’t have said that 50% part. There’s no way that’s going to go over well. I’m such an idiot.)

Rational Rian: A statement without evidence is 50% likely to be true!? Have you forgotten everything from math class? This doesn’t make sense on so many levels, I don’t even know where to start!

Bayesian Bob: (There’s no way to rescue this. I’m going to cut my losses.) I meant that in a vacuum we should believe it with 50% certainty, not that any arbitrary statement is 50% likely to accurately reflect reality. But no matter. Let’s just get something to eat, I’m hungry.

Rational Rian: So we should believe something even if it’s unlikely to be true? That’s just stupid. Why do I even get into these conversations with you? *sigh* … So, how about Subway?

The moral here is that crackpot beliefs are low status. Not just low-status like believing in a deity, but majorly low status. When you believe things that are perceived as crazy and when you can’t explain to people why you believe what you believe then the only result is that people will see you as “that crazy guy”. They’ll wonder, behind your back, why a smart person can have such stupid beliefs. Then they’ll conclude that intelligence doesn’t protect people against religion either so there’s no point in trying to talk about it.

If you fail to conceal your low-status beliefs you’ll be punished for it socially. If you think that they’re in the wrong and that you’re in the right, then you missed the point. This isn’t about right and wrong, this is about anticipating the consequences of your behavior. If you choose to talk about outlandish beliefs when you know you cannot convince people that your belief is justified, then you hurt your credibility and you get nothing for it in exchange. You cannot repair the damage easily, because even if your friends are patient and willing to listen to your complete reasoning you’ll (accidentally) expose three even crazier beliefs you have.

An important life skill is the ability to get along with other people and to not expose yourself as a weirdo when this isn’t in your interest to do so. So take heed and choose your words wisely, lest you fall into the trap.

**EDIT:** Google Survey by Pfft

PS: intended for /main but since this is my first serious post I’ll put it in discussion first to see if it’s considered sufficiently insightful.

To everyone who just read this and is about to argue with the specific details of the bullet points or the mock argument:

Don’t bother, they’re (hopefully) not really the point of this.

Focus on the conclusion and the point that LW beliefs have a large inferential distance. The summary of this post which is interesting to talk about is “some (maybe most) LW beliefs will appear to be crackpot beliefs to the general public” and “you can’t actually explain them in a short conversation in person because the inferential distance is too large”. Therefore, we should be very careful to not get into situations where we might need to explain things in short conversations in person.

Should I start staying indoors more?

You could. Or you could just refuse to get into arguments about politics/philosophy. Or you could find a social group such that these things aren’t problems.

I certainly don’t have amazing solutions to this particular problem, but I’m fairly sure they exist.

The solutions that I have so far are just finding groups of people who tend to be open-minded, and then discussing things from the perspective of “this is interesting, and I think somewhat compelling”.

When I get back from vacation I intend to do more wandering around and talking to strangers about LWy type stuff until I get the impression that I don’t sound like a crackpot. When I get good at talking about it with people with whom social mistakes are relatively cheap, I’ll talk about it more with less open-minded friends.

This comment makes the OP’s point effectively, in a fraction of its length and without the patronizing attitude. Comment upvoted, OP downvoted.

The way to have these conversations is to try to keep them as narrow as possible. You’re not trying to explain your worldview, you’re just trying to take the other person one step forward in inferential distance. There should be one point that you’re trying to make that you want the other person to take away from the conversation, and you should try to make that point as clearly and simply as possible, in a way that will be understandable to the other person. Maybe you can give them a glimpse that there’s more to your thinking than just this one point, but only if it doesn’t distract from that point.

Bob doesn’t do this. He feels that he needs to explain the nature of evidence, he uses an example which is controversial to Rian (and thus is a distraction from the point that Bob is trying to establish with the example), and he responds to every issue that Rian brings up instead of trying to bring the conversation back to the original point. Bob’s problem is not that he has particularly unusual or crazy beliefs, it’s that he has various views that are different from Rian’s and he lets the conversation bounce from one to another without ever sticking with one point of disagreement long enough to get a clear explanation of his views across.

If you hadn’t amplified the oddness of the beliefs on the list, this would be true. The trouble is, the way you amplified oddness is mostly by changing the substance of what was communicated, not just the style. Like by using over-general words so that people will hear one connotation when you might have been trying to say another. And so, why should we agree with statements that say the wrong thing?

Starting out with an incorrect guess about the reader is really bad for the rest of the post. You should start with your message instead, maybe even use personal experience—“I’ve had conversations where I brought up beliefs spread on LW, and people thought I was a crackpot.”

But I also disagree with the thesis that the solution is to try to “hide the crazy.” Bayesian Bob doesn’t break things down into small enough parts and tries to use too many “impressive” statements that are actually harmful to communication. So a first action might be to stop digging himself deeper, but ultimately I think Bob should try to get better at explaining.

Got any tips for Bob and the rest of us?

The only stratagem that occurs to me after reading Zed’s dialogue is that Bob should have spent more time motivating his solutions. I notice that Rian is the one asking all the questions, while Bob is the one offering short answers. Perhaps if Bob had been asking Rian why someone would believe in the opinions of experts and allowed him to offer possible solutions, and then guided Rian’s own questioning in the right direction with more questions, the exchange would have gone differently.

I’m a bad explainer in this sort of situation, too, but perhaps something like:

Once I’ve got a positive position staked out from Rian, I can much more easily show him the reasons that I think they’re wrong. I’m no longer at risk of appearing a credulous crackpot, but instead appear to be the level-headed skeptical one.

ETA: One more attempt at summarizing my idea: don’t offer your solutions until the problems are understood.

This shows a lack of understanding of signaling theory.

A poor kid wears middle class clothes so that people will think they’re middle class and not poor. A middle class person wears rich clothes so that people will think they’re rich and not middle class. A rich person wears whatever they want, because middle class people are already wearing ‘rich’ clothes and nobody’s going to confuse them for being poor while they’re matching ripped jeans with Rolex watches. If you and your beliefs are already low status, then having ‘crackpot’ beliefs will push your status lower. If you are already high status, then eccentric beliefs will increase your status. At the highest levels of status, people will automatically and unconsciously update their beliefs toward yours.

Your story sounds like Rian is much higher status than Bob. Rian’s got kung-fu master level rationality skills versus low level Bayesian judo. Rian also sounds more articulate and intelligent than Bob, although that might be the halo effect talking since we already established he’s higher status. Bob is outgunned on every level and isn’t smart enough to extricate himself, so of course he’s going to be punished for it socially. It could have been an argument between any two ideological positions and Bob would have lost.

It says nothing about how most of us on Less Wrong should display our beliefs.

He may be more familiar with certain other internet communities and assume most LessWrong readers have low status.

I see what you did there.

http://lesswrong.com/lw/28w/aspergers_poll_results_lw_is_on_the_spectrum/

I’d say that’s pretty damning.

It is not “damning”. The test diagnoses a particular cognitive style, characterised by precision and attention to detail. This is of no great benefit in social settings, and in extreme cases can lead to difficulty in social interaction and peculiar behaviour. On the other hand, in sciences, engineering and probably philosophy, this style brings major benefits. The overall quality of the LW site is a reflection of this.

Aspergers and anti-social tendencies are, as far as I can tell, highly correlated with low social status. I agree with you that the test also selects for people who are good at the sciences and engineering. Unfortunately scientists and engineers also have low social status in western society.

First Xachariah suggested I may have misunderstood signaling theory. Then Incorrect said that what I said would be correct assuming LessWrong readers have low status. Then I replied with evidence that I think supports that position. You probably interpreted what I said in a different context.

I classed this as a ‘why commonly held LW beliefs are wrong’ post when I first saw the list, then skipped to the conclusion (which made a really useful point, for which I upvoted the post.) I’m mentioning this because I think that the post would communicate better if you revealed your main point earlier.

Thank you, I’ll bear that in mind next time.

The conversation between Rational Rian and Bayesian Bob is uncannily reminiscent of several conversations I had when I first grew infatuated with some of EY’s writings and Lesswrong overall. This later led me to very quickly start wondering if the community would be willing to dedicate some intellectual effort and apply rationality to hiding bad signalling.

I think the OP is worth posting in the main section. But someone should write up *something* about how to raise the sanity waterline without damaging your own reputation after that. Now I know when people call on *someone* to do something, this more or less means *no one*, especially not *me*. This is why I’ve been doing my own thinking on the matter, but I’d first like to know if people on LW are interested *at all* in this line of thought.

For an example: A basic stratagem seems to be to successfully diagnose, perhaps even affirm, some of your acquaintances’ beliefs, then over time present some simple and powerful, if perhaps by now obsolete or superseded, arguments that first started several of LW’s more prominent writers (or yourself) on a path to the current set of beliefs. This naturally isn’t rationality building (though it might happen in the process), just spreading beliefs, but the *objective* here is to change the in group norms of your social circle.

Then, you can start individually building rationality *skills*.

I would definitely be interested.

I’d also be interested in posts about raising your status (I’m thinking social skills) since status is really useful.

I think that’s a great idea, and if you have any ideas please share.

It is not nearly as bad as you make it out. Bayesian Bob just seems really bad at explaining.

Rian seems to not consider detectives investigating a crime to be gathering evidence, but Bob does not seem to notice this. We can come up with examples of socially categorized types of evidence and explain why the categories are socially useful.

Absence of Evidence is Evidence of Absence can be explained in scientific terms. If a scientific experiment looking for evidence of a theory produces no results, that is evidence against the theory. This is easier to deal with in a scientific experiment because its controlled nature allows you to know how hard it was looking for evidence, to calculate how likely it would be to find the evidence if the theory were correct. Outside the context, the principle is harder to apply because the conditional probability is harder to calculate, but it is still valid.
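The role of the conditional probability here can be made concrete with a small calculation. What follows is only a sketch under stated assumptions: the function name and all the numbers (a 50% prior, a thorough search that would find evidence 90% of the time if the theory were true, a casual search that would find it 10% of the time, and a 5% false-positive rate) are invented for illustration and do not come from the discussion.

```python
def posterior_after_null_result(prior, p_detect_if_true, p_detect_if_false):
    """P(theory | the search found nothing), via Bayes' theorem.

    prior             -- P(theory) before looking (assumed value)
    p_detect_if_true  -- chance the search finds evidence if the theory is true
    p_detect_if_false -- chance it "finds" evidence anyway if the theory is false
    """
    p_null_if_true = 1.0 - p_detect_if_true
    p_null_if_false = 1.0 - p_detect_if_false
    # Total probability of coming up empty-handed.
    p_null = prior * p_null_if_true + (1.0 - prior) * p_null_if_false
    return prior * p_null_if_true / p_null


# A controlled experiment that looks hard: a null result is strong evidence.
thorough = posterior_after_null_result(0.5, 0.90, 0.05)  # about 0.095

# A casual look that would rarely find anything: a null result barely matters.
casual = posterior_after_null_result(0.5, 0.10, 0.05)    # about 0.486

print(f"thorough search: {thorough:.3f}, casual search: {casual:.3f}")
```

In both cases the posterior ends up below the prior, so the absence of evidence counts against the theory. How far it moves depends entirely on how likely the search was to find something, which is the quantity that is easy to estimate in a controlled experiment and hard to estimate elsewhere.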

Not once did Bob bring up such concepts as likelihood ratios or conditional probability.

And plenty of other comments have noted the problem with “starts out with a 50% probability”.

As has also been pointed out already, most of the bullet point statements are either not actually controversial, or distorted from the idea they refer to. In particular, though probability theory does not perfectly align with the scientific method, it does explain how the scientific method has been as successful as it is.

I myself have discussed LW ideas with people from skeptics and atheist groups, and not come off as a crackpot.

Zed, you have earned an upvote (and several more mental ones) from me for this display of understanding on a level of abstraction even beyond what some LW readers are comfortable with, as witnessed by other comments. How prescient indeed was Bayesian Bob’s remark:

You can be assured that poor Rational Rian has no chance when even Less Wrong has trouble!

But yes, this is of course completely correct. 50% is the probability of *total ignorance*, including ignorance of how many possibilities are in the hypothesis space. Probability measures how much information you have, and 50% represents a “score” of zero. (How do you calculate the “score”, you ask? It’s the logarithm of the odds ratio. Why should that be chosen as the score? Because it makes updating additive: when you see evidence, you update your score by adding to it the number of bits of evidence you see.)

Of course, we almost never reach this level of ignorance in practice, which makes this the type of abstract academic point that people all-too-characteristically have trouble with. The step of calculating the complexity of a hypothesis seems “automatic”, so much so that it’s easy to forget that there is a step there.
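The “score” described above can be sketched in a few lines. This is an illustration, not anyone’s canonical implementation: it assumes the score is the base-2 logarithm of the odds p / (1 - p), and the helper names `bits`, `prob` and `update` are made up for the example.

```python
import math


def bits(p):
    """Log-odds of probability p, in bits. A probability of 0.5 scores 0."""
    return math.log2(p / (1.0 - p))


def prob(score):
    """Inverse of bits(): turn a log-odds score back into a probability."""
    return 1.0 / (1.0 + 2.0 ** (-score))


def update(p, likelihood_ratio):
    """Bayesian update performed by adding bits of evidence to the score.

    likelihood_ratio is P(evidence | true) / P(evidence | false).
    """
    return prob(bits(p) + math.log2(likelihood_ratio))


# Total ignorance is a score of zero.
print(bits(0.5))

# Evidence that is 4 times likelier if the claim is true is worth 2 bits.
after = update(0.5, 4.0)
print(after)

# Updates simply add: 2 bits for, then 2 bits against, returns to zero.
print(update(after, 0.25))
```

The design point is the one made in the comment: in probability space an update is a multiplication followed by a renormalization, while in score space it is a plain addition.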

If P is the probability that an ideal Bayesian would assign to a proposition A on hearing A but having observed no relevant evidence, then you have described the meta expected value of P in logical ignorance before doing any calculations (and assuming an ignorance prior on the distribution of propositions one might hear about). It seems to me that you have made excessively harsh criticism against those who have made correct statements about P itself.

See my other comments. In my opinion, the correct point of view is that P is a variable (or, if you prefer, a two-argument function); the “correct” statements are about a different value of P from the relevant one (resp. depend on inappropriately fixing one of the two arguments).

EDIT: Also, I think this is the level on which Bayesian Bob was thinking, and the critical comments weren’t taking this into account and were assuming a basic error was being made (just like Rational Rian).

I think this is actually too weak. Hypothesis specification of any kind requires some kind of working model/theory/map of the external world. Otherwise the hypothesis doesn’t have semantic content. And once you have that model some not totally ignorant prior will fall out. You’re right that 50% is the probability of total ignorance, but this is something of a conceptual constant that falls out of the math—you can’t actually specify a hypothesis with such little information.

Yes, that’s exactly right! It is a conceptual constant that falls out of the math. It’s purely a formality. Integrating this into your conceptual scheme is good for the versatility of your conceptual scheme, but not for much else—until, later, greater versatility proves to be important.

People have a great deal of trouble accepting formalities that do not appear to have concrete practical relevance. This is why it took so long for the numbers 0 and 1 to be accepted as numbers.

I disagree with this bit. It’s only purely a formality when you consider a single hypothesis, but when you consider a hypothesis that is comprised of several parts, each of which uses the prior of total ignorance, then the 0.5 prior probability shows up in the real math (that in turn affects the decisions you make).

I describe an example of this here: http://lesswrong.com/r/discussion/lw/73g/take_heed_for_it_is_a_trap/4nl8?context=1#4nl8

If you think that the concept of the universal prior of total ignorance is purely a formality, i.e. something that can *never* affect the decisions you make, then I’d be very interested in your thoughts behind that.

Is it not propositions that can only be true or false, while statements can be other things?

What’s the relevance of this question? Is there a reason “statement” shouldn’t be interpreted as “proposition” in the above?

As I see it, statements start with some probability of being true propositions, some probability of being false propositions, and some probability of being neither. So a statement about which I have no information, say a random statement that a random number generator prefaces with “Not” half the time, has a less than 50% chance of being true.

This speaks to the intuition that statements fail to be true most of the time. “A proposition, any proposition, starts out with a 50% probability of being true” is only true assuming the given statement is a proposition, and I think knowing that an actual statement is a proposition entails being contaminated by knowledge about the proposition’s contents.

Okay. So “a statement, any statement, is as likely to be true as false (under total ignorance)” would be more accurate. The odds ratio remains the same.

The intuition that statements fail to be true most of the time is wrong, however. Because, trivially, for every statement that is true its negation is false and for every statement that is false its negation is true. (Statements that have no negation are neither true nor false)

It’s just that (interesting) statements in practice tend to be positive claims (about the world), and it’s much harder to make a true positive claim about the world than a true negative one. This is why a long (measured in Kolmogorov complexity) positive claim is very unlikely to be true and a long negative claim (Kolmogorov complexity) is very likely to be true. Also, it’s why a long conjunction of terms is unlikely to be true and a long disjunction of terms is likely to be true. Again, symmetry.
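The conjunction/disjunction symmetry is easy to check numerically. The sketch below makes two simplifying assumptions that are not in the comment: every term is independent of the others, and every term gets the same probability (0.5 by default, the prior of total ignorance). The function names are hypothetical, and Kolmogorov complexity is not modeled at all; the number of terms merely stands in for length.

```python
def p_conjunction(n_terms, p_term=0.5):
    """P(all n independent terms are true)."""
    return p_term ** n_terms


def p_disjunction(n_terms, p_term=0.5):
    """P(at least one of n independent terms is true)."""
    return 1.0 - (1.0 - p_term) ** n_terms


for n in (1, 2, 5, 10):
    print(n, p_conjunction(n), p_disjunction(n))
```

With a single term both come out at 0.5. As terms are added the conjunction shrinks toward 0 and the disjunction grows toward 1, and at p_term = 0.5 the two always sum to exactly 1, which is the symmetry being described.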

S=P+N

P=T+F

T=F

S=~T+T

N>0

~~~

~T+T=P+N

~T+T=T+F+N

~T=F+N

~T=T+N

~T>T

Legend:

I don’t agree with condition S = ~T + T.

Because ~T + T is what you would call the set of (true and false) propositions, and I have readily accepted the existence of statements which are neither true nor false. That’s N. So you get S = ~T + T + N = T + F + N = P + N

We can just taboo proposition and statement as proposed by komponisto. If you agree with the way he phrased it in terms of hypothesis then we’re also in agreement (by transitivity of agreement :)

(This may be redundant, but if your point is that the set of non-true statements is larger than the set of false propositions, then yes, of course, I agree with that. I still don’t think the distinction between statement and proposition is that relevant to the underlying point because the odds ratio is not affected by the inclusion or exclusion of non-propositional statements)

I would be much happier with that survey if it used the standard five-degrees-of-belief format rather than a flat agree/disagree. Especially later on, it includes many statements which I believe or disbelieve with low confidence, or which I consider irrelevant or so malformed as to be essentially meaningless.

Some of those statements in the list are sufficiently unclear that I can’t really agree or disagree with them. Others have multiple different claims in them and I agree with some parts and disagree with others. And some are just false.

This one is false, as some other comments have pointed out.

(Bayesian) Probability theory doesn’t say that the scientific method is wrong. It provides a formal specification of why the scientific method (of changing beliefs based on evidence) is correct and how to apply it. The second sentence refers to the true beliefs explained in Probability is in the Mind and Probability is Subjectively Objective, but it mangles them.

“Science is wrong and stupid” is just false. It’s more like, you can’t detect these copies directly but they are implied by math that has been supported by experiment. Unless you want to claim that theoretical physics is unscientific, you have to accept that using math to find out facts about the physical world is possible.

This exaggerates the simple (tautological?) true statement that “I have enough evidence to convince a rational person of all the above” until it is not true.

This is also a misrepresentation. Some of the guys on the website work for a nonprofit and want you to give money to fund their research, which they believe will save many lives. Or if you don’t want to do that, they want you to give money to some other charity that you believe will save the most possible lives. A majority of the content producers don’t ask for money but many of them do give it.

If it were a con, it would be a very long con. It wouldn’t necessarily look any different from what we see and you describe though. It’s hard to con this audience, but most of the contributors wouldn’t be in on it; in fact it’s imperative that they not be.

For future reference, all of my comments and posts are by someone who wants you to give me your money. Likewise for most people, I suspect.

Yes but are my donations to you tax exempt? Can I get a rewards credit card to pay you off? There actually is a difference between “wanting money” and having your livelihood depend on donations to your non-profit foundation.

I do not think there is anything sinister going on at all here, but it is mistaken to think that someone who doesn’t nakedly solicit donations and does endorse other charities cannot be running a con (i.e. they have pretenses about the purpose and usage of the money). For some types of fish you can only catch some if you are willing to let most of them go.

Great point.

I will add two levels of nuance.

One is the extent to which individual future donors are necessary.

The other is the divergence between any group’s goals and those of its members. A good analogy is heat dissipation: for any one member, one can’t predict his or her goals from the group’s goals, though in general one can generalize about group members and their goals.

Note that these are matters of extent and not type. Note also how much this is true for other things. :)

I think a more correct term in this context would be *argumentum ad verecundiam*. It’s about arguing based on the opinion of a small number of authoritative people, not the general public.

I realize this is not the main point of the post, but this statement made me curious: what fraction of Less Wrong readers become convinced of these less mainstream beliefs?

To this end I made a Google survey! If you have some spare time, please fill it out. (Obviously, we should overlook the deliberately provocative phrasing when answering).

I’ll come back two weeks from now and post a new comment with the results.

Here are the crackpot belief survey results.

All in all, 77 people responded. It seems we do drink the Kool-Aid! Of the substantial questions, the most contentious ones were “many clones” and timeless physics, and even they got over 50%. Thanks to everyone who responded!

I want people to cut off my head when I’m medically dead, so my head can be preserved and I can come back to life in the (far far) future.

Agree 73%, Disagree 27%

It is possible to run a person on Conways Game of Life. This would be a person as real as you or me, and wouldn’t be able to tell he’s in a virtual world because it looks exactly like ours.

Agree 90%, Disagree 10%

Right now there exist many copies/clones of you, some of which are blissfully happy and some of which are being tortured and we should not care about this at all.

Agree 53%, Disagree 47%

Most scientists disagree with this but that’s just because it sounds counter-intuitive and scientists are biased against counterintuitive explanations.

Agree 32%, Disagree 68%

Besides, the scientific method is wrong because it is in conflict with probability theory.

Agree 23%, Disagree 77%

Oh, and probability is created by humans, it doesn’t exist in the universe.

Agree 77%, Disagree 23%

Every fraction of a second you split into thousands of copies of yourself.

Agree 74%, Disagree 26%

Of course you cannot detect these copies scientifically, but that because science is wrong and stupid.

Agree 7%, Disagree 93%

In fact, it’s not just people that split but the entire universe splits over and over.

Agree 77%, Disagree 23%

Time isn’t real. There is no flow of time from 0 to now. All your future and past selves just exist.

Agree 53%, Disagree 47%

Computers will soon become so fast that AI researchers will be able to create an artificial intelligence that’s smarter than any human. When this happens humanity will probably be wiped out.

Agree 68%, Disagree 32%

To protect us against computers destroying humanity we must create a super-powerful computer that won’t destroy humanity.

Agree 70%, Disagree 30%

Ethics are very important and we must take extreme caution to make sure we do the right thing.

Agree 82%, Disagree 18%

Also, we sometimes prefer torture to dust-specs.

Agree 69%, Disagree 31%

If everything goes to plan a super computer will solve all problems (disease, famine, aging) and turn us into super humans who can then go on to explore the galaxy and have fun.

Agree 79%, Disagree 21%

the truth of all these statements is completely obvious to those who take the time to study the underlying arguments. People who disagree are just dumb, irrational, miseducated or a combination thereof.

Agree 27%, Disagree 73%

I learned this all from this website by these guys who want us to give them our money.

Agree 66%, Disagree 34%

I want to fill it out, I really do, but the double statements make me hesitate.

For example I do believe that there are ~lots of “clones of me” around, but I disagree that we shouldn’t care about this. It has significant meaning when you’re an average utilitarian, or something approaching one.

Most of the questions seem to be loaded or ambiguous in some way.

For example, this one implies intelligence is simply a hardware problem:

Well, to some extent, that’s true. If a malicious god gave us a computer with infinite or nigh-infinite computing power, we could probably have AIXI up and running within a few days. Similar comments apply to brain emulation—things like the Blue Brain project indicate our scanning ability, poor as it may seem, is still way beyond our ability to run the scanned neurons.

Even if you don’t interpret ‘hardware problem’ quite that generously, you still have an argument for hard takeoff—this is the ‘hardware overhang’ argument: if you prefer to argue that software is the bottleneck, then you have the problem that when we finally blunder into a working AI, it will be running on hardware far beyond what was needed for an intelligently-written AI.

So you’re faced with a bit of a dilemma. Either hardware is the limit, in which case Moore’s law means you expect an AI soon, one that quickly passes human level with a few more cranks of the law; or you expect an AI much further out, but when it comes it’ll improve even faster than the other kind would.

I think this survey is a really good illustration of why degrees of belief are so helpful.

I don’t see why you call this a “crackpot belief”. The (extended) Church-Turing Thesis has near-universal acceptance and implies that humans can be simulated by Turing machines. Similarly, it is widely accepted that Conway’s Game of Life can run Turing machines. Physicists who don’t believe this are widely regarded as controversial.

among the extremely small subset of mankind who have studied it.

Exactly.

And Cryonics is based on the idea that medical death and information death are distinct. This isn’t a crackpot belief either.

And many worlds? Feynman and Hawking and many other well known theoretical physicists have supported it.

And that probability theory only exists “in the mind” isn’t that controversial either.

So, ummm …. these beliefs are not controversial but they are low-status?

And that is evidence that MW is not a low-status, or crackpot, belief. Certainly not among physicists. Just like “you can run people on game of life” is not a low-status belief, certainly not among computer scientists.

Sure, these beliefs are low-status in communities that are low-status by Less Wrong standards (e.g. various kinds of non-reductionists). And this seems quite unavoidable given some of LW’s goals.

Right, so whether a belief is low status is (among other things) a property of the audience.

But even if the audience consists of people “who like philosophy and [are] familiar with the different streams and philosophical dilemmas, who know computation theory and classical physics, who [have] a good understanding of probability and math and somebody who [are] naturally curious reductionists”, which is a very educated audience, then the cognitive gap is still so large that it cannot be bridged in casual conversation.

I think it’s fair to say a highly educated reductionist audience is considered high status by almost any standard[1]. And my claim is, and my experience is, that if you casually slip in a LW-style argument then because of the cognitive gap you won’t be able to explain exactly what you mean, because it’s extraordinarily difficult to fall back on arguments that don’t depend on the sequences or any other prerequisites.

If you have a belief that you can’t explain coherently then I think people will assume that’s because your understanding of the subject matter is bad, even though that’s not the problem at all. So if you try to explain your beliefs but fail to do so in a manner that makes sense (to the audience) then you face a social penalty.

[1] we can’t get away with defining every group that doesn’t reason like we do as low-status

Extreme non-reductionists tend to form communities with inverted status-ladders (relative to ours) where the high-status members constantly signal adherence to certain baseless assertions.

A: Hi! Have you ever heard of cellular automata?

B: No. What is it?

A: Well basically you take a large cartesian grid and every cell can have 2 values : “alive” or “dead”. And you modify it using these simple rules … and you can get all kinds of neat patterns.

B: Ah, I might have read something like that somewhere.

A: Did you know it’s turing-complete?

B: What?

A: Yes, you can run any computer on such a grid! Neat, huh.

B: One learns a new thing every day… (Note: I have gotten this exact response when I told a friend, a mathematician, about the turing-completeness of the game of life)

A: So, you’re a reductionist, right? No magical stuff inside the brain?

B: Yes, of course.

A: So in principle, we could simulate a human on a computer, right?

B: For sufficiently large values of “in principle”, yes.

A: So we can run a human on game of life!

B: Oh right. “In principle”. Why should I care, again?

OK, fictional evidence, I have only tried the first half of this conversation in reality.
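For readers who have not seen the “simple rules” that A alludes to, here is a minimal sketch of one generation of the Game of Life. It assumes the standard Conway rules (a live cell survives with 2 or 3 live neighbours, a dead cell becomes alive with exactly 3). The function name `life_step` and the set-of-coordinates representation are illustrative choices, not anything from the dialogue.

```python
from collections import Counter

def life_step(live):
    """Advance a set of live (x, y) cells by one generation (standard Conway rules assumed)."""
    # Count live neighbours for every cell adjacent to at least one live cell.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Alive next turn: exactly 3 neighbours, or 2 neighbours if already alive.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }

# A "blinker": three cells in a row that flip between horizontal and vertical.
blinker = {(0, 1), (1, 1), (2, 1)}
next_gen = life_step(blinker)
```

Storing only the live cells keeps the grid effectively unbounded, which is the setting in which the Turing-completeness claim is usually made.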

This conversation starts from the non-controversial side, slowly building the infrastructure for the final declaration. If you have friends tolerant enough for you to introduce the LW sequences conversation by conversation in a “had you ever heard” type of way, and you have a lot of time, this will work fine.

However, the OP seems to be about the situation where you start by underestimating the inferential gap and saying something as if it should be obvious, while it still sounds crazy to your audience. How do you rescue yourself from that without a status hit, and without being dishonest?

I think you should have introduced your point much earlier on (perhaps at the beginning).

This reminds me of some old OB posts, I think, on non-conformity, the upshot being that you can’t get away with being public on *all* the ways you are a maverick and to do so is self-sabotaging.

Also tangentially related is Paul Graham’s old essay What You Can’t Say.

Related: On interpreting maverick beliefs as signals indicating rationality:

Undiscriminating Skepticism

Only related, though; I take Eliezer as pointing out that individual beliefs are rational but beliefs are highly correlated with other beliefs, so any one position doesn’t allow much inference. The OP and Hanson are discussing more practical signaling issues unrelated to epistemic inferences.

Do you have a title or link?

http://www.overcomingbias.com/2007/06/against_free_th.html and http://www.overcomingbias.com/2007/06/how_to_be_radic.html seem to be what I was thinking of.

Thank you!

I think the Sequences paint a picture of scientists on Many Worlds that is just wrong. Sure, if you count *all* scientists. But if you just look at the ones whose opinions matter: 58 percent think it’s true. Eighteen percent disagree.

I have, but I don’t. A couple I agree with and there are some others about which I can at least see how they could be used as a straw man. Then there are some which are just way off.

Then there is:

That I agree with. And it doesn’t qualify as a ‘crackpot belief’.

You probably should have opened with that. It’s true and basically universally accepted here already.

Concealing unconventional beliefs with high inferential distance to those you are speaking with makes sense. Dismissing those beliefs with the absurdity heuristic does not.

Also, I think you underestimate the utility of rhetorical strategies. For example, you could:

Talk about these weird beliefs in a hypothetical, facetious manner (or claim you had been).

Close the inferential difference gradually using the Socratic method.

Introduce them to the belief indirectly. For example, you could link them to a more conventional LessWrong sequence post and let them investigate the others on their own.

Ask them for help finding what is objectively and specifically wrong with the weird belief.

Most of the statements you make are false in their connotations, but there’s one statement you make (and attribute to “Bayesian Bob”) that seems false no matter what way you look at it, and it’s this one: “A statement, any statement, starts out with a 50% probability of being true” Even the rephrasing “in a vacuum we should believe it with 50% certainty” still seems simply wrong. Where in the world did you see that in Bayesian theory?

For saying that, I label you a Level-0 Rationalist. Unless someone’s talking about binary digits of Pi, they should generally remove the concept of “50% probability” from their minds altogether.

A statement, any statement, starts out with a probability that’s based on its *complexity*, NOT with a 50/50 probability. “Alice is a banker” is a simpler statement than “Alice is a feminist banker who plays the piano.” That’s why the former must be assigned greater probability than the latter.

You’re wrong in labeling me a level-0 rationalist, for I have never believed in that fallacy. I’m familiar with MML, Kolmogorov complexity, and so forth.

What I meant is that given a proposition P, and its negation non-P, that if both have the same level of complexity and no evidence is provided for either then the same probability must be assigned to both (because otherwise you can be exploited, game theory wise).

The implausibility of a proposition, measured in complexity, is just the first piece of counter-evidence that it must overcome.

If you don’t know what the proposition is, but just that there IS a proposition named P, then there is no way to calculate its complexity. But even if you don’t know what the statement is and if you don’t know what its complexity is and even if you don’t know what kind of evidence there is that supports it you must still be willing to assign a number to it. And that number is 50%.

I’m not sure why you’d assume that the MML of a random proposition is only one bit...

A complex proposition P (long MML) can have a complex negation (also with long MML) and you’d have no reason to assume you’d be presented with P instead of non-P. The positive proposition P is unlikely if its MML is long, but the proposition non-P, despite its long MML is then likely to be true.

If you have no reason to believe you’re more likely to be presented with P than with non-P, then my understanding is that they cancel each other out.

But now I’m not so sure anymore.

edit: I’m now pretty sure again my initial understanding was correct and that the counterarguments are merely cached thoughts.

I think often “complicated proposition” is used to mean “large conjunction” e.g. A&B&C&D&...

In this case its negation would be a large disjunction, and large disjunctions, while in a sense complex (it may take a lot of information to specify one) usually have prior probabilities close to 1, so in this case complicated statements definitely don’t get probability 0.5 as a prior. “Christianity is completely correct” versus “Christianity is incorrect” is one example of this.

On the other hand, if by ‘complicated proposition’ you just mean something where its truth depends on lots of factors you don’t understand well, and is not itself necessarily a large conjunction, or in any way carrying burdensome details, then you may be right about probability 0.5. “Increasing government spending will help the economy” versus “increasing government spending will harm the economy” seems like an example of this.

My claim is slightly stronger than that. My claim is that the correct prior probability of *any* arbitrary proposition of which we know nothing is 0.5. I’m not restricting my claim to propositions which we know are complex and depend on many factors which are difficult to gauge (as with your economy example).

I think I mostly agree. It just seemed like the discussion up to that point had mostly been about complex claims, and so I confined myself to them.

However, I think I cannot fully agree about any claim of which we know nothing. For instance, I might know nothing about A, nothing about B|A, and nothing about A&B, but for me to simultaneously hold P(A) = 0.5, P(B|A) = 0.5 and P(A&B) = 0.5 would be inconsistent.

“B|A” is not a proposition like the others, despite appearing as an input in the P() notation. P(B|A) simply stands for P(A&B)/P(A). So you never “know nothing about B|A”, and you can consistently hold that P(A) = 0.5 and P(A&B) = 0.5, with the consequence that P(B|A) = 1.

The notation P(B|A) is poor. A better notation would be P_A(B); it’s a different function with the same input, not a different input into the same function.

Fair enough, although I think my point stands, it would be fairly silly if you could deduce P(A|B) = 1 simply from the fact that you know nothing about A and B.

Well, you can’t—you would have to know nothing about B and A&B, a very peculiar situation indeed!

EDIT: This is logically delicate, but perhaps can be clarified via the following dialogue:

-- What is P(A)?

-- I don’t know anything about A, so 0.5

-- What is P(B)?

-- Likewise, 0.5

-- What is P(C)?

-- 0.5 again.

-- Now compute P(C)/P(B)

-- 0.5/0.5 = 1

-- Ha! Gotcha! C is really A&B; you just said that P(A|B) is 1!

-- Oh; well in that case, P(C) isn’t 0.5 any more: P(C|C=A&B) = 0.25.

As per my point above, we should think of Bayesian updating as the function P varying, rather than its input.

I believe that this dialogue is logically confused, as I argue in this comment.

This is the same confusion I was originally having with Zed. Both you and he appear to consider knowing the explicit form of a statement to be knowing something about the truth value of that statement, whereas I think you can know nothing about a statement even if you know what it is, so you can update on finding out that C is a conjunction.

Given that we aren’t often asked to evaluate the truth of statements without knowing what they are, I think my sense is more useful.

Did you mean “can’t”? Because “can” is my position (as illustrated in the dialogue!).

This exemplifies the point in my original comment:

If you know nothing of A and B then P(A) = P(B) = 0.5, P(B|A) = P(A|B) = 0.5 and P(A & B) = P(A|B) * P(B) = 0.25

You do know something of the conjunction of A and B (because you presume they’re independent) and that’s how you get to 0.25.

I don’t think there’s an inconsistency here.

How do you know something about the conjunction? Have you manufactured evidence from a vacuum?

I don’t think I am presuming them independent, I am merely stating that I have no information to favour a positive or negative correlation.

Look at it another way, suppose A and B are claims that I know nothing about. Then I also know nothing about A&B, A&(~B), (~A)&B and (~A)&(~B) (knowledge about any one of those would constitute knowledge about A and B). I do not think I can consistently hold that those four claims all have probability 0.5.
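The four-claims point above can be checked with a few lines of arithmetic. This is a minimal sketch, not the commenter’s own calculation: the four joint cases are mutually exclusive and exhaustive, so their probabilities have to sum to 1. The uniform 1/4 assignment shown as a contrast is an illustrative assumption, one coherent option among many.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Assigning 0.5 to each of the four joint cases.
all_half = {"A&B": half, "A&~B": half, "~A&B": half, "~A&~B": half}
total_all_half = sum(all_half.values())  # exceeds 1, so the assignment is incoherent

# One coherent alternative (an assumption for illustration): treat the four cases symmetrically.
uniform = {case: Fraction(1, 4) for case in all_half}
total_uniform = sum(uniform.values())

# Under the uniform assignment the marginals still come out at 0.5.
p_a = uniform["A&B"] + uniform["A&~B"]
p_b = uniform["A&B"] + uniform["~A&B"]
```

So 0.5 for A and for B individually is compatible with coherence, but 0.5 for all four conjunctions at once is not.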

If you know nothing about A and B, then you know something about A&B. You know it is the conjunction of two things you know nothing about.

Since A=B is a possibility, the use of “two things” here is a bit specious. You’re basically saying you know A&B but that could stand for anything at all.

You know that either A and B are highly correlated (one way or the other) or P(A&B) is close to P(A) P(B).
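To make the range of possibilities concrete: with no information about dependence, the standard Fréchet bounds say P(A&B) lies between max(0, P(A) + P(B) − 1) and min(P(A), P(B)). The sketch below is illustrative only; the helper name `conjunction_bounds` is hypothetical, and the 0.5 inputs are the values under discussion in this thread.

```python
from fractions import Fraction

def conjunction_bounds(p_a, p_b):
    """Fréchet bounds on P(A&B) given only the marginals P(A) and P(B)."""
    lower = max(Fraction(0), p_a + p_b - 1)  # reached under maximal negative dependence
    upper = min(p_a, p_b)                    # reached under maximal positive dependence
    return lower, upper

p_a = p_b = Fraction(1, 2)
lower, upper = conjunction_bounds(p_a, p_b)
independent = p_a * p_b  # the value if A and B happen to be independent
```

With both marginals at 0.5 the bounds are 0 and 0.5, and the independence value of 0.25 sits exactly in the middle.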

Yeah, and I know that A is the disjunction of A&B and A&(~B), and that it is the negation of the negation of a proposition I know nothing about, and lots of other things. If we count reading a statement and analysing its logical consequences as knowledge, then we know infinitely many things about everything.

In that case it’s clear where we disagree because I think we are completely justified in assuming independence of any two unknown propositions. Intuitively speaking, dependence is *hard*. In the space of all propositions the number of dependent pairs of propositions is insignificant compared to the number of independent pairs. But if it so happens that the two propositions are not independent then I think we’re saved by symmetry.

There are a number of different combinations of A and ~A and B and ~B but I think that their conditional “biases” all cancel each other out. We just don’t know if we’re dealing with A or with ~A, with B or with ~B. If for every bias there is an equal and opposite bias, to paraphrase Newton, then I think the independence assumption must hold.

Suppose you are handed three closed envelopes each containing a concealed proposition. Without any additional information I think we have no choice but to assign each unknown proposition probability 0.5. If you then open the third envelope and if it reads “envelope-A & envelope-B” then the probability of that proposition changes to 0.25 and the other two stay at 0.5.

If not 0.25, then which number do you think is correct?

Okay, in that case I guess I would agree with you, but it seems a rather vacuous scenario. In real life you are almost never faced with the dilemma of having to evaluate the probability of a claim without even knowing what that claim is. It appears that in this case, when you assign a probability of 0.5 to an envelope, you are merely assigning 0.5 probability to the claim that “whoever filled this envelope decided to put a true statement in”.

When, as in almost all epistemological dilemmas, you can actually look at the claim you are evaluating, then even if you know nothing about the subject area you should still be able to tell a conjunction from a disjunction. I would never, ever apply the 0.5 rule to an actual political discussion, for example, where almost all propositions are large logical compounds in disguise.

This can’t be right. An unspecified hypothesis can be as many sentence letters and operators as you like; we still don’t have any information about its content and so can’t have any P other than 0.5. Take any well-formed formula in propositional logic. You can make that formula say anything you want by the way you assign semantic content to the sentence letters (for propositional logic, not the predicate calculus where we can specify independence). We have *conventions* where we don’t do silly things like say “A AND ~B” and then have B come out semantically equivalent to ~A. It is also true that two randomly chosen hypotheses from a large set of mostly independent hypotheses are likely to be independent. But this is a judgment that requires knowing something about the hypothesis, which we don’t, by stipulation. Note, it isn’t just causal dependence we’re worried about here: for all we know A and B are semantically identical. By stipulation we know nothing about the system we’re modeling: the ‘space of all propositions’ could be very small.

The answer for all three envelopes is, in the case of complete ignorance, 0.5.

I think I agree completely with all of that. My earlier post was meant as an illustration that once you say C = A & B that you’re no longer dealing with a state of complete ignorance. You’re in complete ignorance of A and B, but not of C. In fact, C is completely defined as being the conjunction of A and B. I used the illustration of an envelope because as long as the envelope is closed you’re completely ignorant about its contents (by stipulation) but once you open it that’s no longer the case.

So the probability that all three envelopes happen to contain a true hypothesis/proposition is 0.125 based on the assumption of independence. Since you said “mostly independent” does that mean you think we’re not allowed to assume complete independence? If the answer isn’t 0.125, what is it?

edit:

If your answer to the above is “still 0.5” then I have another scenario. You’re in total ignorance of A. B denotes rolling a 6 on a regular die. What’s the probability that A & B are true? I’d say it has to be 1/12, even though it’s possible that A and B are not independent.

If you don’t know what A is and you don’t know what B is and C is the conjunction of A and B, then you don’t know what C is. This is precisely because one *cannot* assume the independence of A and B. If you stipulate independence then you are no longer operating under conditions of complete ignorance. Strict, non-statistical independence can be represented as A!=B. A!=B tells you something about the hypothesis: it’s a fact about the hypothesis that we didn’t have in complete ignorance. This lets us give odds other than 1:1. See my comment here.

With regard to the scenario in the edit, the probability of A & B is 1/6 because we don’t know anything about independence. Now, you might say: “Jack, what are the chances A is dependent on B?! Surely most cases will involve A being something that has nothing to do with dice, much less something closely related to the throw of that particular die.” But this kind of reasoning involves presuming things about the domain A purports to describe. The universe is really big and complex so we know there are lots of physical events A could conceivably describe. But what if the universe consisted only of one regular die that rolls once! If that is the only variable then A will =B. That we don’t live in such a universe or that this universe seems odd or unlikely are reasonable assumptions only because they’re based on our observations. But in the case of complete ignorance, by stipulation, we have no such observations. By definition, if you don’t know anything about A then you can’t know more about A&B than you know about B.

Complete ignorance just *means* 0.5; it’s just necessarily the case that when one specifies the hypothesis one provides analytic insight into the hypothesis which can easily change the probability. That is, any hypothesis that can be distinguished from an alternative hypothesis will give us grounds for ascribing a new probability to that hypothesis (based on the information used to distinguish it from alternative hypotheses).

Thanks for the explanation, that helped a lot. I expected you to answer 0.5 in the second scenario, and I thought your model was that total ignorance “contaminated” the model such that something + ignorance = ignorance. Now I see this is not what you meant. Instead it’s that something + ignorance = something. And then likewise something + ignorance + ignorance = something according to your model.

The problem with your model is that it clashes with my intuition (I can’t find fault with your arguments). I describe one such scenario here.

My intuition is that the probability of these two statements should not be the same:

A. “In order for us to succeed one of 12 things need to happen”

B. “In order for us to succeed all of these 12 things need to happen”

In one case we’re talking about a disjunction of 12 unknowns and in the second scenario we’re talking about a conjunction. Even if some of the “things” are not completely uncorrelated that shouldn’t affect the total estimate *that* much. My intuition says that P(A) = 1 − 0.5^12 and P(B) = 0.5^12. Worlds apart! As far as I can tell you would say that in both cases the best estimate we can make is 0.5. I introduce the assumption of independence (I don’t stipulate it) to fix this problem. Otherwise the math would lead me down a path that contradicts common sense.

The number of possible probability distributions is far larger than the two induced by the belief that P, and the belief that ~P.
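The “one of 12 things” versus “all of these 12 things” comparison above is easy to put numbers on. A minimal sketch, assuming (as that commenter does) that each of the 12 unknown “things” has probability 0.5 and that they are independent; the variable names are illustrative.

```python
from fractions import Fraction

n_things = 12
p_each = Fraction(1, 2)  # assumed prior for each unknown "thing"

# B: all 12 things happen (conjunction), assuming independence.
p_all = p_each ** n_things

# A: at least one of the 12 things happens (disjunction), assuming independence.
p_at_least_one = 1 - (1 - p_each) ** n_things

print(float(p_all), float(p_at_least_one))
```

Under these assumptions the conjunction comes out at 1/4096 and the disjunction at 4095/4096. Dropping the independence assumption would change both numbers, which is the point in dispute.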

If at this point you don’t agree that the probability is 0.5 I’d like to hear your number.

P(A) = 2^-K(A).

As for ~A, see: http://lesswrong.com/lw/vs/selling_nonapples/ (The negation of a complex proposition is much vaguer, and hence more probable (and useless))

Okay, it seems to me we’re simply talking about different things. “Any statement” communicated to me that the particulars of the statements must be taken into account, just not the evidential context. So when you say “any statement starts out with a 50% probability of being true”, this communicated to me that you mean the probability AFTER the sentence’s complexity has been calculated.

But you basically meant what I’d have understood by “truth-claim of any *unknown* statement”. In short, not evaluating the statement “P” but rather evaluating “The unknown statement P is true”.

At this point your words improve from being “simply wrong” to “most massively flawed in their potential for miscommunication” :-)

I think you were too convinced I was wrong in your previous message for this to be true. I think you didn’t even consider the possibility that complexity of a statement constitutes evidence and that you had never heard the phrasing before. (Admittedly, I should have used the words “total ignorance”, but still)

Your previous post strikes me as a knee-jerk reaction: “Well, that’s *obviously* wrong”. Not as an attempt to seriously consider under which circumstances the statement could be true. You also incorrectly claimed I was an ignoramus rationalist (for which you didn’t apologize) which only provides further evidence you didn’t really think before you started writing your critique (because who seriously considers the opinions of an ignoramus?).

And now, instead of just saying “Oops” you shift the goalpost from “false no matter what way you look at it” to something fuzzy where we’re simply talking about different things.

This is blatant intellectual dishonesty.

You are probably right, but I would suggest you phrase your reaction less combatively. The last sentence especially is superfluous; it doesn’t contain any information and only heats up the debate.

“Any proposition starts out with a 50% probability of being true” is still utterly wrong, because “any” indicates multiplicity. At the point where you claim these propositions “begin”, they aren’t even differentiated into *different* propositions; they’re nothing but the abstraction of a letter P as in “the unknown proposition P”.

I’ve conceded that you were instead talking about an abstraction of statements, not any *actual* statements. At this point, if you want to duel it out to the end, I will say that you failed at using language in order to communicate meaning, and you were abstracting words to the point of meaninglessness.

edit to add: And as a sidenote, even *unknown* statements can’t be divided into 50% chance of truth and 50% falsehood, as there’s always the chance of self-referential contradiction (e.g. the statement “This statement is wrong”, which can never be assigned a True/False value), self-referential validity (e.g. the statement “This statement is true”, which can be assigned either a true or false value), confusion of terms (e.g. the statement “A tree falling in the woods makes a sound”, which depends on how one defines a “sound”), utter meaninglessness (“Colorless green ideas sleep furiously”), etc.

Now that I’ve cooled off a bit, let me state in detail my complaint against this comment of yours.

You seem to be asking for the highest amount of charity towards your statements. To the point that I ought to strive for many long minutes to figure out a sense in which your words might be correct, even if I’d have to fix your claim (e.g. turn ‘statement’ into ‘proposition’ and add after ‘any proposition starts out’ the parenthetical ‘before it is actually stated in words’) before it actually *becomes* correct.

But in return you provide the least amount of charity towards my own statements: I kept using the word “seems” in my original response to you (thus showing it may just be a misunderstanding) and I did NOT use the word ‘ignoramus’ which you accuse me of claiming you to be. I used the term ‘Level-0 rationalist’. You may think it’s okay to paraphrase Lesswrong beliefs to show how they might appear to other people, but please don’t paraphrase me and then ask for an apology for the words you put in my mouth. That’s a major no-no. Don’t put words in my mouth, period.

No, I did not apologize for calling you a Level-0 rationalist; I still do not apologize for putting you in that category, since that’s where your badly chosen words properly assigned you (the vast majority of people who’d say something like “all statements begin with a 50% probability” would truly be Level-0), NOR do I apologize for *stating* I had placed you in that category: would you prefer if everyone here had just downvoted your article instead of giving you a chance to clarify that (seemingly) terribly wrong position first?

Your whole post was about how badly communicated beliefs confer low status on us in the minds of others. It was only proper that I should tell you what status you had achieved in my mind.

I don’t consider you a Level-0 rationalist anymore. But I consider you an extremely low-level communicator.

Complexity weights apply to worlds/models, not propositions. Otherwise you might as well say:

“Alice is a banker” is a simpler statement than “Alice is a feminist, a banker, or a pianist.”. That’s why the former must be assigned greater probability than the latter.

Agreed. Instead of complexity, I should have probably said “specificity”.

“Alice is a banker” is a less complicated statement than “Alice is a feminist, a banker, or a pianist”, but a more specific one.

I think Bayesian Bob should just get better at arguing. It’s the same thing I tell students when they complain that they can’t explain their paper properly in a 4 sentence abstract: The number of possible sentences you might write is very very large. You’re going to have to work a lot harder before I’m convinced that no sequence of 4 sentences will suffice.

My experience has been that if I’m arguing about something I know well and I’m very confident about, it never feels like I’m in danger of “losing status”.

Sure but you pick your arguments, right? If you are in a social situation that won’t permit more than a few sentences to be exchanged on a topic then you certainly can’t expect to take people through more than one level of inference. If you have no idea how many levels of inference are required it would be quite a risky undertaking to explain why everyone should sign up for cryonics, for example.

That’s true. I do avoid “biting off more than I can chew”. But then, I wouldn’t even challenge someone on religion if the context was wrong. I’m not sure the loss of status would come from arguing for “crackpot beliefs”. Rather, if I’m not talking to people who would want to go 10 levels deep with me on an abstract discussion, it’s impolite to put the conversation on that track.

I’m trying to think of arguments I’ve made that have left people a bit horrified. The sort of thing where people have brought it up later and said, “Yeah, but you believe X”.

Once I was talking to some friends about capital punishment, and I suggested that capital punishment would be much better applied to white collar crimes, because those crimes likely involve a more explicit cost/benefit analysis, and they tend to have worse social impacts than a single murder anyway. The inferential distance here is high because it relies on a consequentialist view of the purpose of criminal punishments. I was also being a bit contrarian here just for the sake of it. I’m not at all confident about whether this would be helpful or harmful.

In another similar context, I was explaining how I viewed punishments as strictly deterrents, and didn’t view “justice” as an intrinsic good. The thought experiment I put forward was, if it were all the same to everyone else in the world, and nobody ever knew about it, I would prefer that Hitler had escaped from the bunker and lived out his life happily in isolation somewhere. Or that he died and went to the best heaven imaginable. I guess this is the “Hitler doesn’t deserve so much as a stubbed toe” idea.

I’ve also horrified people with views on child pornography. Arguing that fictive images (cartoons etc) of children shouldn’t be illegal makes people uncomfortable, and questioning the role of child pornography in motivating offenders is also dangerous. I’ve had good and bad discussions about this. Sometimes I’ve also been contrarian about this, too.

These are all similar examples, because they’re the ones that started to come to mind. There may be other cases on different topics, I don’t remember.

Overall I don’t regret talking about these things at all, and I think mostly people find me more interesting for my willingness to “go there”. Hm, I should point out that I believed all these things before reading LessWrong. So maybe the inferential distance isn’t as high anyway.

I agree with everything you said (including the grandparent). Some of the examples you named are primarily difficult because of the ugh-field and not because of inferential distance, though.

One of the problems is that it’s strictly more difficult to explain something than to understand it. To understand something you can just go through the literature at your own pace, look up everything you’re not certain about, and so continue studying until all your questions are answered. When you want to explain something you have to understand it but you also have to be able to figure out the right words to bridge the inferential gap, you have to figure out where the other person’s model differs from yours and so on.

So there will always be a set of problems you understand well enough to be confident they’re true but not well enough to explain them to others.

Anthropogenic global warming is a belief that falls into this category for most of us. It’s easy to follow the arguments and to look at the data and conclude that yes, it’s humans that are the cause of global warming. But to argue for it successfully? Nearly impossible (unless you have studied the subject for years).

Cryonics is also a topic that’s notoriously difficult to discuss. If you can argue for that effectively my hat’s off to you. (Argue for it effectively ⇒ they sign up)

“Bayesian Bob: … I meant that in a vacuum we should believe it with 50% certainty...”

No we shouldn’t: http://lesswrong.com/lw/jp/occams_razor/

As for proving a negative, I’ve got two words: Modus Tollens.

Bob does need to go back to math class! ;)

You’re right, I should have said “proving non-existence”.

As for Occam’s razor (and any formalizations thereof), it’s still 50% for an arbitrary proposition P. You need evidence (for instance in terms of the complexity of the proposition itself) in order to lower the probability of the proposition.

Otherwise I can just present you with two propositions P and Q, where Q happens to be non-P and you’ll assign the same sub-50% probabilities to P and Q, even though exactly one of them is guaranteed to be true. I think that would make you exploitable.

Modus Tollens is: If P, then Q. Not Q. Therefore, not P

But you can’t prove not Q in the first place.

Three more words then, reductio ad absurdum.

Ok, fair.

Suppose we have a statement X, and the only thing we know about X is that it was randomly selected from the set S of statements of 100 characters, with the alphabet consisting of the digits 0 to 9 and the symbols + and =. If

A = ‘X is true’

B = ‘X was randomly selected from the set S of statements of 100 characters, with the alphabet consisting of the digits 0 to 9 and the symbols + and =’

then P(A|B) can be straightforwardly computed by enumerating the set S and checking how many true statements it contains (or some cleverer variation of this). The above quote, on the other hand, suggests that we start with P(A)=0.5, and then… do what? By Bayes’ Theorem, P(A|B) = P(A)*P(B|A)/P(B), but it’s hard to see how that helps.

(In case I’ve chosen a pathological example, what is a good example of starting with .5 probability of a statement being true, and then adjusting that?)
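The enumeration described above can be sketched for a much shorter statement length than 100. This is only an illustration under stated assumptions: the truth rule (a string is "true" only if it is a well-formed chain of `=`-separated sums of non-negative integers that all evaluate equal, leading zeros allowed) is not given in the comment, and the names `is_true_statement`, `count_true` and `fraction_true` are hypothetical.

```python
from itertools import product

ALPHABET = "0123456789+="  # the 12 symbols named in the comment


def is_true_statement(s: str) -> bool:
    """Assumed truth rule: '='-separated sums that all evaluate to the same value."""
    sides = s.split("=")
    if len(sides) < 2:
        return False  # no '=' at all, so nothing is asserted
    values = []
    for side in sides:
        terms = side.split("+")
        if not all(t.isdigit() for t in terms):
            return False  # empty or malformed term, e.g. "1++2" or "=3"
        values.append(sum(int(t) for t in terms))
    return all(v == values[0] for v in values)


def count_true(length: int) -> int:
    """Brute-force count of true statements among all strings of this length."""
    return sum(
        is_true_statement("".join(chars))
        for chars in product(ALPHABET, repeat=length)
    )


def fraction_true(length: int) -> float:
    """P(A|B) for a statement drawn uniformly from strings of this length."""
    return count_true(length) / len(ALPHABET) ** length


if __name__ == "__main__":
    for n in (3, 5):
        print(n, count_true(n), fraction_true(n))
```

Brute force is hopeless at length 100, which is why the comment allows for "some cleverer variation"; the point of the sketch is only that P(A|B) comes from counting, with no 0.5 appearing anywhere in the computation.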

Recall that logical non-omniscience is an open problem. That is, often we get ‘evidence’ in the form of someone pointing out some feature of the hypothesis that, while deducible from it, we were not aware of. For example, if H = “3542423580 is composite” someone might be stumped until they are reminded that integers ending in the digit 0 are all composite. Of course, this fact is deducible from the definition of composite, we just had forgotten it. P(H) now approaches 1, but we don’t have a Bayesian way of talking about what just happened.

Hypothesis specification is just a special case of this problem. The only difference is that instead of pointing out something that is deducible by assuming the hypothesis (think: lines toward the bottom of a proof) we’re stipulating what it means to assume the hypothesis (like reading off the assumptions at the top of a proof). The reason why “any statement starts out with a 50% probability of being true” sounds silly and is confusing people is that for any particular hypothesis the prior will be set, in part, by stipulating the content of the hypothesis, which is a deductive process. And we don’t know how to handle that with Bayesian math.

In your example before we have any information we’d assume P(A) = 0.5 and after we have information about the alphabet and how X is constructed from the alphabet we can just calculate the exact value for P(A|B). So the “update” here just consists of replacing the initial estimate with the correct answer. I think this is also what you’re saying so I agree that in situations like these using P(A) = 0.5 as starting point does not affect the final answer (but I’d still start out with a prior of 0.5).

I’ll propose a different example. It’s a bit contrived (well, really contrived, but OK).

Frank and his buddies (of which you are one) decide to rob a bank.

Frank goes: “Alright men, in order for us to pull this off 4 things have to go perfectly according to plan.”

(you think: conjunction of 4 things: 0.0625 prior probability of success)

Frank continues: the first thing we need to do is beat the security system (… long explanation follows).

(you think: that plan is genius and almost certain to work (0.9 probability of success follows from Bayesian estimate). I’m updating my confidence to 0.1125)

Frank continues: the second thing we need to do is break into the safe (… again a long explanation follows).

(you think: wow, that’s a clever solution: 0.7 probability of success. Total probability of success 0.1575)

Frank continues: So! Are you in or are you out?

At this point you have to decide immediately. You don’t have the time to work out the plausibility of the remaining two factors, you just have to make a decision. But just by knowing that there are two more things that have to go right you can confidently say “Sorry Frank, but I’m out.”.

If you had more time to think you could come up with a better estimate of success. But you don’t have time. You have to go with your prior of total ignorance for the last two factors of your estimate.
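The running estimate in the bank-robbery example can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes the four steps are independent (the comment multiplies them as if they were) and uses a hypothetical helper name, `success_estimates`; the 0.5 placeholder for unexamined steps is the comment's own assumption.

```python
def success_estimates(n_steps, known, ignorance_prior=0.5):
    """Return the joint success estimate after 0, 1, 2, ... steps are examined.

    known[i] is the estimated success probability of step i once it has been
    explained; steps not yet explained keep the placeholder ignorance_prior.
    Assumes the steps are independent, so probabilities simply multiply.
    """
    estimates = []
    for examined in range(len(known) + 1):
        p = 1.0
        for i in range(n_steps):
            p *= known[i] if i < examined else ignorance_prior
        estimates.append(p)
    return estimates


if __name__ == "__main__":
    # Four steps, of which Frank has explained two (0.9 and 0.7).
    for estimate in success_estimates(4, [0.9, 0.7]):
        print(round(estimate, 4))
```

With these inputs the three estimates come out at roughly 0.0625, 0.1125 and 0.1575, matching the figures in the example.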

If we were to plot the confidence over time I think it should start at 0.5, then go to 0.0625 when we understand that an estimate of a conjunction of 4 parts is to be calculated, and after that more nuanced Bayesian reasoning follows. So if I were to build an AI then I would make it start out with the universal prior of total ignorance and go from there. So I don’t think the prior is a purely mathematical trick that has no bearing on the way we reason.

(At the risk of stating the obvious: you’re strictly speaking never adjusting based on the prior of 0.5. The moment you have evidence you replace the prior with the estimate based on evidence. When you get more evidence you can update based on that. The prior of 0.5 completely vaporizes the moment evidence enters the picture. Otherwise you would be doing an update on non-evidence.)

That’s wildly wrong. “50% probability” is what you assign if someone tells you, “One and only one of the statements X or Y is true, but I’m not going to give you the slightest hint as to what they mean” and it’s questionable whether you can even call that a statement, since you can’t say anything about its truth-conditions.

Any statement for which you have the faintest idea of its truth conditions will be specified in sufficient detail that you can count the bits, or count the symbols, and that’s where the rough measure of prior probability starts—not at 50%. 50% is where you start if you start with 1 bit. If you start with 0 bits the problem is just underspecified.

Update a bit in this direction: That part where Rational Rian said “What the hell do you mean, it starts with 50% probability”, he was perfectly right. If you’re not confident of your ability to wield the math, don’t be so quick to distrust your intuitive side!

What a perfect illustration of what I was talking about when I wrote:

You can call 0 bits “underspecified” if you like, but the antilogarithm of 0 is still 1, and odds of 1 still correspond to 50% probability.
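The arithmetic behind this remark is the usual log-odds conversion. A minimal sketch, assuming bits are measured as base-2 log-odds (the reading this comment relies on; the function name `probability_from_bits` is hypothetical):

```python
def probability_from_bits(bits: float) -> float:
    """Convert base-2 log-odds ("bits") into a probability."""
    odds = 2.0 ** bits          # the antilogarithm: 0 bits gives odds of 1
    return odds / (1.0 + odds)  # odds of 1 (i.e. 1:1) give probability 0.5


if __name__ == "__main__":
    for b in (-1, 0, 1):
        print(b, probability_from_bits(b))
```

This only shows that 0 bits of log-odds maps to 50%. Whether "0 bits" describes a meaningful state of knowledge is the point under dispute in the thread, and the code takes no side on that.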

Given your preceding comment, I realize you have a high prior on people making simple errors. And, at the very least, this is a perfect illustration of why never to use the “50%” line on a non-initiate: even Yudkowsky won’t realize you’re saying something sophisticated and true rather than banal and false.

Nevertheless, that doesn’t change the fact that knowing the complexity of a statement is knowing something about the statement (and hence not being in total ignorance).

I still don’t think you’re saying something sophisticated and true. I think you’re saying something sophisticated and nonsensical. I think it’s meaningless to assign a probability to the assertion “understand up without any clams” because you can’t say what configurations of the universe would make it true or false, nor interpret it as a question about the logical validity of an implication. Assigning probabilities to A, B, C as in your linked writing strikes me as equally nonsensical. The part where you end up with a probability of 25% after doing an elaborate calculation based on having no idea what your symbols are talking about is not a feature, it is a bug. To convince me otherwise, explain how an AI that assigns probabilities to arbitrary labels about which it knows nothing will function in a superior fashion to an AI that only assigns probabilities to things for which it has nonzero notion of its truth condition.

“If you know nothing, 50% prior probability” still strikes me as just plain wrong.

That strikes me as even weirder and wrong. So given a variable A which could be every possible variable, I should assign it… 75% and ~A 25%? or 25%, and make ~A 75%? Or what? - Isn’t 50% the only symmetrical answer?

Basically, given a single variable and its negation, isn’t 1/2 the max-entropy distribution, just as a collection of n variables has 1/n as the max-ent answer for them?

Okay, I was among the first people here who called Zed’s statement plain wrong, but I now think that there are enough high-status individuals of the community that are taking that same position, that it would serve knowledge more if I explained a bit in what slight sense his statement might not be completely wrong.

One would normally say that you calculate 3^4 by multiplying 3 four times: 3 * 3 * 3 * 3

But someone like Zed would say: “No! Every exponential calculation starts out with the number 1. You ought say 3 ^ 4 = 1 * 3 * 3 * 3 * 3”.

And most of us would then say: “What the hell sense does that make? What would it help an AI to begin by multiplying the number 1 with 3? You are not making sense.”

And then Zed would say “But 0^0 = 1 -- and you can only see that if you add the number 1 in the sequence of the numbers to multiply.”

And then we would say “What does it even mean to raise zero in the zeroth power? That has no meaning.”

And we would be right in the sense it has no meaning in the physical universe. But Zed would be right in the sense he’s mathematically correct, and it has mathematical meaning, and equations wouldn’t work without the fact of 0^0=1.

I think we can visualize the “starting probability of a proposition” as “50%” in the same way we can visualize the “starting multiplier” of an exponential calculation as “1”. This starting number really does NOT help a computer calculate anything. In fact it’s a waste of processor cycles for a computer to make that “1*3” calculation, instead of just using the number 3 as the first number to use.

But “1” can be considered to be the number that remains if all the multipliers are taken away one by one.

Likewise, imagine that we have used both several pieces of evidence and the complexity of a proposition to calculate its probability -- but then for some reason we have to start taking away this evidence (e.g. perhaps the AI has to calculate what probability a different AI would have calculated, using less evidence). As we take away more and more evidence, we’ll eventually end up reaching towards 50%, the same way that 0^0=1.
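The analogy with the empty product can be made concrete in the odds form of Bayes’ rule. The sketch below is illustrative only: the likelihood ratios are made-up numbers, `posterior` and `strip_evidence` are hypothetical names, and the 1:1 starting odds are built in as an assumption (they are exactly what the thread is arguing about), not derived.

```python
import math


def posterior(likelihood_ratios, prior_odds=1.0):
    """Posterior probability from prior odds times a product of likelihood ratios."""
    odds = prior_odds * math.prod(likelihood_ratios)  # math.prod([]) is 1
    return odds / (1.0 + odds)


def strip_evidence(likelihood_ratios):
    """Posterior after removing pieces of evidence one at a time, last first."""
    n = len(likelihood_ratios)
    return [posterior(likelihood_ratios[:k]) for k in range(n, -1, -1)]


if __name__ == "__main__":
    # Three hypothetical pieces of evidence, expressed as likelihood ratios.
    for p in strip_evidence([4.0, 0.5, 3.0]):
        print(round(p, 4))
```

Once every likelihood ratio has been removed, the product is the empty product 1, so the odds are 1:1 and the last value in the list is 0.5. That is the sense in which 50% plays the role of the "1" in the exponentiation example.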

I feel compelled to point out that 0^0 is undefined, since the limit of x^0 at x=0 is 1 but the limit of 0^x at x=0 is 0.

Yes, in combinatorics assuming 0^0=1 is sensible since it simplifies a lot of formulas which would otherwise have to include special cases at 0.

If you’re thinking truly reductionistically about programming an AI, you’ll realize that “probability” is nothing more than a numerical measure of the amount of information the AI has. And when the AI counts the number of bits of information it has, it has to start at some number, and that number is zero.

The point is about the internal computations of the AI, not the output on the screen. The output on the screen may very well be “ERROR: SYNTAX” rather than “50%” for large classes of human inputs. The human inputs are not what I’m talking about when I refer to unspecified hypotheses like A,B, and C. I’m talking about when, deep within its inner workings, the AI is computing a certain number associated with a string of binary digits. And if the string is empty, the associated number is 0.

The translation of

-- “What is P(A), for totally unspecified hypothesis A?”

-- “50%.”

into AI-internal-speak is

-- “Okay, I’m about to feed you a binary string. What digits have I fed you so far?”

-- “Nothing yet.”

That’s because in almost all practical human uses, “know nothing” doesn’t actually mean “zero information content”.

And here I thought it was a numerical measure of how credible it is that the universe looks a particular way. “Probability” is what I plug into expected utility calculations. I didn’t realize that I ought to be weighing futures based on “the amount of information” I have about them, rather than how likely they are to come to pass.

A wise person once said (emphasis—and the letter c—added):

That’s all we’re talking about here. This is exactly like the biased coin where you don’t know what the bias is. All we know is that our hypothesis is either true or false.

If that’s all we know, there’s no probability other than 50% that we can sensibly assign. (Maybe using fancy words like “maximum entropy” will help.)

I fully acknowledge that it’s a rare situation when that’s all we know. Usually, if we know enough to be able to state the hypothesis, we already have enough information to drive the probability away from 50%. I grant this. But 50% is still where the probability gets driven away from.

Denying this is tantamount to denying the existence of the number 0.

Let n be an integer. Knowing nothing else about n, would you assign 50% probability to n being odd? To n being positive? To n being greater than 3? You see how fast you get into trouble.

You need a prior distribution on n. Without a prior, these probabilities are not 50%. They are undefined.

The particular mathematical problem is that you can’t define a uniform distribution over an unbounded domain. This doesn’t apply to the biased coin: in that case, you know the bias is somewhere between 0 and 1, and for every distribution that favors heads, there’s one that favors tails, so you can actually perform the integration.
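The biased-coin half of this argument can be checked numerically: on the bounded interval [0, 1] the integral exists, and any prior density that is symmetric about 1/2 gives a predictive probability of heads of exactly 1/2. A minimal sketch using a midpoint rule; the function name `prob_heads` and the example densities are assumptions chosen for illustration.

```python
def prob_heads(density, steps=10_000):
    """Estimate P(heads) = ∫ θ f(θ) dθ / ∫ f(θ) dθ over [0, 1] by the midpoint rule.

    density may be unnormalised; the ratio takes care of normalisation.
    """
    h = 1.0 / steps
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h      # midpoint of the i-th subinterval
        weight = density(theta)
        numerator += theta * weight
        denominator += weight
    return numerator / denominator


if __name__ == "__main__":
    print(prob_heads(lambda t: 1.0))             # uniform prior on the bias
    print(prob_heads(lambda t: (t - 0.5) ** 2))  # symmetric, but far from uniform
    print(prob_heads(lambda t: t))               # asymmetric prior favouring heads
```

The first two priors both give about 0.5 because each distribution favouring heads is mirrored by one favouring tails; the third does not, which gives about 2/3. No analogous computation is available for "a uniformly random integer", since there is no uniform density to integrate against.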

Finally, on an empirical level, it seems like there are more false n-bit statements than true n-bit statements. Like, if you took the first N Godel numbers, I’d expect more falsehoods than truths. Similarly for statements like “Obama is the 44th president”: so many ways to go wrong, just a few ways to go right.

Edit: that last paragraph isn’t right. For every true proposition, there’s a false one of equal complexity.

I’m pretty certain this intuition is false. It feels true because it’s much harder to come up with a true statement from N bits if you restrict yourself to positive claims about reality. If you get random statements like “the frooble fuzzes violently” they’re bound to be false, right? But for every nonsensical or false statement you also get the negation of a nonsensical or false statement: “not(the frooble fuzzes violently)”. It’s hard to arrive at a statement like “Obama is the 44th president” and be correct, but it’s very easy to enumerate a million things that do not orbit Pluto (and be correct).

(FYI: somewhere below there is a different discussion about whether there are more n-bit statements about reality that are false than true)

There’s a 1-to-1 correspondence between any true statement and its negation, and the sets aren’t overlapping, so there’s an equal number of true and false statements—and they can be coded in the identical amount of bits, as the interpreting machine can always be made to consider the negation of the statement you’ve written to it.

You just need to add the term ‘...NOT!’ at the end. As in “The Chudley Cannons are a great team… NOT!”

Or we may call it the “He loves me, he loves me not” principle.

Doesn’t it take more bits to specify NOT P than to specify P? I mean, I can take any proposition and add ”..., and I like pudding” but this doesn’t mean that half of all n-bit propositions are about me liking pudding.

No. If “NOT P” took more bits to specify than “P”, this would also mean that “NOT NOT P” would take more bits to specify than “NOT P”. But NOT NOT P is identical to P, so it would mean that P takes more bits to specify than itself.

With actual propositions now, instead of letters:

If you have the proposition “The Moon is Earth’s satellite”, and the proposition “The Moon isn’t Earth’s satellite”, each is the negation of the other. If a proposition’s negation takes more bits to specify than the proposition, then you’re saying that each statement takes more bits to specify than the other.

Even simpler: can you think of any reason why it would necessarily take more bits to codify “x != 5” than “x == 5”?

We’re talking about minimum message length, and the minimum message of NOT NOT P is simply P.

Once you consider double negation, I don’t have any problem with saying that

“the Moon is Earth’s satellite”

is a simpler proposition than

“The following statement is false: the Moon is Earth’s satellite”

The abstract syntax tree for “x != 5” is bigger than the AST of “x == 5“. One of them uses numeric equality only, the other uses numeric equality and negation. I expect, though I haven’t verified, that the earliest, simplest compilers generated more processor instructions to compute “x != 5” than they did to compute “x == 5”
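Whether the tree for “x != 5” is bigger depends on the grammar in question. As one data point only, and not a claim about the earliest compilers mentioned above, Python’s own parser treats `!=` as a primitive comparison operator, so the two trees have the same number of nodes; the extra node appears only when the negation is written out explicitly. The helper name `node_count` is hypothetical.

```python
import ast


def node_count(expr: str) -> int:
    """Number of nodes in the Python AST of a single expression."""
    return sum(1 for _ in ast.walk(ast.parse(expr, mode="eval")))


if __name__ == "__main__":
    for source in ("x == 5", "x != 5", "not (x == 5)"):
        print(f"{source!r}: {node_count(source)} nodes")
```

A grammar with only equality and a separate negation operator would behave like the third expression, and one with only inequality as a primitive would reverse the comparison, which is consistent with the later remark in this thread that the difference is a matter of convention.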

Aris is right. NOT is just an operator that flips a bit. Take a single bit: 1. Now apply NOT. You get 0. Or you could have a bit that is 0. Now apply NOT. You get 1. Same number of bits. Truth tables for A and ~A are the same size.

That’s what I said. But you also said that NOT P takes more bits to specify than P. You can’t have it both ways.

You don’t understand this point. If I’ve already communicated P to you, do you need any further bits of info to calculate NOT P? No: once you know P, NOT P is also perfectly well defined, which means that NOT P by necessity has the SAME message length as P.

You aren’t talking about minimum message length anymore, you’re talking about human conventions. One might just as well reply that, since “No” is a two-letter word, rejection takes fewer bits to encode than confirmation with “Yes”, which is a three-letter word.

If we have a computer that evaluates statements and returns 1 for true and 0 for false, we can just as well imagine that it returns 0 for true and 1 for false and calculates the negation of those statements. In fact you wouldn’t be able to KNOW whether the computer calculates the statements or their negation, which means when you’re inputting a statement, it’s the same as inputting its negation.

I think I get it. You need n bits of evidence to evaluate a statement whose MML is n bits long. Once you know the truth value of P, you don’t need any more evidence to compute NOT(P), so MML(P) has to equal MML(NOT(P)). In the real world we tend to care about true statements more than false statements, so human formalisms make it easier to talk about truths rather than falsehoods. But for every such formalism, there is an equivalent one that makes it easier to talk about false statements.

I think I had confused the statement of a problem with the amount of evidence needed to evaluate it. Thanks for the correction!

A big thumbs up for you, and you’re very welcome! :-)

I read the rest of this discussion but did not understand the conclusion. Do you now think that the first N Godel numbers would be expected to have the same number of truths as falsehoods?

It turns out not to matter. Consider a formalism G’, identical to Godel numbering, but that reverses the sign, such that G(N) is true iff G’(N) is false. In the first N numbers in G+G’, there are an equal number of truths and falsehoods.

For every formalism that makes it easy to encode true statements, there’s an isomorphic one that does the same for false statements, and vice versa. This is why the set of statements of a given complexity can never be unbalanced.

Gotcha, thanks.

Who said anything about not having a prior distribution? “Let n be a [randomly selected] integer” isn’t even a meaningful statement without one!

What gave you the impression that I thought probabilities could be assigned to non-hypotheses?

This is irrelevant: once you have made an observation like this, you are no longer in a state of total ignorance.

We agree that we can’t assign a probability to a property of a number without a prior distribution. And yet it seems like you’re saying that it is nonetheless correct to assign a probability of truth to a statement without a prior distribution, and that the probability is 50% true, 50% false.

Doesn’t the second statement follow from the first? Something like this:

For any P, a nontrivial predicate on integers, and an integer n, Pr(P(n)) is undefined without a distribution on n.

Define X(n), a predicate on integers, true if and only if the nth Godel number is true.

Pr(X(n)) is undefined without a distribution on n.

Integers and statements are isomorphic. If you’re saying that you can assign a probability to a statement without knowing anything about the statement, then you’re saying that you can assign a probability to a property of a number without knowing anything about the number.

That is not what I claim. I take it for granted that all probability statements require a prior distribution. What I claim is that if the prior probability of a hypothesis evaluates to something other than 50%, then the prior distribution cannot be said to represent “total ignorance” of whether the hypothesis is true.

This is only important at the meta-level, where one is regarding the probability function as a variable—such as in the context of modeling logical uncertainty, for example. It allows one to regard “calculating the prior probability” as a special case of “updating on evidence”.

I think I see what you’re saying. You’re saying that if you do the math out, Pr(S) comes out to 0.5, just like 0! = 1 or a^0 = 1, even though the situation is rare where you’d actually want to calculate those things (permutations of zero elements or the empty product, respectively). Do I understand you, at least?

I expect Pr(S) to come out to be undefined, but I’ll work through it and see. Anyway, I’m not getting any karma for these comments, so I guess nobody wants to see them. I won’t fill the channel with any more noise.

[ replied to the wrong person ]

When is this ever the situation?

Can you give an example of “driving the probability away from 50%”? I note that no one responded to my earlier request for such an example.

No one can give an example because it is logically impossible for it to be the situation, it’s not just rare. It cannot be that “All we know is that our hypothesis is either true or false.” because to know that something is a hypothesis entails knowing more than nothing. It’s like saying “knowing that a statement is either false or a paradox, but having no information at all as to whether it is false or a paradox”.

You seem to be using a translation scheme that I have not encountered before. You give one example of its operation, but that is not enough for me to distill the general rule. As with all translation schemes, it will be easier to see the pattern if we see how it works on several different examples.

So, with that in mind, suppose that the AI were asked the question

-- “What is P(A), for a hypothesis A whose first digit is 1, but which is otherwise totally unspecified?”

What should the AI’s answer be, prior to translation into “AI-internal-speak”?

Why does not knowing the hypothesis translate into assigning the hypothesis probability 0.5?

If this is the approach that you want to take, then surely the AI-internal-speak translation of “What is P(A), for totally unspecified hypothesis A?” would be “What proportion of binary strings encode true statements?”

ETA: On second thought, even that wouldn’t make sense, because the truth of a binary string is a property involving the territory, while prior probability should be entirely determined by the map. Perhaps sense could be salvaged by passing to a meta-language. Then the AI could translate “What is P(A), for totally unspecified hypothesis A?” as “What is the expected value of the proportion of binary strings that encode true statements?”.

But really, the question “What is P(A), for totally unspecified hypothesis A?” just isn’t well-formed. For the AI to evaluate “P(A)”, the AI needs already to have been fed a symbol A in the domain of P.

Your AI-internal-speak version is a perfectly valid question to ask, but why do you consider it to be the translation of “What is P(A), for totally unspecified hypothesis A?”?

I don’t see how the claim is “sophisticated and true”. Let P and Q be statements. You cannot simultaneously assign 50% prior probability to each of the following three statements:

P

P & Q

P & ~Q

This remains true even if you don’t know the complexities of these statements.
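The reason is additivity: P is true exactly when one of the mutually exclusive statements P & Q or P & ~Q is true, so any coherent assignment must satisfy p(P) = p(P & Q) + p(P & ~Q). A minimal check of that constraint (the function name `is_additive` is hypothetical):

```python
def is_additive(p_P: float, p_P_and_Q: float, p_P_and_notQ: float,
                tol: float = 1e-12) -> bool:
    """Check p(P) = p(P & Q) + p(P & ~Q), which any coherent assignment must satisfy."""
    return abs(p_P - (p_P_and_Q + p_P_and_notQ)) <= tol


if __name__ == "__main__":
    print(is_additive(0.5, 0.5, 0.5))    # all three at 50%: 0.5 != 1.0
    print(is_additive(0.5, 0.25, 0.25))  # one coherent alternative
```

Assigning 50% to all three statements would require 0.5 = 0.5 + 0.5, so the check fails no matter how complex P and Q are.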

See here.

I think that either you are making a use-mention error, or you are confusing syntax with semantics.

Formally speaking, the expression “p(A)” makes sense only if A is a sentence in some formal system.

I can think of three ways to try to understand what’s going on in your dialogue, but none leads to your conclusion. Let Alice and Bob be the first and second interlocutor, respectively. Let p be Bob’s probability function. My three interpretations of your dialogue are as follows:

Alice and Bob are using different formal systems. In this case, Bob cannot use Alice’s utterances; he can only mention them.

Alice and Bob are both using the same formal system, so that A, B, and C are sentences—e.g., atomic proposition letters—for both Alice and Bob.

Alice is talking about Bob’s formal system. She somehow knows that Bob’s model-theoretic interpretations of the sentences C and A&B are the same, even though [C = A&B] isn’t a theorem in Bob’s formal system. (So, in particular, Bob’s formal system is not complete.)

Under the first interpretation, Bob cannot evaluate expressions of the form “p(A)”, because “A” is not a sentence in his formal system. The closest he can come is to evaluate expressions like “p(Alice was thinking of a true proposition when she said ‘A’)”. If Bob attends to the use-mention distinction carefully, he cannot be trapped in the way that you portray. For, while C = A & B may be a theorem in Alice’s system,

(Alice was thinking of a true proposition when she said ‘C’) = (Alice was thinking of a true proposition when she said ‘A’) & (Alice was thinking of a true proposition when she said ‘B’)

is not (we may suppose) a theorem in Bob’s formal system. (If, by chance, it is a theorem in Bob’s formal system, then the essence of the remarks below apply.)

Now consider the second interpretation. Then, evidently, C = A & B is a theorem in Alice and Bob’s shared formal system. (Otherwise, Alice would not be in a position to assert that C = A & B.) But then p, by definition, will respect logical connectives so that, for example, if p(B & ~A) > 0, then p(C) < p(B). This is true even if Bob hasn’t yet worked out that C = A & B is in fact a consequence of his axioms. It just follows from the fact that p is a coherent probability function over propositions.

This means that, if the algorithm that determines how Bob answers a question like “What is p(A)?” is indeed an implementation of the probability function p, then he simply will not in all cases assert that p(A) = 0.5, p(B) = 0.5, and p(C) = 0.5.

Finally, under the third interpretation, Bob did not say that p(A|B) = 1 when he said that p(C) / p(B) = 1, because A&B is not syntactically equivalent to C under Bob’s formal system. So again Alice’s trap fails to spring.

How does it make sense then? Quite a bit more would need to be assumed and specified.

Hence the “only if”. I am stating a necessary, but not sufficient, condition. Or do I miss your point?

Well, we could also assume and specify additional things that would make “p(A)” make sense even if “A” is not a statement in some formal system. So I don’t see how your remark is meaningful.

Do you mean, for example, that p could be a measure and A could be a set? Since komponisto was talking about expressions of the form p(A) such that A can appear in expressions like A&B, I understood the context to be one in which we were already considering p to be a function over sentences or propositions (which, following komponisto, I was equating), and not, for example, sets.

Do you mean that “p(A)” can make sense in some case where A is a sentence, but not a sentence in some formal system? If so, would you give an example? Do you mean, for example, that A could be a statement in some non-formal language like English?

Or do you mean something else?

In my own interpretation, A is a hypothesis -- something that represents a possible state of the world. Hypotheses are of course subject to Boolean algebra, so you could perhaps model them as sentences or sets.

You have made a number of interesting comments that will probably take me some time to respond to.

I’ve been trying to develop a formal understanding of your claim that the prior probability of an unknown arbitrary hypothesis A makes sense and should equal 0.5. I’m not there yet, but I have a couple of tentative approaches. I was wondering whether either one looks at all like what you are getting at.

The first approach is to let the sample space Ω be the set of all hypotheses, endowed with a suitable probability distribution p. It’s not clear to me what probability distribution p you would have in mind, though. Presumably it would be “uniform” in some appropriate sense, because we are supposed to start in a state of complete ignorance about the elements of Ω.

At any rate, you would then define the random variable v: Ω → {True, False} that returns the actual truth value of each hypothesis. The quantity “p(A), for arbitrary unknown A” would be interpreted to mean the value of p(v = True). One would then show that half of the hypotheses in Ω (with respect to p-measure) are true. That is, one would have p(v = True) = 0.5, yielding your claim.

I have two difficulties with this approach. First, as I mentioned, I don’t see how to define p. Second, as I mentioned in this comment, “the truth of a binary string is a property involving the territory, while prior probability should be entirely determined by the map.” (ETA: I should emphasize that this second difficulty seems fatal to me. Defining p might just be a technicality. But making probability a property of the territory is fundamentally contrary to the Bayesian Way.)

The second approach tries to avoid that last difficulty by going “meta”. Under this approach, you would take the sample space Ω to be the set of logically consistent possible worlds. More precisely, Ω would be the set of all valuation maps v: {hypotheses} → {True, False} assigning a truth value to every hypothesis. (By calling a map v a “valuation map” here, I just mean that it respects the usual logical connectives and quantifiers. E.g., if v(A) = True and v(B) = True, then v(A&B) = True.) You would then endow Ω with some appropriate probability distribution p. However, again, I don’t yet see precisely what p should be.

Then, for each hypothesis A, you would have a random variable V_A: Ω → {True, False} that equals True on precisely those valuation maps v such that v(A) = True. The claim that “p(A) = 0.5 for arbitrary unknown A” would unpack as the claim that, for every hypothesis A, p(V_A = True) = 0.5; that is, that each hypothesis A is true in exactly half of all possible worlds (with respect to p-measure).

Do either of these approaches look to you like they are on the right track?

ETA: Here’s a third approach which combines the previous two: When you’re asked “What’s p(A), where A is an arbitrary unknown hypothesis?”, and you are still in a state of complete ignorance, then you know neither the world you’re in, nor the hypothesis A whose truth in that world you are being asked to consider. So, let the sample space Ω be the set of ordered pairs (v, A), where v is a valuation map and A is a hypothesis. You endow Ω with some appropriate probability distribution p, and you have a random variable V: Ω → {True, False} that maps (v, A) to True precisely when v(A) = True, i.e., when A is true under v. You give the response “0.5” to the question because (we suppose) p(V = True) = 0.5.

But I still don’t see how to define p. Is there a well-known and widely-agreed-upon definition for p? On the one hand, p is a probability distribution over a countably infinite set (assuming that we identify the set of hypotheses with the set of sentences in some formal language). [ETA: That was a mistake. The sample space is countable in the first of the approaches above, but there might be uncountably many logically consistent ways to assign truth values to hypotheses.] On the other hand, it seems intuitively like p should be “uniform” in some sense, to capture the condition that we start in a state of total ignorance. How can these conditions be met simultaneously?

I think the second approach (and possibly the third also, but I haven’t yet considered it as deeply) is close to the right idea.

It’s pretty easy to see how it would work if there are only a finite number of hypotheses, say n: in that case, Ω is basically just the collection of binary strings of length n (assuming the hypothesis space is carved up appropriately), and each map V_A is evaluation at a particular coordinate. Sure enough, at each coordinate, half the elements of Ω evaluate to 1, and half to 0 !
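The finite case can be checked by brute force. The following is a small sketch, assuming n = 4 (an arbitrary choice) and equal weight on every bit string; `prob_true` is a hypothetical helper name standing in for p(V_A = True) at a given coordinate.

```python
from itertools import product

n = 4  # hypothetical number of hypotheses; any small positive integer works

# Omega: every assignment of 0/1 to the n hypotheses, all equally weighted.
omega = list(product([0, 1], repeat=n))


def prob_true(coordinate):
    """Fraction of elements of omega whose value at `coordinate` is 1."""
    return sum(v[coordinate] for v in omega) / len(omega)


marginals = [prob_true(i) for i in range(n)]
print(len(omega), marginals)
```

Each coordinate is 1 in exactly half of the 2^n strings, which is the “half the elements of Ω evaluate to 1” observation in executable form.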

More generally, one could imagine a probability distribution on the hypothesis space controlling the “weighting” of elements of Ω. For instance, if hypothesis #6 gets its probability raised, then those mappings v in Ω such that v(6) = 1 would be weighted more than those such that v(6) = 0. I haven’t checked that this type of arrangement is actually possible, but something like it ought to be.

Here are a few problems that I have with this approach:

1) This approach makes your focus on the case where the hypothesis A is “unspecified” seem very mysterious. Under this model, we have P(V_A = True) = 0.5 even for a hypothesis A that is entirely specified, down to its last bit. So why all the talk about how a true prior probability for A needs to be based on complete ignorance even of the content of A? Under this model, even if you grant complete knowledge of A, you’re still assigning it a prior probability of 0.5. Much of the push-back you got seemed to be around the meaningfulness of assigning a probability to an unspecified hypothesis. But you could have sidestepped that issue and still established the claim in the OP under this model, because here the claim is true even of specified hypotheses. (However, you would still need to justify that this model is how we ought to think about Bayesian updating. My remaining concerns address this.)

2) By having Ω be the collection of all bit strings of length n, you’ve dropped the condition that the maps v respect logical operations. This is equivalent to dropping the requirement that the possible worlds be logically possible. E.g., your sample space would include maps v such that v(A) = v(~A) for some hypothesis A. But, maybe you figure that this is a feature, not a bug, because knowledge about logical consistency is something that the agent shouldn’t yet have in its prior state of complete ignorance. But then …

3) … If the agent starts out as logically ignorant, how can it work with only a finite number of hypotheses? It doesn’t start out knowing that A, A&A, A&A&A, etc., can all be collapsed down to just A, and that’s infinitely many hypotheses right there. But maybe you mean for the n hypotheses to be “atomic” propositions, each represented by a distinct proposition letter A, B, C, …, with no logical dependencies among them, and all other hypotheses built up out of these “atoms” with logical connectives. It’s not clear to me how you would handle quantifiers this way, but set that aside. The more important problem is …

4) … How do you ever accomplish any nontrivial Bayesian updating under this model? For suppose that you learn somehow that A is true. Now, conditioned on A, what is the probability of B? Still 0.5. Even if you learn the truth value of every hypothesis except B, you still would assign probability 0.5 to B.

Is this a description of what the prior distribution might be like? Or is it a description of what updating on the prior distribution might yield?

If you meant the former, wouldn’t you lose your justification for claiming that the prior probability of an unspecified hypothesis is exactly 0.5? For, couldn’t it be the case that most hypotheses are true in most worlds (counted by weight), so that an unknown random hypothesis would be more likely to be true than not?

If you meant the latter, I would like to see how this updating would work in more detail. I especially would like to see how Problem 4 above could be overcome.
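The updating problem is easy to reproduce for small n. Below is a minimal sketch, assuming three atomic hypotheses indexed 0, 1, 2 (standing in for A, B, and one other) with uniform weight on all bit strings; the function name `conditional` is made up for this illustration.

```python
from itertools import product

n = 3  # hypothetical: index 0 plays the role of A, index 1 the role of B
omega = list(product([False, True], repeat=n))


def conditional(target, evidence):
    """p(v[target] is True | v[i] == value for every (i, value) in evidence),
    with every element of omega weighted equally."""
    consistent = [v for v in omega
                  if all(v[i] == value for i, value in evidence.items())]
    return sum(v[target] for v in consistent) / len(consistent)


prior_B = conditional(1, {})                        # no evidence at all
posterior_B = conditional(1, {0: True})             # after learning A is true
posterior_B_all = conditional(1, {0: True, 2: False})  # everything except B known

print(prior_B, posterior_B, posterior_B_all)
```

Under the uniform weighting the coordinates are independent, so no amount of evidence about the other hypotheses moves B away from 0.5. A weighting that allowed updating would have to correlate the coordinates, and then the marginals would no longer be guaranteed to equal 0.5.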

Knowing that a statement is a proposition is far from being in total ignorance.

Writing about propositions using the word “statements” and then correcting people who say you are wrong based on true things they say about actual statements would be annoying. Please make it clear you aren’t doing that.

Neither the grandparent nor (so far as I can tell) the great-grandparent makes the distinction between “statements” and “propositions” that you have drawn elsewhere. I used the term “statement” because that was what was used in the great-grandparent (just as I used it in my other comment because it was used in the post). Feel free to mentally substitute “proposition” if that is what you prefer.

Shall I mentally substitute “acoustic vibrations in the air” for “an auditory experience in a brain”?

My previous comment should have sufficed to communicate to you that I do not regard the distinction you are making as relevant to the present discussion. It should be amply clear by this point that I am exclusively concerned with things-that-must-be-either-true-or-false, and that calling attention to a separate class of utterances that do not have truth-values (and therefore do not have probabilities assigned to them) is not an interesting thing to do in this context. Downvoted for failure to take a hint.

Then when Eliezer says:

you shouldn’t say:

as a response to Eliezer making true statements about statements and not playing along with OP’s possible special definition of “statement”. If when Eliezer read “statement”, he’d interpreted it as “proposition”, he might be unreasonable in inferring there was an error, but he didn’t, so he wasn’t. So you shouldn’t implicitly call him out as having made a simple error.

As far as “even Yudkowsky won’t realize you’re saying something sophisticated and true rather than banal and false” goes, no one can read minds. It is possible the OP meant to convey actual perfect understanding with the inaccurate language he used. Likewise for “Not just low-status like believing in a deity, but majorly low status,” assuming idiosyncratic enough meaning.

I’m calling attention to a class that contains all propositions as well as other things. Probabilities may be assigned to statements being true even if they are actually false, or neither true nor false. If a statement is specified to be a proposition, you have information such that a bare 50% won’t do.

Where did you get the idea that “statement” in Eliezer’s comment is to be understood in your idiosyncratic sense of “utterances that may or may not be ‘propositions’”? Not only do I dispute this, I explicitly did so earlier when I wrote (emphasis added):

Indeed, it is manifestly clear from this sentence in his comment:

that Eliezer means by “statement” what you have insisted on calling a “proposition”: something with truth-conditions, i.e. which is capable of assuming a truth-value. I, in turn, simply followed this usage in my reply. I have never had the slightest interest in entering a sub-discussion about whether this is a good choice of terminology.

Furthermore, I deny the following:

and, indeed, regard the falsity of that claim as a basic background assumption upon which my entire discussion was premised.

Perhaps it would make things clearer if the linguistic terminology (“statement”, “proposition”, etc) were abandoned altogether (being really inappropriate to begin with), in favor of the term “hypothesis”. I can then state my position in (hopefully) unambiguous terms: all hypotheses are either true or false (otherwise they are not hypotheses), hypotheses are the only entities to which probabilities may be assigned, and a Bayesian with literally zero information about whether a hypothesis is true or false must assign it a probability of 50% -- the last point being an abstract technicality that seldom if ever needs to be mentioned explicitly, lest it cause confusion of the sort we have been seeing here (so that Bayesian Bob indeed made a mistake by saying it, although I am impressed with Zed for having him say it).

Make sense now?

You can state it better like this: “A Bayesian with literally zero information about the hypothesis.”

“Zero information about whether a hypothesis is true or false” implies that we know the hypothesis, and we just don’t know whether it’s a member in the set of true propositions.

“Zero information about the hypothesis” indicates what you really seem to want to say—that we don’t know anything about this hypothesis; not its content, not its length, not even who made the hypothesis, or how it came to our attention.

I don’t see how this can make sense in one sense. If we don’t know exactly how it came to our attention, we know that it didn’t come to our attention in a way that stuck with us, so that is some information we have about how it came to our attention—we know some ways that it didn’t come to our attention.

You’re thinking of human minds. But perhaps we’re talking about a computer that knows it’s trying to determine the truth-value of a proposition, but the history of how the proposition got inputted into it got deleted from its memory; or perhaps it was designed to never hold that history in the first place.

So it knows that whoever gave it the proposition didn’t have the power, desire, or competence to tell it how it got the proposition.

It knows the proposition is not from a mind that is meticulous about making sure those to whom it gives propositions know where the propositions are from.

If the computer doesn’t know that it doesn’t know how it learned of something, and can’t know that, I’m not sure it counts as a general intelligence.

What odds does “manifestly clear” imply when you say it? I believe he was referring to either X or Y as otherwise the content of the statement containing “one and only one...X or Y” would be a confusing...coincidence is the best word I can think of. So I think it most likely “call that a statement” is a very poorly worded phrase referring to simultaneously separately statement X or statement Y.

In general, there is a problem with prescribing taboo when one of the two parties is claiming a third party is wrong.

I am impressed by your patience in light of my comments. I think it not terribly unlikely that in this argument I am the equivalent of Jordan Leopold or Ray Fittipaldo (not an expert!), while you are Andy Sutton.

But I still don’t think that’s probable, and think it is easy to see that you have cheated at rationalist’s taboo as one term is replacing the excluded ones, a sure sign that mere label swapping has taken place.

I still think that if I only know that something is a hypothesis and know nothing more, I have enough knowledge to examine how I know that and use an estimate of the hypothesis’ bits that is superior to a raw 0%. I don’t think “a Bayesian with literally zero information about whether a hypothesis is true or false” is a meaningful sentence. You know it’s a hypothesis because you have information. Granted, the final probability you estimate could be 50/50.

I don’t think it’s that bad. Anything at an inferential distance sounds ridiculous if you just matter-of-factly assert it, but that just means that if you want to tell someone about something at an inferential distance, don’t just matter-of-factly assert it. The framing probably matters at least as much as the content.

No. Something like “Bayesian reasoning is better than science” would work.

Not “thousands”. “Astronomically many” would work.

That’s the accelerating change, not the intelligence explosion school of singularity. Only the latter is popular around here.

Add “for sufficiently many dust-specks”.

I also agree with lessdazed’s first three criticisms.

--

Other than these, it’s not a half-bad summary!

In http://lesswrong.com/lw/qa/the_dilemma_science_or_bayes/ Yudkowsky argues that you have to choose and that sometimes science is just plainly wrong. I find his arguments persuasive.

Geez, I said “fraction of a second” for a reason.

Of course. And the accelerating change school of singularity provides the deadline. Friendly AI has to be solved BEFORE computers become so fast moderately intelligent people can brute force an AI.

Even Omega is sometimes wrong. You failed to argue for your claim.

“Fraction” has wrong connotations for this to fix the problem.

Is your refrigerator running?

Rainbows are real.

Not Kurzweil...anything but that

Marxist-style fatalism? Worse than Kurzweil.

Obviousness isn’t a property of statements or facts. This idiom of thought embodies the mind projection fallacy.

I find this a little uncharitable.

Of course I don’t mean that the scientific method is useless (for that would be false within one inference step) nor do I imply with “Time isn’t real” that it doesn’t add up to normality. I’m just referring to timeless physics. (I understand that rainbows are rainbows only relative to the observer, so it is a clever argument.)

Wait, so first you say “anything but kurzweil” and then I point out that same style of old-school AI is likely to end up badly and then it’s suddenly Marxist-style fatalism?

When I quickly enumerate a number of statements about topics that are covered in the sequences then of course they’re easy to misinterpret or to find fault in. The statements are just conclusions without their underlying arguments!

“Is your refrigerator running” has two meanings.

One is that science obviously works.

The other is that the joke based on that phrase relies on misunderstanding and equivocating. Likewise “the scientific method is wrong” has one banal meaning and one fantastical one. You frame it to be misunderstood, and your representations are equivocal misrepresentations. I hope you can keep track of reality even as you misrepresent it to others, and that you have more labels than “right” and “wrong” available.

They’re wrong conclusions that are generally held by either no-one or you alone. In fact, it would take deep thought to more groan-inducingly misrepresent things. Your post is a low entropy product in that things are phrased as badly as recognizably possible, hence I don’t accept the excuse that you were “quickly enumerating” and failed to find a good phrasing. You hit a local maximum of confusion embedded in recognizably quasi-LW ideas/time!

They are deliberate misrepresentations, but taken to such an extreme that I thought a single disclaimer would be sufficient to convince people that to nitpick at the list is to miss the point.

I did put thought into the misrepresentations. The first item in the list starts: “I want people to cut off my head”. I wasn’t exactly subtle.

I’m not apologizing (although given the reaction here I realize that it was a bad idea so I will refrain from making such lists in the future). I had to choose between giving a really good list about beliefs people here are likely to hold (would be too long and distracting) or a list that’s humorous and laughably bad, but that would still touch upon many issues (ethics, cryonics, AI, friendly AI, time, quantum mechanics) to emphasize just how many opinions (mine at least) are influenced by the sequences and the community here.

The list is meant as a reflection on how crazy LessWrong seems to an outsider.

This is a false dichotomy. It would have been better to do one of at least several things:

1) Present beliefs superficially similar to LW beliefs but technically wrong, to show what it looks like to see a LWish looking belief and thinking it ridiculous.

2) Present beliefs that LWers actually hold, but stated in such a way as to appear ridiculous at first pass, to show what it looks like for actual LWish beliefs to appear ridiculous.

I think where you went wrong was trying to do both simultaneously.

I’m sorry if I contributed to an environment in which ideas are too criticized, and/or with clumsy distinguishing between the idea and the idea’s creator, such that people don’t properly float ideas while holding off on proposing solutions, and instead censor themselves. That sort of thing is what the discussion section is for.

One criticism I had that’s more to the “nitpick” than “important, but value-judgement based” end of the spectrum is how the first statement is a belief about someone’s desires. It’s a gaudy mistake to make. It seems somewhat possible that my harsh general criticism, combined with the fact that I was the one to point it out, is responsible for you not correcting that bit.

I chose the wording carefully, because “I want people to cut off my head” is funny, and the more general or more correct phrasing is not. But now that it has been thoroughly dissected...

Anyway, since you asked twice I’m going to change the way the first statement is phrased. I don’t feel that strongly about it and if you find it grating I’m also happy to change it to any other phrasing of your choosing.

I interpret your first post as motivated by a need to voice your disagreement, not by the expected utility of the post for the community. I’m sometimes guilty of this because sometimes it seems almost criminal to not point out that something is wrong when it is in fact wrong.

As a general rule, disagreements voiced in a single sentence “This is false because of X” or “No, this contradicts your second paragraph” come across pretty aggressively. In my experience only very few people respond well to disagreements voiced in that manner. You’ve also accused me of fallacious reasoning twice even though there was no good reason to do so (because more charitable interpretations of what I said are not fallacious).

Causation in general and motivation in particular don’t work like that.

All of my past experiences, excepting none--->me--->my actions

Maybe we can think of something.

I think it is important to keep track of meta levels when talking about beliefs and their relationship to reality.

I think you should stick to doing either of the two sorts of lists I suggested. You say you thought only a single disclaimer was needed, but at least two are:

This is a good example of a false belief resembling a LW one. Looking at it tells me a bit about how others might see a LW belief as radical and false, though not everything as I can see how it isn’t a LW belief.

This is a good example of a true belief phrased to sound unpersuasive and stupid. Looking at it tells me a bit about how others might see a LW belief as radical and false, though not everything as I can see how it is true.

If you so want, then even Rian would agree that the statement is true. I hadn’t mentioned it because I didn’t want to nitpick, but FYI in case you want to post this to the main page.

If you had wanted to say what LW looks like from the outside, there was no reason to tether appearances to actual beliefs.

I would certainly have quoted you, had I seen this comment earlier.

Rainbows are visions, but only illusions, and rainbows have nothing to hide.

Just because I read the sequences doesn’t mean I’m particularly likely to agree with any of them. Some, yes, but not all. Many of the statements you listed are controversial even on LW. If they were unanimously accepted here without further discussion, it would be a worrying sign.

Sure, unanimous acceptance of the ideas would be a worrying sign. Would it be a bad sign if we were 98% in agreement about everything discussed in the sequences? I think that depends on whether you believe that intelligent people when exposed to the same arguments and the same evidence should reach the same conclusion (Aumann’s agreement theorem). I think that disagreement is in practice a combination of (a) bad communication (b) misunderstanding of the subject material by one of the parties (c) poor understanding of the philosophy of science (d) emotions/signaling/dissonance/etc.

I think it’s just really difficult to have a fundamental disagreement that isn’t founded on some sort of personal value. Most disagreements can be rephrased in terms of an experiment where both parties will confidently claim the experiment will have different outcomes. By the time such an experiment has been identified the disagreement has dissolved.

Discussion is to be expected because discussions are beneficial for organizing one’s thoughts and because most of us like to discuss the subject material on LW. Persistent disagreement I see mainly as a result of insufficient scholarship.

The first part of this post reminded me of Stranger Than History.

I wonder what a Friendly AI should do when it discovers something that at first sight seems like a “crackpot belief” to its human operators. Let’s assume that the AI is far smarter than humans (and the “crackpot belief” requires many logical steps), but is still in a testing phase and humans don’t believe in its correctness.

If AI tells the discovery openly to humans, they will probably turn it off quickly, assuming there was something wrong in a program.

On the other hand, if the AI predicts that humans are not ready for this information, and tries to hide it, a security subroutine will detect that “AI wants to cheat its masters” and will force a shutdown. Even worse, if the AI decides that the right thing is telling the information to humans, but not right now, and instead gives them first some “sequences” that will prepare them to accept the new information, and only gives them the new information when they have changed their thinking… the security subroutine might still evaluate this as “AI wants to manipulate its masters” and force a shutdown.

The next question is what humans would do. I am afraid that after receiving some unbelievable information X, they might simply add “not X” into the AI’s axioms or values. It might seem like the rational thing to do; they will not think: “we don’t like X”, but rather: “X is a cognitive error, possibly one that AIs are prone to, so we should protect our AI against this cognitive error”.

As an example, imagine a world before quantum physics was discovered; and imagine that an AI discovered quantum physics and multiple universes, and gave all this information at once to the unprepared humans. Now imagine some new discovery in the future, possibly one hundred times less understandable to humans, with even more shocking consequences.

Welcome to Less wrong!

This may be stating the obvious, but isn’t this exactly the reason why there shouldn’t be a subroutine that detects “The AI wants to cheat its masters” (or any similar security subroutines)?

The AI has to look out for humanity’s interests (CEV) but the manner in which it does so we can safely leave up to the AI. Take for analogy Eliezer’s chess computer example. We can’t play chess as well as the chess computer (or we could beat Grand Masters of chess ourselves) but we can predict the outcome of the chess game when we play against the computer: the chess computer finds a winning position against us.

With a friendly AI you can’t predict what it will do, or even why it will do it, but if we get FAI right then we can predict that the actions will steer humanity in the right direction.

(Also building an AI by giving it explicit axioms or values we desire is a really bad idea. Much like the genie in the lamp it is bound to turn out that we don’t get what we think we asked for. See http://singinst.org/upload/CEV.html if you haven’t read it already)

p(convincing argument exists) >= (Number of Bayesians post-sequence - Number of Bayesians pre-sequence) / Number of people reading sequence.

Or, in simpler terms, “the sequences are a convincing chain of argument, and they are effective.” I’ll admit I’m working with a smart group of people, but none of my friends have had trouble with any of the inferential steps in the sequences (if I jump in to the advanced stuff, I’ll still lose them just fine thanks to the miracle of inferential distances, obviously, and I haven’t convinced all my friends of all those points yet :))

I assume that people in their pre-bayesian days aren’t even aware of the existence of the sequences so I don’t think they can use that to calculate their estimate. What I meant to get at is that it’s easy to be really certain a belief is false if it’s intuitively wrong (but not wrong in reality) and the inferential distance is large. I think it’s a general bias that people are disproportionately certain about beliefs at large inferential distances, but I don’t think that bias has a name.

(Not to mention that people are really bad at estimating inferential distance in the first place!)

Are you Zed Shaw?

Personal messages can be sent by clicking on someone’s name and selecting “Send Message” in the top right corner of the page.

I can’t handle this site’s scrutiny right now.

As bcoburn said, the point of this post appears to be to highlight the large inferential distance between mainstream thinking and many settled issues on LW. It doesn’t appear to be trying to refute the LW attitudes. Furthermore:

It is trying to list them as they might be seen by outsiders in the least convenient possible world (and/or real world).

Yes, I understand the purpose of the post. I should have added my reasoning to the end: things are weird if you phrase them weirdly, as in the original post. Use of creative grammar and analogies can negate many of the reactions you would otherwise get. Also, ‘settled issues’ are anything but, even to people who have followed the site since the SL4 mailing list. Or perhaps I’m simply diverging from the LW mainstream as time progresses.