Note the symmetry factor with the factorials: we’re computing the probability of the observed counts, not the probability of a particular string of outcomes, so we have to add up probabilities of all the outcomes with the same counts.
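To make this concrete, here's a toy example (my numbers, not from the post): 10 coin flips with 7 heads.

```python
from math import comb

p = 0.5        # probability of heads (example value)
n, k = 10, 7   # 10 flips, 7 heads observed

# Probability of one particular string of outcomes, e.g. HHHHHHHTTT
p_string = p**k * (1 - p)**(n - k)

# Probability of the observed counts: sum over all C(10, 7) = 120
# strings that share the same counts
p_counts = comb(n, k) * p_string

print(p_string)   # 0.0009765625  (1/1024)
print(p_counts)   # 0.1171875     (120/1024)
```

The `comb(n, k)` factor is exactly the symmetry factor with the factorials.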
Can you clarify why we look at the probability of counts rather than the particular string?
The reason I’m asking is that if a problem has continuous outcomes instead of discrete then we automatically look at the string of outcomes instead of the count (unless we bin the results). Is this just a fundamental difference between continuous and discrete outcomes?
No worries, thanks for fixing my pictures!
This post was accidentally released a day early for a few hours before I moved it back into drafts. Apologies for any confusion.
Fun fact: 7 survey respondents attempted to convert the number of minutes between them and their twin into a fraction of a year (e.g. 9.506E-06 years is 5 minutes). All 7 who did this were the older twin.
(I did include these people in the analysis above)
This provides evidence for the “Older twins care about being the oldest, younger twins don’t talk about it” hypothesis. I don’t think this will come as a massive surprise to anyone.
I understand that the price to swap birth order with your twin is a bowl of soup, although adjusting for 1% yearly inflation over 4000 years this now comes to 193 quadrillion bowls of soup.
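(For anyone who wants to check the soup arithmetic, one bowl compounded at 1% a year for 4000 years:)

```python
# One bowl of soup, 1% yearly inflation, 4000 years of compounding
bowls = 1 * 1.01 ** 4000
print(f"{bowls:.3g} bowls")  # about 1.93e+17, i.e. ~193 quadrillion
```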
Firstly, I really like this kind of thing and enjoyed your analysis.
One thing I think it misses out on is Marek’s choice of whom to inspect.
Liberal!Marek chooses without knowledge of who is fascist and who is liberal so has a 50:50 chance of selecting a fascist or a liberal. So if he is a liberal there is a 50:50 chance of him selecting a fascist, outing them and getting into this argument. (I’m ignoring the possibility that Marek will just say nothing)
Fascist!Marek already knows who is fascist/liberal and looking at the party membership card is a charade for him. He has 4 options:
1. Choose liberal, claim liberal
2. Choose liberal, claim fascist
3. Choose fascist, claim fascist
4. Choose fascist, claim liberal
On the surface option 3 doesn’t seem likely. Options 1 and 2 are the options investigated in the OP (but assuming liberal was chosen by chance). Option 4 also seems like it might be used.
If we set option 4 to 0% (so that Marek is guaranteed to choose a liberal) and assume the 50:50 bold/timid split between options 1 and 2, then fascist!Marek has a 50:50 chance of getting into this argument, the same as liberal!Marek, so this provides no evidence either way.
If instead we split the probabilities of options 1, 2 and 4 as 25%:25%:50% then we return to the result in the OP. If option 4 is between 0% and 50% likely then the argument happening is somewhere between 0 and 1 bit of evidence in favour of Marek being liberal.
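The calculation above can be sketched in a few lines (my assumptions: liberal!Marek triggers the argument iff he happens to pick a fascist, fascist!Marek triggers it only via option 2, and option 3 is set to 0%):

```python
from math import log2

# Liberal Marek picks a fascist by chance and outs them
P_ARGUMENT_IF_LIBERAL = 0.5

def evidence_bits(p_option4):
    """Bits of evidence for 'Marek is liberal', given P(option 4)."""
    p_option2 = (1 - p_option4) / 2   # options 1 and 2 split the rest 50:50
    return log2(P_ARGUMENT_IF_LIBERAL / p_option2)

print(evidence_bits(0.0))   # 0 bits: no evidence either way
print(evidence_bits(0.5))   # 1 bit in favour of liberal (the OP's result)
```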
Of course fascist!Marek makes the choice between the 4 options in the knowledge that everyone already thinks he’s probably a fascist (although he’s probably not Hitler). This will affect his choice as he may be extra keen to send a signal that he isn’t a fascist, so would ideally like not to accuse anyone, knowing that everyone will probably side with the person he accuses. He might choose option 1 as this will increase that person’s trust in him and also cast doubt on that person in the mind of everyone else. Even option 3 might be appealing: it might harm Marek but it makes the person he accuses look very liberal.
But everyone knows that Marek is in this position and Marek knows that everyone knows so this begins to hurt my head and is also why this kind of game is amazing!
Harry, smiling, had asked Professor Quirrell what level he played at, and Professor Quirrell, also smiling, had responded, “One level higher than you.” – HPMOR
The first mistake you mention is exactly the mistake I make when I don’t convert to odds form as I mentioned here.
If I start with P(Marek liberal) = 1/2 and him accusing gives me 1 bit of evidence (he’s twice as likely to accuse if he’s liberal) then the temptation is to split the uncertainty in half and update incorrectly to P(Marek liberal | accuse) = 3/4.
Odds form helps: 1:1 becomes 2:1 after 1 bit of evidence, so P(Marek liberal | accuse) = 2/3.
I find if I try using probabilities in Bayes in my head then I make mistakes. If I start at 1/4 probability and get 1 bit of evidence to lower this further then I think “ok, I’ll update to 1/8”. If I use odds I start at 1:3, update to 1:6 and get the correct posterior of 1/7.
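In code, both updates come out right if you go via odds (a sketch using exact fractions):

```python
from fractions import Fraction

def update(prior, likelihood_ratio):
    """Bayes update via odds: prior odds times likelihood ratio."""
    prior_odds = Fraction(prior) / (1 - Fraction(prior))
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)  # back to a probability

print(update(Fraction(1, 2), 2))                # 2/3, not the tempting 3/4
print(update(Fraction(1, 4), Fraction(1, 2)))   # 1/7, not 1/8
```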
So essentially I’m constantly going back and forth—like you I find probabilities easier to picture but find odds easier for updates.
For an introduction to MCMC aimed at a similar target audience, I found this explanation helpful.
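As a taster of how little code the core idea needs, here's a minimal random-walk Metropolis sampler (my own sketch, not from the linked explanation), targeting a standard normal:

```python
import math
import random

random.seed(0)  # fix the seed so the sketch is reproducible

def metropolis(log_density, x0, n_steps, step_size=1.0):
    """Random-walk Metropolis: propose a Gaussian jump, accept with
    probability min(1, density ratio), otherwise stay put."""
    x, samples = x0, []
    for _ in range(n_steps):
        proposal = x + random.gauss(0.0, step_size)
        log_accept = log_density(proposal) - log_density(x)
        if log_accept >= 0 or random.random() < math.exp(log_accept):
            x = proposal
        samples.append(x)
    return samples

# Target: a standard normal, whose log density is -x^2/2 up to a constant
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_steps=50_000)
mean = sum(samples) / len(samples)                           # close to 0
var = sum((s - mean) ** 2 for s in samples) / len(samples)   # close to 1
```

The nice part is that the target density only ever appears as a ratio, so it never needs to be normalised.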
Communication requires both input and output channels. All of the instances I can think of from the animal world involve a sense (hearing, sight, smell, touch) which has evolved with a different benefit. Then an output channel can evolve to communicate using this sense as the input.
This seems orders of magnitude less complex than evolving input and output simultaneously, which would be required for direct brain communication (at least I can’t think of another option).
Even if it could potentially happen, before it did there would be many instances of indirect communication evolving. Take-off happening first in a species with indirect communication is a fairly inevitable consequence of the relative complexity of the evolutions required.
Imagine a second agent which has the same preferences but an anti-status-quo preference between mushroom and pepperoni.
This would be exploitable by a third agent who is able to compare mushroom and pepperoni but assigns equal utilities to both. However the original agent described in the OP would not be able to exploit agent 2 (if agent 1's status-quo bias is larger than agent 2's anti-status-quo bias), so agent 3 dominates agent 1 in terms of performance.
Over multiple dimensions agent 3 becomes much more complex than agent 1. Having a status quo bias makes sense as a way to avoid being exploited whilst also being less computationally expensive than tracking or calculating every preference ordering.
Assuming agent 2 is rare, the loss incurred by not being able to exploit others is small.
Start with lower-effort posts, to get a sense of how people react to the headline and thesis statement.
Shortform seems like a great way to do this.
In removing the O(1) terms I think we’re removing all of the widths of the peak in the various dimensions. So in the case where the widths are radically different between the models this would mean that N would need to be even larger for BIC to be a useful approximation.
The widths issue might come up, for example, when an additional parameter is added which splits the data into 2 populations with drastically different population sizes—the small population is likely to have a wider peak.
Is that right?
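For reference, the expansion I'm picturing (as I understand the Laplace approximation, writing $\hat\theta$ for the MLE, $k$ for the parameter count, and $H_1$ for the per-datapoint Hessian of the negative log-likelihood at the peak):

```latex
\ln P(D \mid M) \approx \ln L(\hat\theta) - \frac{k}{2}\ln N
  + \underbrace{\frac{k}{2}\ln 2\pi - \frac{1}{2}\ln\det H_1
  + \ln p(\hat\theta)}_{O(1)\ \text{terms}}
```

BIC keeps only the first two terms; the $\ln\det H_1$ term is where the peak widths live.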
Thanks for this sequence, I’ve read each post 3 or 4 times to try to properly get it.
Am I right in thinking that in order to replace dP[θ]=dθ we not only require a uniform prior but also that θ span unit volume?
The last one appears to be 2016 (that was a slightly wider survey which included other rationalist communities), which was before the LessWrong 2.0 relaunch. I haven’t heard of any plans for surveys—maybe a mod can fill us in.
Slatestarcodex does an annual survey of its readers. Scott pre-registers some investigations and then reports on results. This year, for example, he got a negative result on “Math preference vs Corn eating style” and more interesting results in the ongoing birth-order investigation.
My own feelings on MBTI are similar to this SSC post—it’s unscientific but manages to kinda work as long as you don’t expect too much of it. I wouldn’t make any life decisions based on it!
For the third part of the question we don’t have to guess—the 2012 lesswrong survey included an MBTI question. Of the people who answered, 65% were INTP or INTJ, compared to 5-9% of Americans according to the MBTI website.
Thanks for this.
The description of a big blind:
Big blind: the minimal money/poker chips that every player must bet in order to play. For example, $0.1 would be a reasonable amount in casual play.
sounds more like an ante than a big blind. This is important for understanding the discussion of limping in Ars Technica.
Yes, that’s definitely upward selection pressure but I think that’s more evidence for “ability to solve problems” being the cause of our intelligence rather than “ability to transmit culture”.
Most cultural processes could be transmitted by being shown what to do and punished if you do it wrong. Language makes it easier but isn’t necessarily required. Chimps have some fairly complex tool kits, knowledge of which appears to be transmitted culturally.
A version of this that I hear fairly often is “it’s common sense that...”
It works in the same way in that it makes it socially costly to argue against but is more insidious than “everybody knows” (at least in my circles “it’s common sense” has more of a veneer of respectability).
Both also have their proper uses which I think makes the improper uses more difficult to counter.
Thanks for this. I’m trying to get an intuition on how this works.
My mental picture is to imagine the likelihood function with respect to theta for the more complex model. The simpler model is then the equivalent of a rectangular function with height equal to its likelihood and width 1.
The relative areas under the two graphs reflect the relative likelihoods of the models, so picturing the relative maximum likelihoods and how sharp the peak of the more complex model is gives an impression of the Bayes factor.
Does that work? Or is there a better mental model?
From the literature on self-esteem:
Previously, I thought that self-worth was like an estimate of how valuable you are to your peers
is sociometer theory and
Now I think there’s an extra dimension which has to do with simpler dominance-hierarchy behavior.
is hierometer theory.
Hierometer theory is relatively new (2016) and could be thought of as a subset of sociometer theory if sociometer theory is interpreted more broadly. Accordingly it has less research backing it up, and what research there is comes mostly from the original proponents of the theory.
This paper gives an introduction to both and a summary of evidence (I found this diagram a useful explanation of the difference). The paper suggests that both are true to some extent and complement each other.
I’ve included some quotes below.
Sociometer theory starts from the premise that human beings have a fundamental need to belong (Baumeister and Leary, 1995). Satisfying this need is advantageous: group members, when cooperating, afford one another significant opportunities for mutual gain (von Mises, 1963; Nowak and Highfield, 2011; Wilson, 2012). Accordingly, if individuals are excluded from key social networks, their prospects for surviving and reproducing are impaired. It is therefore plausible to hypothesize that a dedicated psychological system evolved to encourage social acceptance (Leary et al., 1995).
The original version of sociometer theory (Leary and Downs, 1995; Leary et al., 1995) emphasizes how self-esteem tracks social acceptance, by which is implied some sort of community belongingness, or social inclusion.
In contrast, the revised version (Leary and Baumeister, 2000) emphasizes how self-esteem tracks relational value, defined as the degree to which other people regard their relationship with the individual as important or valuable overall, for whatever reason.
Like sociometer theory, hierometer theory proposes that self-regard serves an evolutionary function. Unlike sociometer theory, it proposes that this function is to navigate status hierarchies. Specifically, hierometer theory proposes that self-regard operates both indicatively—by tracking levels of social status—and imperatively—by regulating levels of status pursuit (Figure 1).
Note here some key differences between hierometer theory and dominance theory (Barkow, 1975, 1980), another alternative to sociometer theory (e.g., Leary et al., 2001). Dominance theory, plausibly interpreted, states that self-esteem tracks, not levels of social acceptance or relational value, but instead levels of “dominance” or “prestige,” by which some social or psychological, rather than behavioral, construct is meant.
Accordingly, hierometer theory proposes that higher (lower) prior social status promotes a behavioral strategy of augmented (diminished) assertiveness, with self-regard acting as the intrapsychic bridge—in particular, tracking social status in the first instance and then regulating behavioral strategy in terms of it. Note that the overall dynamic involved is consolidatory rather than compensatory: higher rather than lower status is proposed to lead to increased assertiveness. In this regard, hierometer theory differs from dominance theory, which arguably implies that it is losses in social status that prompt attempts to regain it (Barkow, 1980).
… our findings are arguably consistent with the revised version of sociometer theory, which is equivocal about the type of relational value that self-esteem tracks, and by extension, the type of social acceptance that goes hand in hand with it. Indeed, hierometer theory, and the original version of sociometer theory, might each be considered complementary subsets of the revised version of sociometer theory, if the latter is construed very broadly as a theory which states that types of social relations (status, inclusion), which constitute different types of relational value, regulate types of behavioral strategies (assertiveness, affiliativeness) via types of self-regard (self-esteem, narcissism). If so, then our confirmatory findings for hierometer theory, and mixed findings for the original version of sociometer theory, would still suggest that the revised version of sociometer theory holds truer for agentic variables than for communal ones.