French. Product manager at Metaculus. Partial to dark chocolate.
tenthkrige
The scoring system incentivizes reporting your true credence (gory details here).
I think Metaculus rewarding participation is one of the reasons it has participation. Metaculus can discriminate good predictors from bad predictors because it has their track records (I agree this is not the same as discriminating good predictions from bad ones). This info is incorporated into the Metaculus prediction, which is hidden by default but which you can unlock with on-site fake currency.
Which you also can’t know if you don’t test other fields. I think there are at least 3 concentric levels to distinguish: (famous (intelligent (STEM))).
I have also taken the survey.
Content feedback: the inferential distance between Löb’s theorem and spurious counterfactuals seems larger than that of the other points. Maybe that’s because I haven’t internalised the theorem, not being a logician and all.
Unnecessary nitpick: the gears in the robot’s brain would turn just fine as drawn: since the outer gears are both turning anticlockwise, the inner gear would simply turn clockwise. (I think my inner engineer is showing.)
I’m colorblind. I have color cones in my eyes, but the red ones are mutated towards higher wavelengths (i.e. green). This makes red-green-brown, blue-purple and grey-pink hard to distinguish.
As a result, I pay quite a lot of attention to colors and shades in everyday life. I don’t trust my eyes and often test my perceptions against other people’s (“Hey, is that shirt green or yellow?”). To the point that I actually discern more shades than most people. I’m sometimes wrong about their names, but I see shades other people don’t notice, e.g. me: “This grey has more red than green in it.” someone else: “What are you talking about, it’s just grey.”
On these occasions, the only one who agrees with me on subtle hue differences is actually my sister, who is not colorblind but has been painting as a hobby for 20-ish years, and is also accustomed to paying attention to hues.
Also, the injunction not to trust your senses / brain has always seemed obvious to me. So has the idea of testing your beliefs against the real world / other people’s beliefs.
Bottom line: you can apparently train your perceptions by not trusting them. And color blindness taught me to distrust my wetware.
P.S.: I started playing an instrument when I was 7, and was also very surprised to learn several years later that most people can’t pick apart the instruments when listening to a piece of music.
Well, that’s a mindset I don’t encounter often irl. Do you estimate you’re a central example in your country / culture?
Pretty good. I’ve updated weakly toward “it’s okay to locally redefine accepted terms”. [Meta: I didn’t find the transitions from object level to meta level very intelligible, and I think the ‘notable’ facts deserve some examples to ground the whole thing if this is to be more than a quick idea-dump.]
Very interesting!
From eyeballing the graphs, it looks like the average Brier score is barely below 0.25. This indicates that GPT-4 is clearly better than a dart-throwing monkey (i.e. predicting a uniformly random percentage, expected score 0.33), but barely better than chance (always predicting 50%, score of exactly 0.25).
It would be interesting to see the decompositions for those two naive strategies for that set of questions, and compare to the sub-scores GPT-4 got.
You could also check if GPT-4 is significantly better than chance.
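To make the comparison concrete, here is a quick sketch of what those two naive baselines score, on simulated coin-flip questions (the simulation setup is my assumption, not from the post):

```python
import random

def brier(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)

random.seed(0)
n = 100_000
# Simulated ground truth: each question resolves yes/no with probability 0.5.
outcomes = [random.random() < 0.5 for _ in range(n)]

# "Always predict 50%" baseline: every question scores (0.5 - o)^2 = 0.25.
always_half = brier([0.5] * n, outcomes)

# "Dart-throwing monkey": a uniform random forecast per question,
# whose expected Brier score is 1/3.
monkey = brier([random.random() for _ in range(n)], outcomes)

print(always_half, monkey)  # ~0.25 and ~0.333
```

Running a bootstrap over the actual question set (resampling questions and recomputing the mean Brier score) would be one way to check whether GPT-4’s score is significantly below 0.25.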
Tell me if this gets too personal, but do defectors evoke positive emotions? (Because they lower societal expectations?) Or negative emotions? (i.e. you have a sweet spot of cooperation and dislike deviations from it?)
I’m not sure the authority has to be malevolent, it could be incompetent (or something).
So: [authority / authority-wielders are my enemies / outgroup] & [collaborators side with rules / rulemakers / authority] ⇒ collaborators are my outgroup ⇒ I punish them
This seems to predict that people who distrust authority more will punish cooperators more.
The bottom half of the punishment graph does seem to be places where I would distrust authority more.
dieontrasted
Typo?
people who are focused on providing—and incentivized to provide—estimates of the expected number of cases
Can you say more about this? Would users forecast a single number? Would they get scored on how close their number is to the actual number? Could they give confidence intervals?
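For illustration only, here is one hypothetical scoring rule for point forecasts of a count (this is my own sketch, not a rule the post or Metaculus proposes): penalize the squared log-ratio, so being off by a factor of k costs the same whether you over- or under-estimate.

```python
import math

def log_error_score(forecast, actual):
    """Hypothetical rule: squared difference of logs, so a 2x overestimate
    and a 2x underestimate receive the same penalty."""
    return (math.log(forecast) - math.log(actual)) ** 2

# Off by a factor of 2 in either direction scores identically:
over = log_error_score(200, 100)
under = log_error_score(50, 100)
print(over, under)  # both ~0.48 (= ln(2)^2)
```

Confidence intervals would need a different rule (e.g. an interval score that rewards narrow intervals containing the true value), which is why I’m curious what the authors have in mind.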
I don’t think that solves the problem, though. There are a lot of people, and many of them believe very unlikely models. Any model we (lesswrong-ish people) spend time discussing is going to be vastly more likely than a randomly selected human-thought-about model. I realise this is getting close to reference class tennis, sorry.
Cool idea. Any model we actually spend time talking about is going to be vastly above the base rate, though, because most human-considered models are nonsensical or very unlikely.
This was interesting. I think I missed an assumption somewhere, because for , it seems that the penalty is , which seems very low for a -degree polynomial fitted on points.
I almost gave up halfway through, for much the same reasons, but this somehow felt important, the way some sequences/codex posts felt important at the time, so I powered through. I definitely will need a second pass on some of the large inferential steps, but overall this felt long-term valuable.
I find this kind of post very valuable, thank you for writing it so well.
I see someone who seems to see part of the world the same way I do, and I go “can we talk? can we be buds? can we be twinsies? are we on the same team?” and then I realize “oh, no, outside of this tiny little area, they…really don’t agree with me at all. Dammit.”
That rang very close to home, choked me up a little bit. But the good sad, where you put clean socks on and go make the world less terrible.
I’d never thought about it clearly, so thanks for this model.
A behavior I’ve observed (and participated in) that you don’t mention: the group can temporarily splinter. Picture 6 people. Someone explores topic A. Two other people take to the new topic. The other 3 listen politely for 1–2 minutes. Then one of the three bored people explores topic B, addressing a bored neighbor (identified by their silence). The third bored person latches on to them. Both conversations then evolve until one dies down or a leader forcibly merges the two.
(By forcibly merge, I mean: while participating in conversation A, listen to conversation B, wait for a simultaneous pause in both, then respond conspicuously to conversation B, dragging most conversation A participants with you. I have observed myself doing this.) (Individual participants can also switch freely between the two.) (I have observed this to work both with close friends and with relative strangers, but obviously strangers need leaders/confident people to start new conversations, because they have to be in explore mode.)
I think this lowers the exploration barrier, compared to your model.
Just a nitpick, from one non-native English speaker (to another?): I have been told that the word “retard” is extremely offensive (in American English at least). I’d say it’s up to you to decide whether that was your intended effect.
I’m surprised nobody proposed: “This person is promoting a social norm more stringent than my current behavior, I’ll whack him.” What’s wrong with it? Sure, in this case the social norm is actually beneficial to the whacker, but we’re adaptation-executers, not fitness-maximizers.