I agree with this response; using first principles is a heuristic, and heuristics always have pros and cons. Just in terms of performance, the benefit is that you can re-assess assumptions, but the cost is that you ignore a great amount of information gathered by those before you. Depending on the value of this information, you should frequently seek it out, at least as a supplement to your derivation.
Connor_Flexman
I think people have already considered this, but the strategies converge. If someone else is going to make it first, you have only two possibilities: seize control by exerting a strategic advantage, or let them keep control but convince them to make it safe.
To do the former is very difficult, and the little bit of thinking that has been done about it has mostly exhausted the possibilities. To do the latter requires something like 1) giving them the tools to make it safe, 2) doing enough research to convince them to use your tools or fear catastrophe, and 3) opening communications with them. So far, MIRI and other organizations are focusing on 1 and 2, whereas you’d expect them to primarily do 1 if they expected to get it first. We aren’t doing 3 with respect to China, but that is a step that isn’t easy at the moment and will probably get easier as time goes on.
As you say, the inner circle certainly may have reason to do non-obvious things. But while withholding information from people can be occasionally politically helpful, it seems usually best for the company to have the employees on the same page and working toward a goal they see reason for. Because of this, I would usually assume that seemingly poor decisions in upper management are the result of actual incompetence or a deceitful actor in the information flow on the way down.
Don’t know why the discrepancy, but it seems to me that a great deal of postrationality is littered with historical examples.
I also share your skepticism of clear psychological progression, but would point out plenty of times that people diverge in some ways but converge in more meta ones, e.g. divergence to liberal or conservative but convergence in political acumen, or e.g. divergence to minimalism or luxury but convergence to environmental modification.
It seems important to be extremely clear about the criticism’s target, though. I agree overanalysis is a failure mode of certain rationalists, and statistically more so for those who comment more on LW and SSC (because of the selection effect specifically for those who nitpick). But rationality itself is not the target here, merely naive misapplication of it. The best rationalists tend to cut through the pedantry and focus on the important points, empirically.
I’m not confident these are the right gears, and you might be asking for more refined gears than mine, but my working hypothesis is something like:
The umbrella concept of weirdness is about whether people can predict your actions, since this is extremely useful information to track for a social animal. Predictability and therefore weirdness are tracked on a variety of levels—you can be weird because of your sleep schedule, or weird because of your nervous tics and body language, or weird because you talk in a very normal manner about the impending alien rapture, or weird just because you’re a foreigner. The weirdness of an action registers as flags on various mental levels to help you predict when that person later might not do the canonical action, and it registers with a magnitude and some metadata to help you track their weird trait(s) for inner simming. To answer the question of how much disconformity is “enough” to be labeled weird, I have to hand-wave and say that typical people’s social neural nets just get very good at inferring what infractions correspond to how much likelihood of what level of difficulty coordinating with them. (If this is the meat of the question, I could say more later).
Unfortunately, “weird” has had some semantic drift since unpredictable often happens to correlate with “being a less valuable ally” in a variety of ways for systemic or intrinsic reasons. Two important subtypes of weird that this is evident in are 1) the people whom you talk about that are just kind of loners, and 2) the people who actually provide frequent disvalue. The loners are “weird” because they can and do take actions the group hasn’t decided on, which makes them harder to coordinate with and significantly less predictable. But this also correlates with them being weird in other ways, and so it is rightly seen as Bayesian evidence for other problems by their peers—and further, people who sometimes leave the group are just less valuable allies (for dependability, for gossip, etc). When I do focusing on the weirdness of loners, I can kind of pick out these distinct feelings (of which I think the third is most prominent), along with other more personal ones like “weird → unpredictable → higher likelihood of new ideas → valuable” and similar.
I think “weird” has mutated into a slur nowadays because of the subtype of those who provide disvalue and the ways that those traits correlate with weirdness (and why it’s hard to get gears on the different types of nonconformity). You certainly can have good weird, where someone is unpredictable but in ways that everyone repeatedly likes (though they are still tracked as “weird”, importantly). But since a large part of social coordination is being predictable, the people who have fine control over their many levels of dials often do work largely within predictable ranges, and only the best optimizers can escape the local optima and be correct without too much disvalue on the way—which means that most people who aren’t being predictable are doing so because of an inability. And since most people can hit the small range of highly valuable parameter space we call “normal”, that gets set as baseline value, so a vast proportion of other actions are negative. So people who have difficulties with certain dials will regularly cause disvalue in various ways, which means that the trait of “weird” is now correlated with bad actions.
After writing this out, I’m wondering whether I should have called “weird” specifically “negative unpredictability”, and called “positive unpredictability” something like “interesting”. The people I think of as least weird and those I think of as least interesting both end up as “boring”, in the sense of a very predictable wind-up doll. I think you can have separate tickers for both weirdness and interestingness, but often people will black-and-white it one way or the other (and indeed argue whether someone is “weird” or “interesting”). The needle-threading of getting people to follow you demands an entire toolbox of gears itself, but some heuristics on just pushing the scales a little further from bad unpredictability:
One good way is to use your unpredictable actions to help your peers, as in noticing others are hungry and striking out on your own to fix the problem, or hitting the sweet spot of high-level predictability low-level unpredictability we call humor. Another, probably more important way, is to put a little extra effort at being extra predictable when around: prove you’re normal with small talk, say normal stuff about yourself, and forge social ties or commit to the group in other ways so they can know that you’ll (mostly) be there for them. Allies have to be dependable.
I am confused as to how the propositional consistency and function work together to prevent the trolling in the final step. Suppose I do try to find pairs of sentences such that I can show and also to drive down. Does this fail because you are postulating non-adversarial sampling, as ESRogs mentions? Or is there some other reason why propositional consistency is important here?
I expect certain changes in information flow to affect things somewhat. Anonymity on the internet allowed people to humorize their own laziness and patheticness without unmasking, which seems to have significantly increased common knowledge about lots of people being mentally unwell or otherwise bad at traditionally valued things like hard work. As this gets normalized I expect it to further erode adherence to mask-like values and promote the cluster of things like “be true to yourself” and “it’s ok to be depressed and seek help” and other MtG red/green over white. In fact, the selection effect of internet heroes being young, engaged in the gig economy, non-neurotypical, etc may create a sort of new value stratum if it doesn’t percolate further.
The social media bubble effect seems like it could also lead to a further divergence of values along various class/bubble lines as Vaniver mentioned was the case historically. This might be exacerbated on the economic axis if we keep seeing capital growth gaining relative to wages, though I don’t know much about that trend.
I actually had some similar alarm bells go off for conflation of concepts in the OP, especially because the post specifically gestures at one concept and doesn’t give explanations of the different examples where this might come up.
However, on second thought I think I do like the concept this builds. To phrase it in your formal terms, I think it’s very useful to notice all the systems in which the Taylor series for x(t) has a significant dx/dt (or higher-order) term, ESPECIALLY when it’s comparably easy to control x via dx/dt rather than just directly.
In this light, you can view momentum, exponential growth, heavy-tails, etc., as all cases where a main component of controlling or predicting the future x is paying attention to the dx/dt term, and I claim this is an important revelation to have at a variety of levels.
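One way to sketch the “momentum” framing as a Taylor expansion (my notation, offered as an assumed reconstruction rather than the exact formalism from the OP):

```latex
x(t + \Delta t) \;\approx\; x(t) \;+\; \dot{x}(t)\,\Delta t \;+\; \tfrac{1}{2}\,\ddot{x}(t)\,\Delta t^{2} \;+\; \cdots
```

“Momentum-like” systems are then the ones where the \dot{x} term carries much of the predictive weight, so it is often cheaper to steer x by pushing on \dot{x} than by acting on x directly.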
Perhaps more relevant to your actual crux, I also get shudders when people overload physics terms with other meanings, but before they were physics terms they were concepts for intuitive things. Given that we view the world through physical metaphors, I think it’s quite important for us to use the best-fitting words for concepts. Then we can remind people of the different variants when people run into conflationary trouble. If we start off by naming things with poor associations we hold ourselves back more. If you have an alternative name to “momentum” for this that you also think has good connotations, though, I’d love to hear it.
This is a good point. I was wondering why civic/public spaces are much more functional in meatspace than in cyberspace, whereas a lot of internet communities that seem good are more gated—and I think this is because the civic/public openness of meatspace is sort of superficial: the actual gatekeeping is done by all sorts of transaction costs and social barriers one doesn’t normally notice (or that are deliberately obscured).
I don’t think most of the motivation is supposed to come in at the level of doing the final work that might win the award—I agree it seems like Nobel prizes, knighthoods, Hugo and Nebula, etc all aren’t being consciously thought about too much during the year or two beforehand.
“Making the industry seem relevant rather than encouraging behaviors” rings more true. The motivation seems to happen when younger people see that this is a thing society values. That node downstream of the award drives them through years of striving.
I still don’t love the term “subagents”, despite everyone getting lots out of it and despite personally agreeing with the intentional stance and the “alliances” you mention. I think my crux-net is something like:
agents are strategic
fragments of our associative mental structures aren’t strategic except insofar as their output calls other game theoretic substructures or you are looking at something like the parliamentary moderator
if you think of these as agents, you will attribute false strategy to them and feel stuck more often, when in fact they are easily worked with if you think of their apparent strategy as “using highly simplistic native associations and reinforcements, albeit sometimes by pinging other fragments to do things outside their own purview, to accomplish their goal”
However, it does seem possible to me that the “calling other fragments” step does actually chain so far as to constitute real strategy and offer a useful level of abstraction for viewing such webs as subagents. I haven’t seen much evidence for this—does this framing make sense, and do you think it is clear there is something more like Turing-complete webs of strategy within subagents vs merely pseudostrategy? Wish I had a replacement word I liked better than subagent.
Possibly another good example of scientists failing to use More Dakka. The mice studies all showed solid effects, but then the human studies used the same dose range (10^9 or 10^10 CFU) and only about half showed effects! I googled for negative side effects of probiotics, and the Healthline result really had to stretch to find anything bad. Wondering if, as much larger organisms, we should just be jacking up the dosage quite a bit.
I was initially very concerned about this but then noticed that almost all the tested secondary endpoints were positive in the mice studies too. The human studies could plausibly still be meaningless though.
Has anyone (esp you Jim) looked into fecal transplants for this instead, in case our much longer digestive system is a problem?
The folk theory of lying is a tiny bit wrong and I agree it should be patched. I definitely do not agree we should throw it out, or be uncertain whether lying exists.
Lying clearly exists.
1. Oftentimes people consider how best to lie about, e.g., being late. When they settle on the lie of telling their boss they were talking to their other boss when they weren’t, and they know this is a lie, that’s a central case of a lie—definitely not motivated cognition.
To expand our extensional definition to noncentral cases, you can consider some other ways people might tell maybe-lies when they are late. Among others, I have had the experiences [edit: grammar] of
2. telling someone I would be there in 10 minutes when it was going to take 20, and if you asked me on the side with no consequences I would immediately have been able to tell you that it was 20 even though in the moment I certainly hadn’t conceived myself as lying, and I think people would agree with me this is a lie (albeit white)
3. telling someone I would be there in 10 minutes when it was going to take 20, and if you asked me on the side with no consequences I would have still said 10, because my model definitely said 10, and once I started looking into my model I would notice that probably I was missing some contingencies, and that maybe I had been motivated at certain spots when forming my model, and I would start calculating… and I think most people would agree with me this is not a lie
4. telling someone I would be there in 10 minutes when it was going to take 20, and my model was formed epistemically virtuously despite there obviously being good reasons for expecting shorter timescales, and who knows how long it would take me to find enough nuances to fix it and say 20. This is not a lie.
Ruby’s example of the workplace fits somewhere between numbers 1 and 2. Jessica’s example of short AI timelines I think is intended to fit 3 (although I think the situation is actually 4 for most people). The example of the political fact-checking doesn’t fit cleanly because politically we’re typically allowed to call anything wrong a “lie” regardless of intent, but I think it is somewhere between 2 and 3, and I think nonpartisan people would agree that, unless the perpetrators actually could have said they were wrong about the stat, the case was not actually a lie (just a different type of bad falsehood reflecting on the character of those involved). There are certainly many gradations here, but I just wanted to show that there is actually a relatively commonly accepted implicit theory about when things are lies that fits with the territory and isn’t some sort of politicking map distortion, as you seemed to be implying.
The intensional definition you found that included “conscious intent to deceive” is not actually the implicit folk theory most people operate under: they include number 2’s “unconscious intent to deceive” or “in-the-moment should-have-been-very-easy-to-tell-you-were-wrong obvious-motivated-cognition-cover-up”. I agree the explicit folk theory should be modified, though.
I also want to point out that this pattern of explicit vs implicit folk theories applies well to lots of other things. Consider “identity”—the explicit folk theory probably says something about souls or a real cohesive “I”, but the implicit version often uses distancing or phrases like “that wasn’t me” [edit: in the context of it being unlike their normal self, not that someone else literally did it] and things such that people clearly sort of know what’s going on. Other examples include theory of action, “I can’t do it”, various things around relationships, what is real as opposed to postmodernism, etc etc. To not cherry-pick, there are some difficult cases to consider like “speak your truth” or the problem of evil, but under nuanced consideration these fit with the dynamic of the others. I just mention this generalization because LW types (incl me) learned to tear apart all the folk theories because their explicit versions were horribly contradictory, and while this has been very powerful for us I feel like an equally powerful skill is figuring out how to put Humpty-Dumpty back together again.
On both the piece and the question, I feel consistently confused that people keep asking “is long-range forecasting feasible” as a binary in an overly general context, which, as TedSanders mentioned, is trivially false in some cases and trivially true in others.
I get that if you are doing research on things, you’ll probably do research on real-world-esque cases. But if you were trying to prove long-term forecasting feasibility-at-all (which Luke’s post appears to do, as it ends with sounding unsure about this point), you’d want to start from the easiest case for feasibility: the best superforecaster ever predicting the absolute easiest questions, over and over. This is narrow on forecasters and charitable on difficulty. I’m glad to see Tetlock et al looking at a narrower group of people this time, but you could go further. And I feel like people are still ignoring difficulty, to the detriment of everyone’s understanding.
If you predict coin tosses, you’re going to get a ROC AUC of .5. Chaos theory says some features have sensitive dependence on initial conditions that we can only track at too low a resolution, and that we won’t be able to predict these. Other features are going to sit within basins of attraction that are easy to predict. The curve of AUC should absolutely drop off over time like that, because more features slip out of predictability as time goes on. This should not be surprising! The real question is “which questions are how predictable for which people?” (Evidently not the current questions for the current general forecasting pool.)
There are different things to do to answer that. First, two things NOT to do that I see a lot:
1. Implying low resolution/AUC is a fault without checking calibration (as I maybe wrongly perceive the above graph or post as doing, but have seen elsewhere in a similar context). If you have good calibration, then a .52 AUC can be fine if you say 50% to most questions and 90% to one question; if you don’t, that 90% is gonna be drowned out in a sea of other wrong 90%s
2. Trying to zero out questions that you give to predictors, e.g. “will Tesla produce more or less than [Tesla’s expected production] next year?”. If you’re looking for resolution/AUC, then baselining on a good guess specifically destroys your ability to measure that. (If you ask the best superforecaster to guess whether a series of 80% heads-weighted coin flips comes up with an average more than .8, they’ll have no resolution, but if you ask what the average will be from 0 to 1 then they’ll have high resolution.) It will also hamstring your ability to remove low-information answers if you try subtracting background, as mentioned in the next list.
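To make the calibration-vs-resolution point concrete, here is a toy sketch (all numbers invented): a forecaster answers mostly coin-flip questions honestly at 50% plus a handful of genuinely predictable ones at 90%, and their overall AUC stays near chance even though their forecasts are as good as possible.

```python
import random

def auc(probs, outcomes):
    """Mann-Whitney AUC: the chance that a question which resolved true
    got a higher forecast than one which resolved false (ties count 0.5)."""
    pos = [p for p, o in zip(probs, outcomes) if o]
    neg = [p for p, o in zip(probs, outcomes) if not o]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

random.seed(0)

# 200 pure coin-flip questions, honestly forecast at 50% each...
probs = [0.5] * 200
outcomes = [int(random.random() < 0.5) for _ in range(200)]

# ...plus 10 genuinely predictable questions, forecast at 90%.
probs += [0.9] * 10
outcomes += [int(random.random() < 0.9) for _ in range(10)]

print("overall AUC:", round(auc(probs, outcomes), 3))
hits = [o for p, o in zip(probs, outcomes) if p == 0.9]
print("hit rate among 90% calls:", sum(hits) / len(hits))
```

The coin questions drag the AUC toward .5 no matter how good the forecaster is; only the calibration check on the 90% calls reveals that the forecasts couldn’t have been better.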
Some positive options if you’re interested in figuring out what long-term questions are predictable by whom:
1. At the very least, ask questions you expect people to have real information about
2. Ask superforecasters to forecast metadata about questions, like whether people will have any resolution/AUC on subclasses of questions, or how much resolution/AUC differently ranked people will have on subclasses, or whether a prediction market would answer a question better (e.g. if there is narrowly-dispersed hidden information that is very strong). Then you could avoid asking questions that were expected to be unpredictable or wasteful in some other way.
3. Go through and try to find simple features of predictable vs unpredictable long-term questions
4. Amplify the informational signal by reducing the haze of uncertainty not specific to the thing the question is interested in (mostly important for decade+ predictions). One option is to ask conditionals, e.g. “what percent chance is there that CRISPR-edited babies account for more than 10% of births if no legislation is passed banning the procedure” or something if you know legislation is very difficult to predict; another option is to ask about upstream features, like specifically whether legislation will be passed banning CRISPR. (Had another better idea here but fell asleep and forgot it)
5. Do a sort of anti-funnel plot or other baselining of the distribution over predictors’ predictions. This could look like subtracting the primary-fit beta distribution from the prediction histogram to see if there’s a secondary beta, or looking for higher-order moments or outliers of high credibility, or other signs of nonrandom prediction distribution that might generalize well. A good filter here is to not anchor them by saying “chances of more than X units” where X is already ~the aggregate mean, but instead make them rederive things (or to be insidious, provide a faulty anchor and subtract an empirical distribution from around that point). Other tweaked opportunities for baseline subtraction abound.
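A minimal sketch of the baseline-subtraction idea in option 5, using a method-of-moments beta fit (the prediction pool below is invented for illustration):

```python
import math

def fit_beta(xs):
    """Method-of-moments fit of Beta(a, b) to values in (0, 1)."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

def beta_pdf(x, a, b):
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / norm

# Hypothetical prediction pool: most forecasters hover near 30%, but a
# small confident cluster sits near 85% -- maybe hidden information.
preds = [0.25, 0.3, 0.3, 0.35, 0.28, 0.32, 0.27, 0.33, 0.85, 0.87, 0.84]
a, b = fit_beta(preds)

# Flag bins where the observed histogram exceeds the single-beta fit.
for i in range(10):
    lo, hi = i / 10, (i + 1) / 10
    count = sum(lo <= p < hi for p in preds)
    expected = len(preds) * beta_pdf((lo + hi) / 2, a, b) * 0.1
    if count - expected > 1:
        print(f"excess mass in [{lo:.1f}, {hi:.1f}): "
              f"{count} observed vs {expected:.1f} expected")
```

A bump the single beta can’t absorb (here, the cluster near .85) is the kind of “secondary beta” signal worth following up on; a real version would of course want a proper mixture fit rather than this crude residual check.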
If Luke is primarily just interested in whether OpenPhil employees can make long-term forecasts on the kind of thing they forecast on, they shouldn’t be looking at resolution/AUC, just calibration, and making sure it’s still good at reasonably long timescales. To bootstrap, it would speed things along if they used their best forecasters to predict metadata—if there are classes of questions that are too unpredictable for them, I’m sure they can figure that out, especially if they spot-interviewed some people about long-term predictions they made.
Not core, but when you say
(I don’t know if this is related, but it seems interesting to me that the human mind feels as though it lives in ‘the world’—this one concrete thing—though its epistemic position is in some sense most naturally seen as a probability distribution over many possibilities.)
It’s notable that some plausible probabilistic models in neuroscience seem to be set up such that only one path actually fires (is experienced), and the probability only comes in at the level of the structure weighting which path fires.
Really like this explanation, especially the third example and conclusion.
I feel like a similar mental move helps me understand and work with all sorts of not-yet-operationalized arguments in my head (or that other people make). If I think people are “too X”, and then I think about what else I could have said there, it helps me triangulate toward the thing I actually mean. I think this is much faster and more resilient to ladder-of-abstraction mistakes (as you mention) than many operationalization techniques, like trying to put numbers on things.
I think my personal mental move is less like being aware of all the things I could have said, and more like being aware that the thing I was saying was a stand-in meant to imply lots of specific things that are implausible to articulate in their own form.
Does FDT make this any clearer for you?
There is a distinction in the correlation, but it’s somewhat subtle and I don’t fully understand it myself. One silly way to think about it that might be helpful is “how much does the past hinge on your decision?” In smoker’s lesion, it is clear the past is very fixed—even if you decide not to smoke, that doesn’t affect the genetic code. But in Newcomb’s, the past hinges heavily on your decision: if you decide to one-box, it must have been the case that you could have been predicted to one-box, so it’s logically impossible for it to have gone the other way.
One intermediate example would be if Omega told you they had predicted you to two-box, and you had reason to fully trust this. In this case, I’m pretty sure you’d want to two-box, then immediately precommit to one-boxing in the future. (In this case, the past no longer hinges on your decision.) Another would be if Omega was predicting from your genetic code, which supposedly correlated highly with your decision but was causally separate. In this case, I think you again want to two-box if you have sufficient metacognition that you can actually uncorrelate your decision from genetics, but I’m not sure what you’d do if you can’t uncorrelate. (The difference again lies in how much Omega’s decision hinges on your actual decision.)
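To put rough numbers on the standard Newcomb setup (the payoffs and predictor accuracy below are the conventional ones, not from this thread):

```python
# Standard Newcomb payoffs: the opaque box holds $1M iff you were
# predicted to one-box; the transparent box always holds $1K.
ACCURACY = 0.99  # assumed predictor accuracy

# If the past "hinges on" your decision (the prediction tracks it):
ev_one_box = ACCURACY * 1_000_000
ev_two_box = (1 - ACCURACY) * 1_000_000 + 1_000
print(f"one-box EV: ${ev_one_box:,.0f}")   # $990,000
print(f"two-box EV: ${ev_two_box:,.0f}")   # $11,000

# If instead the past is fixed (smoker's-lesion-style: box contents
# already settled independently of your choice), two-boxing dominates:
# whatever is in the opaque box, taking both nets an extra $1,000.
```

The same arithmetic with "hinging" switched off is exactly why the intermediate cases above flip toward two-boxing.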
Sharing goals is definitely a tricky decision, as you note. I think it has even more subtlety than your proposed dichotomy, though.
Getting positive feedback just for proposing a goal takes away the positive future reward, but you still have reason to avoid the negative future reward of failing your commitments. Getting negative feedback early gives a positive future reward of showing people up, but this is little better than the future reward would have been anyways and comes hand in hand with an increased fear that your detractors will indeed be right.
Your point about avoiding early and undeserved praise is an important part of maintaining motivations, but I think a better solution would be something like a ring of friends that strongly support goal-achievement and stretch goals as virtuous and frequently check in with each other on incremental progress to incentivize goal-maintenance.