That’s a good point, though I do still think you need the right motivation. When you’re convinced you’re right, it’s very easy to skim past passages that are ‘obviously’ incorrect, and to fail to question assumptions. (More generally, I do wonder what’s a good heuristic for this—clearly it’s not practical to constantly go back to first principles on everything; I’m not sure how to distinguish [this person is applying a poor heuristic] from [this person is applying a good heuristic to very different initial beliefs].)
Perhaps the best would be a combination: a conversation which hopefully leaves you with the thought that you might be wrong, followed by the book to allow you to go into things on your own time without so much worry over losing face or winning.
Another point on the cause-for-optimism side is that being earnestly interested in knowing the truth is a big first step, and I think that description fits everyone mentioned so far.
I’d guess that reciprocal exchanges might work better for friends: I’ll read any m books you pick, so long as you read the n books I pick.
Less likely to get financial ick-factor, and it’s always possible that you’ll gain from reading the books they recommend.
Perhaps this could scale to public intellectuals where there’s either a feeling of trust or some verification mechanism (e.g. if the intellectual wants more people to read [some neglected X], and would willingly trade their time reading Y for a world where X were more widely appreciated).
Whether or not money is involved, I’m sceptical of the likely results for public intellectuals—or in general for people strongly attached to some viewpoint. The usual result seems to be a failure to engage with the relevant points. (perhaps not attacking head-on is the best approach: e.g. the asymmetrical weapons post might be a good place to start for Deutsch/Pinker)
Specifically, I’m thinking of David Deutsch speaking about AGI risk with Sam Harris: he just ends up telling a story where things go ok (or no worse than with humans), and the implicit argument is something like “I can imagine things going ok, and people have been incorrectly worried about things before, so this will probably be fine too”. Certainly Sam’s not the greatest technical advocate on the AGI risk side, but “I can imagine things going ok...” is a pretty general strategy.
The same goes for Steven Pinker, who spends nearly two hours with Stuart Russell on the FLI podcast, and seems to avoid actually thinking in favour of simply repeating the things he already believes. There’s quite a bit of [I can imagine things going ok...], [People have been wrong about downsides in the past...], and [here’s an argument against your trivial example], but no engagement with the more general points behind the trivial example.
Steven Pinker has more than enough intelligence to engage properly and re-think things, but he just pattern-matches any AI risk argument to [some scary argument that the future will be worse] and short-circuits to enlightenment-now cached thoughts. (to be fair to Steven, I imagine doing a book tour will tend to set related cached thoughts in stone, so this is a particularly hard case… but you’d hope someone who focuses on the way the brain works would realise this danger and adjust)
When you’re up against this kind of pattern-matching, I don’t think even the ideal book is likely to do much good. If two hours with Stuart Russell doesn’t work, it’s hard to see what would.
Unless I’ve confused myself badly (always possible!), I think either’s fine here. The | version just takes out a factor that’ll be common to all hypotheses: [p(e+) / p(e-)]. (since p(Tk & e+) ≡ p(Tk | e+) * p(e+))
Since we’ll renormalise, common factors don’t matter. Using the | version felt right to me at the time, but whatever allows clearer thinking is the way forward.
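A minimal sketch of that cancellation (with made-up numbers): weighting hypotheses by the joint p(Tk & e+) or by the conditional p(Tk | e+) gives identical posteriors after renormalising, since the two differ only by the common factor p(e+).

```python
p_e = 0.3                         # hypothetical p(e+)
joint = [0.06, 0.15, 0.09]        # p(Tk & e+) for three hypotheses
cond = [x / p_e for x in joint]   # p(Tk | e+) = p(Tk & e+) / p(e+)

def normalise(ws):
    z = sum(ws)
    return [w / z for w in ws]

# The common factor p(e+) cancels, so both weightings renormalise to
# the same posterior distribution:
assert all(abs(a - b) < 1e-12 for a, b in zip(normalise(joint), normalise(cond)))
```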
Taking your last point first: I entirely agree on that. Most of my other points were based on the implicit assumption that readers of your post don’t think something like “It’s directly clear that 9 OOM will almost certainly be enough, by a similar argument”.
Certainly if they do conclude anything like that, then it’s going to massively drop their odds on 9-12 too. However, I’d still make an argument of a similar form: for some people, I expect that argument may well increase the 5-8 range more (than proportionately) than the 1-4 range.
On (1), I agree that the same goes for pretty-much any argument: that’s why it’s important. If you update without factoring in (some approximation of) your best judgement of the evidence’s impact on all hypotheses, you’re going to get the wrong answer. This will depend highly on your underlying model.
On the information content of the post, I’d say it’s something like “12 OOMs is probably enough (without things needing to scale surprisingly well)”. My credence for low OOM values is mostly based on worlds where things scale surprisingly well.
But this is a bit weird; my post didn’t talk about the <7 range at all, so why would it disproportionately rule out stuff in that range?
I don’t think this is weird. What matters isn’t what the post talks about directly—it’s the impact of the evidence provided on the various hypotheses. There’s nothing inherently weird about evidence increasing our credence in [TAI by +10OOM] and leaving our credence in [TAI by +3OOM] almost unaltered (quite plausibly because it’s not too relevant to the +3OOM case).
Compare the 1-2-3 coins example: learning y tells you nothing about the value of x. It’s only ruling out any part of the 1 outcome in the sense that it maintains [x_heads & something independent is heads], and rules out [x_heads & something independent is tails]. It doesn’t need to talk about x to do this.
You can do the same thing with the [TAI first at k OOM] case—call that Tk. Let’s say that your post is our evidence e, and that e+ stands for [e gives a compelling argument against T13+]. Updating on e+ you get the following for each k:
Initial hypotheses: [Tk & e+], [Tk & e-]
Final hypothesis: [Tk & e+]
So what ends up mattering is the ratio p[Tk | e+] : p[Tk | e-]. I’m claiming that this ratio is likely to vary with k.
Specifically, I’d expect T1 to be almost precisely independent of e+, while I’d expect T8 to be correlated. My reasoning on T1 is that I think something radically unexpected would need to occur for T1 to hold, and your post just doesn’t seem to give any evidence for/against that. I expect most people would change their T8 credence on seeing the post and accepting its arguments (if they’ve not thought similar things before). The direction would depend on whether they thought the post’s arguments could apply equally well to ~8 OOM as to 12.
Note that I am assuming the argument ruling out 13+ OOM is as in the post (or similar). If it could take any form, then it could be a more or less direct argument for T1.
Overall, I’d expect most people who agree with the post’s argument to update along the following lines (but smoothly):
T0 to Ta: low increase in credence
Ta to Tb: higher increase in credence
Tb+: reduced credence
with something like (0 < a < 6) and (4 < b < 13). I’m pretty sure a is going to be non-zero for many people.
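For concreteness, here’s a toy version of that update shape, with entirely made-up likelihood ratios: the evidence is roughly independent of the low range, mildly favours the mid range, and strongly disfavours 13+.

```python
# Uniform toy prior over T1..T14, and hypothetical likelihood ratios
# p(e+|Tk)/p(e+) that vary with k as described in the comment above.
prior = {k: 1 / 14 for k in range(1, 15)}
lr = {k: (1.0 if k <= 3 else 1.2 if k <= 12 else 0.01) for k in prior}

unnorm = {k: prior[k] * lr[k] for k in prior}
z = sum(unnorm.values())
posterior = {k: v / z for k, v in unnorm.items()}

# Low k: small increase; mid k: larger increase; 13+: big decrease.
assert posterior[1] > prior[1]
assert posterior[8] / prior[8] > posterior[1] / prior[1]
assert posterior[13] < prior[13]
```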
[[ETA: I’m not claiming the >12 OOM mass must all go somewhere other than the <4 OOM case: this was a hypothetical example for the sake of simplicity. I was saying that if I had such a model (with zwomples or the like), then a perfectly good update could leave me with the same posterior credence on <4 OOM. In fact my credence on <4 OOM was increased, but only very slightly.]]
First I should clarify that the only point I’m really confident on here is the “In general, you can’t just throw out the >12 OOM and re-normalise, without further assumptions” argument.
I’m making a weak claim: we’re not in a position of complete ignorance w.r.t. the new evidence’s impact on alternate hypotheses.
My confidence in any specific approach is much weaker: I have little relevant data.
That said, I think the main adjustment I’d make to your description is to add the possibility of sublinear scaling of compute requirements with current techniques. E.g. if, beyond some threshold, meta-learning efficiency benefits are linear in compute, and non-meta-learned capabilities would otherwise scale linearly, then compute requirements could scale with the square root of capability (feel free to replace with a less silly example of your own).
This doesn’t require “We’ll soon get more ideas”—just a version of “current methods scale” with unlucky (from the safety perspective) synergies.
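A toy version of such a synergy (the functional forms are entirely made up for illustration): split a compute budget between meta-learning, whose efficiency multiplier is linear in its share, and direct training, whose base capability is linear in its share. The two multiply, so capability grows roughly quadratically in compute, and the compute required for a given capability level grows only like its square root.

```python
def capability(k, split=0.5):
    """Toy capability from compute budget k, split between meta-learning
    (a linear efficiency multiplier) and direct training (linear base)."""
    meta = split * k          # efficiency multiplier from meta-learning
    direct = (1 - split) * k  # directly-trained base capability
    return meta * direct      # multiplicative synergy

# Doubling compute quadruples capability in this toy model,
# i.e. required compute scales like sqrt(target capability):
ratio = capability(8) / capability(4)   # 4.0
```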
So while the “current methods scale” hypothesis isn’t confined to 7-12 OOMs, the distribution does depend on how things scale: a higher proportion of the 1-6 region is composed of “current methods scale (very) sublinearly”.
My p(>12 OOM | sublinear scaling) was already low, so my p(1-6 OOM | sublinear scaling) doesn’t get much of a post-update boost (not much mass to re-assign). My p(>12 OOM | (super)linear scaling) was higher, but my p(1-6 OOM | (super)linear scaling) was low, so there’s not too much of a boost there either (small proportion of mass assigned).
I do think it makes sense to end up with a post-update credence that’s somewhat higher than before for the 1-6 range—just not proportionately higher. I’m confident the right answer for the lower range lies somewhere between [just renormalise] and [don’t adjust at all], but I’m not at all sure where.
Perhaps there really is a strong argument that the post-update picture should look almost exactly like immediate renormalisation. My main point is that this does require an argument: I don’t think it’s a situation where we can claim complete ignorance over the impact on other hypotheses (and so renormalise by default), and I don’t think there’s a good positive argument for [all hypotheses will be impacted evenly].
Yes, we’re always renormalising at the end—it amounts to saying ”...and the new evidence will impact all remaining hypotheses evenly”. That’s fine once it’s true.
I think perhaps I wasn’t clear about what I meant by saying “This doesn’t say anything...”. I meant that it may say nothing in absolute terms—i.e. that I may put the same probability on [TAI at 4 OOM] after seeing the evidence as before.
This means that it does say something relative to other not-ruled-out hypotheses: if I’m saying the new evidence rules out >12 OOM, and I’m also saying that this evidence should leave p([TAI at 4 OOM]) fixed, I’m implicitly claiming that the >12 OOM mass must all go somewhere other than the 4 OOM case.
Again, this can be thought of as my claiming e.g.:
[TAI at 4 OOM] will happen if and only if zwomples work
There’s a 20% chance zwomples work
The new 12 OOM evidence says nothing at all about zwomples
In terms of what I actually think, my sense is that the 12 OOM arguments are most significant where [there are no high-impact synergistic/amplifying/combinatorial effects I haven’t thought of]. My credence for [TAI at < 4 OOM] is largely based on such effects. Perhaps it’s 80% based on some such effect having transformative impact, and 20% on we-just-do-straightforward-stuff. [Caveat: this is all just ottomh; I have NOT thought for long about this, nor looked at much evidence; I think my reasoning is sound, but specific numbers may be way off]
Since the 12 OOM arguments are of the form we-just-do-straightforward-stuff, they cause me to update the 20% component, not the 80%. So the bulk of any mass transferred from >12 OOM goes to cases where p([we-just-did-straightforward-stuff and no strange high-impact synergies occurred] | [TAI first occurred at this level]) is high.
It’s not entirely clear to me either. Here are a few quick related thoughts:
We shouldn’t assume it’s clear that higher long-term QoL is the primary motivator for most people who do save. For most of them, saving is simply something their friends, family, co-workers… think is a good idea.
Evolutionary fitness doesn’t care (directly) about QoL.
There may be unhelpful game theory at work. If in some groups where people tend to spend X, there’s quite a bit to gain in spending [X + 1], and a significant loss in spending [X − 1], you’d expect group spending to increase.
Even if we’re talking about [what’s effective] rather than [our evolutionary programming], we’re still navigating other people’s evolutionary programming. Being slightly above/below average in spending may send a disproportionate signal.
The value of a faked signal is higher for people who don’t have other channels to signal something similar.
Other groups likely are sending similar signals in other ways. E.g. consider intellectuals sitting around having lengthy philosophical discussions that don’t lead to action. They’re often wasting time, simultaneously showing off skills that they could be using more productively, but aren’t. (this is also a problem where it’s a genuine waste—my point is only that very few people avoid doing this in some form)
Of course none of this makes it any less of a problem (to the extent it’s bringing down collective QoL) - but possibly a difficult problem that we’d expect to exist.
Solutions-wise, my main thought is that you’d want to find a way to channel signalling-waste efficiently into public goods—so that personal ‘waste’ becomes a collective advantage (hopefully).
It is also worth noting that not all ‘wasteful’ spending is bad for society. E.g. consider early adopters of new and expensive technology: without people willing to ‘waste’ money on the Tesla Roadster, getting electric cars off the ground may have been a much harder problem.
We do gain evidence on at least some alternatives, but not on all the factors which determine the alternatives. If we know something about those factors, we can’t usually just renormalise. Renormalising is a good default, but it amounts to an assumption of ignorance.
Here’s a simple example: We play a ‘game’ where you observe the outcome of two fair coin tosses, x and y. You score:
1 if x is heads
2 if x is tails and y is heads
3 if x is tails and y is tails
So your score predictions start out at:
1: 50%
2: 25%
3: 25%
We look at y and see that it’s heads. This rules out 3. Renormalising would get us:
1: 66.7%
2: 33.3%
3: 0%
This is clearly silly, since we ought to end up at 50:50—i.e. all the mass from 3 should go to 2. This happens because the evidence that ruled out a score of 3 was irrelevant to the question “did you score 1 point?”. On the other hand, if we knew nothing about the existence of x or y, and only knew that we were starting from (1: 50%, 2: 25%, 3: 25%) and that 3 had been ruled out, it’d make sense to renormalise.
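The coin game above can be checked by computing the update exactly: conditioning on the actual evidence (y = heads) gives 50:50, not the 2/3 : 1/3 that blind renormalisation over scores would give.

```python
from fractions import Fraction

# Four equiprobable worlds for the two fair coins (x, y).
outcomes = [(x, y) for x in "HT" for y in "HT"]

def score(x, y):
    # 1 if x=H; 2 if x=T, y=H; 3 if x=T, y=T.
    return 1 if x == "H" else (2 if y == "H" else 3)

# Condition on the actual evidence: y = heads.
kept = [(x, y) for (x, y) in outcomes if y == "H"]
posterior = {}
for x, y in kept:
    s = score(x, y)
    posterior[s] = posterior.get(s, Fraction(0)) + Fraction(1, len(kept))

# posterior == {1: Fraction(1, 2), 2: Fraction(1, 2)}
```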
In the TAI case, we haven’t only learned that 12 OOM is probably enough (if we agree on that). Rather we’ve seen specific evidence that leads us to think 12 OOM is probably enough. The specifics of that evidence can lead us to think things like “This doesn’t say anything about TAI at +4 OOM, since my prediction for +4 is based on orthogonal variables”, or perhaps “This makes me near-certain that TAI will happen by +10 OOM, since the +12 OOM argument didn’t require more than that”.
If you have a bunch of hypotheses (e.g. “It’ll take 1 more OOM,” “It’ll take 2 more OOMs,” etc.) and you learn that some of them are false or unlikely (e.g. only a 10% chance of it taking more than 12 OOMs), then you should redistribute the mass over all your remaining hypotheses, preserving their relative strengths.
This depends on the mechanism by which you assigned the mass initially—in particular, whether it’s absolute or relative. If you start out with specific absolute probability estimates as the strongest evidence for some hypotheses, then you can’t just renormalise when you falsify others.
E.g. consider we start out with these beliefs:
If [approach X] is viable, TAI will take at most 5 OOM; 20% chance [approach X] is viable.
If [approach X] isn’t viable, 0.1% chance TAI will take at most 5 OOM.
30% chance TAI will take at least 13 OOM.
We now get this new information: there’s a 95% chance [approach Y] is viable; if [approach Y] is viable, TAI will take at most 12 OOM.
We now need to reassign most of the 30% mass we have on ≥13 OOM, but we can’t simply renormalise: we haven’t (necessarily) gained any information on the viability of [approach X]. Our post-update [TAI ≤ 5 OOM] credence should remain almost exactly 20%. Increasing it to ~26% would not make any sense.
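One joint model consistent with these numbers can be computed directly (the 0.5 prior on [approach Y] being viable is made up, chosen so the prior buckets work out; the 0.1% term is dropped for simplicity). The key assumption is that the new evidence bears only on Y, which is independent of X.

```python
from itertools import product

p_x, p_y_prior, p_y_post = 0.2, 0.5, 0.95

def bucket(x_viable, y_viable):
    if x_viable:
        return "<=5"    # X viable => TAI at most 5 OOM
    if y_viable:
        return "6-12"   # Y viable (X not) => at most 12 OOM
    return None         # neither: split 25/75 between 6-12 and >=13

def dist(p_y):
    d = {"<=5": 0.0, "6-12": 0.0, ">=13": 0.0}
    for x, y in product([True, False], repeat=2):
        p = (p_x if x else 1 - p_x) * (p_y if y else 1 - p_y)
        b = bucket(x, y)
        if b is not None:
            d[b] += p
        else:
            d["6-12"] += 0.25 * p
            d[">=13"] += 0.75 * p
    return d

prior = dist(p_y_prior)   # ~{'<=5': 0.20, '6-12': 0.50, '>=13': 0.30}
post = dist(p_y_post)     # ~{'<=5': 0.20, '6-12': 0.77, '>=13': 0.03}
# The evidence about Y says nothing about X, so P(<=5) stays ~20%;
# naive renormalisation would have pushed it up toward ~26%.
```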
For AI timelines, we may well have some concrete, inside-view reasons to put absolute probabilities on contributing factors to short timelines (even without new breakthroughs we may put absolute numbers on statements of the form “[this kind of thing] scales/generalises”). These probabilities shouldn’t necessarily be increased when we learn something giving evidence about other scenarios. (the probability of a short timeline should change, but in general not proportionately)
Perhaps if you’re getting most of your initial distribution from a more outside-view perspective, then you’re right.
Oh I’m not claiming that non-wasted wealth signalling is useless. I’m saying that frivolous spending and saving send very different signals—and that saving doesn’t send the kind of signal tEitB focuses on.
Whether a public saving-signal would actually help is an empirical question. My guess is that it wouldn’t help in most contexts where unwise spending is currently the norm, since I’d expect it to signal lack of ability/confidence. Of course I may be wrong.
When considering status, I think wealth is largely valued as an indirect signal of ability (in a broad sense). E.g. compare getting $100m by founding a business vs winning a lottery. The lottery winner gets nothing like the status bump that the business founder gets. This is another reason I think spending sends a stronger overall signal in many contexts: it (usually) says both [I had the ability to get this wealth] and [I have enough ability that I expect not to need it].
This is interesting, but I think largely misses the point that elephant-in-the-brain-style signalling is often about sending the signal “I can afford to waste resources, because I’ve got great confidence in my ability to do well in the future without them”.
Saving just doesn’t achieve this—it achieves the opposite: “Look at all my savings; I can’t afford to waste any resources, since I have little confidence in my ability to do well in the future without them.”
It makes evolutionary sense to signal ability rather than resources, since resources can’t be passed on (until very recently, at least), and won’t necessarily translate to all situations. By wasting resources, you’re signalling your confidence you’ll do well whatever the future throws at you.
If you want a signalling approach that improves the world, I think it has to be conspicuous donation, not conspicuous saving.
Very interesting—I’ll give some thought to answers, but for now a quick cached-thought comment:
A proposed solution: bills cannot be contradicted by bills which pass with fewer votes.
I don’t think this is practical as a full solution to this problem, since a bill doesn’t need to explicitly contradict a previous bill in order to make the previous one irrelevant.
You’ve made foobles legal? We’ll require fooble licenses costing two years’ training and a million dollars.
You’ve banned smarbling? We’ll switch all resources from anti-smarbling enforcement to crack down on unlicensed foobles.
Of course you could craft the fooble/smarbling laws to avoid these pitfalls, but there’s more than one way to smarble a fooble.
Sure, but what I mean is that this is hard to do for hypothesis-location, since post-update you still have the hypothesis-locating information, and there’s some chance that your “explaining away” was itself incorrect (or your memory is bad, you have bugs in your code...).
For an extreme case, take Donald’s example, where the initial prior would be 8,000,000 bits against. Locating the hypothesis there gives you ~8,000,000 bits of evidence. The amount you get in an “explaining away” process is bounded by your confidence in the new evidence. How sure are you that you correctly observed and interpreted the “explaining away” evidence? Maybe you’re 20 bits sure; perhaps 40 bits sure. You’re not 8,000,000 bits sure.
Then let’s say you’ve updated down quite a few times, but not yet close to the initial prior value. For the next update, how sure are you that the stored value you’ll be using as your new prior is correct? If you’re human, perhaps you misremembered; if a computer system, perhaps there’s a bug... Below a certain point, the new probability you arrive at will be dominated by contributions from weird bugs, misrememberings etc. This remains true until/unless you lose the information describing the hypothesis itself.
I’m not clear how much this is a practical problem—I agree you can update the odds of a hypothesis down to no-longer-worthy-of-consideration. In general, I don’t think you can get back to the original prior without making invalid assumptions (e.g. zero probability of a bug/hack/hoax...), or losing the information that picks out the hypothesis.
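This floor effect can be sketched numerically (all numbers hypothetical): each downward update is capped by your confidence in the evidence, say ~40 bits, and the chance of a bug or misremembering floors the stored value at every step.

```python
p_bug = 2 ** -40        # hypothetical chance your stored value is corrupted
posterior = 0.5         # once located, the hypothesis seemed plausible

for _ in range(1000):   # many maximally confident downward updates
    lr = 2 ** -40       # ~40 bits of evidence against, per update
    odds = (posterior / (1 - posterior)) * lr
    posterior = odds / (1 + odds)
    posterior = max(posterior, p_bug)   # bug probability floors each step

# posterior bottoms out near 2**-40: nowhere near an
# 8,000,000-bits-against prior, no matter how many updates you run.
```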
It’s worth noting that most of the strong evidence here is in locating the hypothesis. That doesn’t apply to the juggling example—but that’s not so much evidence. “I can juggle” might take you from 1:100 to 10:1. Still quite a bit, but 10 bits isn’t 24.
I think this relates to Donald’s point on the asymmetry between getting from exponentially small to likely (commonplace) vs getting from likely to exponentially sure (rare). Locating a hypothesis can get you the first, but not the second.
It’s even hard to get back to an exponentially small chance of x once it seems plausible (this amounts to becoming exponentially sure of ¬x). E.g., if I say “My name is Mortimer Q. Snodgrass… Only kidding, it’s actually Joe Collman”, what are the odds that my name is Mortimer Q. Snodgrass? 1% perhaps, but it’s nowhere near as low as the initial prior. The only simple way to get all the way back is to lose/throw away the hypothesis-locating information—which you can’t do via a Bayesian update. I think that’s what makes privileging the hypothesis such a costly error: in general you can’t cleanly update your way back (if your evidence, memory and computation were perfectly reliable, you could—but they’re not). The way to get back is to find the original error and throw it out.
How difficult is it to get into the top 1% of traders? To be 50% sure you’re in the top 1%, you only need 200:1 evidence. This seemingly large odds ratio might be easy to get.
I don’t think your examples say much about this. They’re all of the form [trusted-in-context source] communicates [unlikely result]. They don’t seem to show a reason to expect strong evidence may be easy to get when this pattern doesn’t hold. (I suppose they do say that you should check for the pattern—and probably it is useful to occasionally be told “There may be low-hanging fruit. Look for it!”)
I hope you find the time. I hadn’t realised this was happening and would be interested in any thoughts and ideas. It’s an issue that’s high impact, broadly relevant, hard to get good data and easy to reason poorly about—so I’m glad to see some thoughtful discussion.
Is there a source that shows there’s even a correlation? Please link one if there is—perhaps I missed it. The reports I’ve seen don’t suggest any—e.g. bmj report.
From what (little) I’ve seen, this seems to be evidence for the hypothesis “Anecdotes frequently cause officials with bad incentives to make harmful decisions”.
I think it’s important to distinguish between:
1) Rationality as truth-seeking.
2) Rationality as utility maximization.
For some of the examples these will go together. For others, moving closer to the truth may be a utility loss—e.g. for political zealots whose friends and colleagues tend to be political zealots.
It’d be interesting to see a comparison between such cases. At the least, you’d want to vary the following:
Having a very high prior on X’s being true.
Having a strong desire to believe X is true.
Having a strong emotional response to X-style situations.
The expected loss/gain in incorrectly believing X to be true/false.
Cultists and zealots will often have a strong incentive to believe some X even if it’s false, so it’s not clear the high prior is doing most/much of the work there.
With trauma-based situations, it also seems particularly important to consider utilities: there’s more to lose in incorrectly thinking things are safe than in incorrectly thinking they’re dangerous. When you start out believing something’s almost certainly very dangerous, you may be right. For a human, the utility-maximising move probably is to require more than the ‘correct’ amount of evidence to shift your belief (given that you’re impulsive, foolish, impatient… and so can’t necessarily be trusted to act in your own interests with an accurate assessment).
It’s also worth noting that habituation can be irrational. If you’re repeatedly in a situation where there’s good reason to expect a 0.1% risk of death, but nothing bad happens the first 200 times, then you’ll likely habituate to under-rate the risk—unless your awareness of the risk makes the experience of the situation appropriately horrifying each time.
On polar bears vs coyotes:
I don’t think it’s reasonable to label the [I saw a polar bear] sensation as “evidence for bear”. It’s weak evidence for bear. It’s stronger evidence for the beginning of a joke. For [polar bear], the [earnest report]:[joke] odds ratio is much lower than for [coyote].
I don’t think you need to bring in any irrational bias to get this result. There’s little shift in belief since it’s very weak evidence.
If your friend never makes jokes, then the point may be reasonable. (in particular, for your friend to mistakenly earnestly believe she saw a polar bear, it’s reasonable to assume that she already compensated for polar-bear unlikeliness; the same doesn’t apply if she’s telling a joke)
I don’t mean that values converge. I mean that if you take a truth-seeking approach to some fixed set of values, it won’t matter whether you start out analysing them through the lens of utility/duty/virtue. In the limit you’ll come to the same conclusions.
This is interesting. My initial instinct was to disagree, then to think you’re pointing to something real… and now I’m unsure :)
First, I don’t think your examples directly disagree with what I’m saying. Saying that our preferences can be represented by a UF over histories is not to say that these preferences only care about the physical history of our universe—they can care about non-physical predictions too (desirable anthropic measures and universal-prior-based manipulations included).
So then I assume we say something like: “This makes our UF representation identical to that of a set of preferences which does only care about the physical history of our universe. Therefore we’ve lost that caring-about-other-worlds aspect of our values. The UF might fully determine actions in accordance with our values, but it doesn’t fully express the values themselves.”
Strictly, this seems true to me—but in practice I think we might be guilty of ignoring much of the content of our UF. For example, our UF contains preferences over histories containing philosophy discussions.
Now I claim that it’s logically possible for a philosophy discussion to have no significant consequences outside the discussion (I realise this is hard to imagine, but please try). Our UF will say something about such discussions. If such a UF is both fully consistent with having particular preferences over [anthropic measures, acausal trade, universal-prior-based influence...], and prefers philosophical statements that argue for precisely these preferences, we’d have to be pretty obtuse to stick with “this is still perfectly consistent with caring only about [histories of the physical world]”.
It’s always possible to interpret such a UF as encoding only preferences directly about histories of the physical world. It’s also possible to think that this post is in Russian, but contains many typos. I submit that это маловероятно (that’s unlikely).
If we say that the [preferences ‘of’ a UF] are the [distribution over preferences we’d ascribe to an agent acting according to that UF (over some large set of environments)], then I think we capture the “something substantive” with substantial probability mass in most cases. (Not always through this kind of arguing-for-itself mechanism; the more general point is that the UF contains huge amounts of information, and it’ll be surprising if the expression of particular preferences doesn’t show up in a priori unlikely patterns.)
If we’re still losing something, it feels like an epsilon’s worth in most cases. Perhaps there are important edge cases?
Note that I’m only claiming “almost precisely the information you’re talking about is in there somewhere”, not that the UF is necessarily a useful/efficient/clear way to present the information. This is exactly the role I endorse for other perspectives: avoiding offensively impractical encodings of things we care about.
A second note: in practice, we’re starting out with an uncertain world. Therefore, the inability of a UF over universe histories to express outside-the-universe-history preferences with certainty may not be of real-world relevance. Outside an idealised model, certainty won’t happen for any approach.