Rob Bensinger comments on Beware boasting about non-existent forecasting track records

Rob Bensinger 25 May 2022 0:40 UTC
12 points
0
He clearly believes he could be placing forecasts showing whether or not he is better. Yet he doesn’t.
Eliezer hasn’t said he thinks he can do better than Metaculus on arbitrary questions. He’s just said he thinks Metaculus is wrong on one specific question. Quoting a point I made in our conversation on Twitter:
[...] From my perspective, it looks like: I don’t think Metaculus performance on physics-loaded tech progress is a perfect proxy for physics knowledge (or is the only way a physicist could think they know better than Metaculus on a single question).
It seems like you’re interpreting EY as claiming ‘I have a crystal ball that gives me unique power to precisely time AGI’, whereas I interpret EY as saying that one particular Metaculus estimate is wrong.
Metaculus being wrong on a particular very-hard-to-forecast question is not a weird or crazy claim, so you don’t need to claim to be a genius.
Obviously EY shouldn’t get a bunch of public “aha, you predicted Metaculus’ timeline was way too long” credit when he didn’t clearly state this in advance (at least before the first update) and hasn’t quantified what “too long” means.
I’m not saying ‘give EY social credit for this’ or even ‘defer to EY’s timelines’. I’m saying ‘EY’s empirical claims are perfectly ordinary and don’t raise epistemic red flags.’
Along with: ‘Nor does it raise epistemic red flags to think Metaculus is wrong about a specific very-unusually-hard question regardless of how you think Metaculus does on a portfolio of other questions about weakly related topics.’
A healthy alignment field should contain many reasonable people making lots of (mutually incompatible) EY-style claims based on their own inside-view models of AGI.
It would be nice if we lived in a world where all such models were easy to quickly and conclusively test, but we don’t actually live there, and acting as though we do (or as though the action must mainly be contained in under-the-streetlamp questions) is not actually useful.
“I don’t have the same inside-view models as you, and you haven’t given me an explicit argument to convince me, so I don’t believe you” is a good and normal response here.
“I’m suspicious of the idea that anyone has inside-view models in this arena, and if you claim to have any relevant knowledge whatsoever that isn’t fully public, then I’ll presume you’re a liar until proven otherwise” is a way weirder response, and seems very off-base to me.
It strikes me as a sign that someone is misunderstanding the field as way more mature and streetlamp-y than it is.
In retrospect, I was wrong about “he didn’t clearly state this in advance (at least before the first update)”, at least if prediction 5121 is a good proxy for AGI/TAI (the thing Eliezer made his prediction about). Quoting myself on May 14:
It looks like at the start of 2022, Metaculus’ prediction for https://www.metaculus.com/questions/5121/date-of-general-ai/ was 2052; currently it’s 2035. In Dec 2021, Eliezer wrote (https://www.lesswrong.com/posts/ax695frGJEzGxFBK4/biology-inspired-agi-timelines-the-trick-that-never-works) that “my outward behavior seems to reveal a distribution whose median seems to fall well before 2050”.
The current prediction for that Q is 2040.
I think that a lot of the disagreement here actually comes down to object-level disagreements about AI:
- Paul (and, I think, Jotto) think AI progress in general is much more predictable than Eliezer does, and also think current SotA AI is much more similar to AGI than Eliezer expects.
  - Therefore from Eliezer’s perspective, there’s not much he should be able to predict about future AI progress; and if he did successfully predict four things about narrow AI, that wouldn’t necessarily be much of an update toward him being able to predict a fifth thing about a very different kind of narrow AI, much less about AGI.
  - It’s possible that Paul (or Jotto) have deep mechanistic knowledge of something about intelligence that makes them able to predict narrow-AI things way better than Eliezer would expect (as opposed to just doing the same normal trendline-extrapolatey thing EY or anyone else would do, with maybe some more minutiae about recent AI news in your cache so you’re a bit more aware of what the current trends are). And it’s possible that this is a deep enough insight to also make them better able to predict things about AGI.
  - But Paul (and Jotto) mostly aren’t claiming to have deep insights like that; and Eliezer hasn’t seen a big list of prediction successes from Paul about this thing Paul claims to be unusually good at (whereas, again, EY makes no claim of being unusually good at timing arbitrary narrow-AI advances); so my Eliezer-model remains unconvinced that Paul and Jotto are doing a super sophisticated thing, and instead my Eliezer-model suspects that they’re making the classic ‘assuming blank parts of my map correspond to blank parts of the territory’ mistake, and thereby assuming that there will be fewer surprise developments and sharp left turns in the future than there were in the past 60 years (for people who lived through multiple decades of the field’s history and observed everyone’s ability to make long-term predictions).
  - Another way of putting it is that Eliezer’s model of AI is much more ‘one damned thing after another’ (rather than everything being an interconnected whole with smooth gradual slopes for all the transitions), and AGI is another damned thing with no necessary connection to particular past narrow-AI damned things.
  - This does imply that it should be possible to operationalize some high-level predictions like ‘here’s a portfolio of 50 tasks; Eliezer will expect everyone to perform relatively poorly at predicting performance on those tasks over the next 5 years (since he thinks AI is less predictable than Paul does), and he will also expect more sudden jumps in capability (since he thinks AI is more discontinuous and not-just-one-big-thing)’.
  - This should become clearer over long time horizons, as Eliezer predicts more and more jumpiness over time and Paul predicts less and less.
Or, not to put too fine a point on it:
- Paul thinks Eliezer is basing his models of AI on a mysterious secret insight into the nature of cognition, which (if correct) should be powerful enough to predict lots of other cognition-y things well, and is not the sort of claim that has a high prior or that allows one to gather much evidence in its favor unless the evidence looks like a bunch of observations of how narrow AI works.
  - From my perspective, every part of this sounds wrong: I share Eliezer’s view in hard takeoff, and I didn’t arrive at my view via knowing how to build AGI myself. It just seems like the obvious implication of ‘AGI will be invented at a certain time, and we don’t already have baby AGIs’ and of ‘human intelligence is crap (especially at e.g. STEM), and intelligence lets you invent things’.
  - The discontinuities come from the fact that inventions start working at a discrete time (both in the case of AGI, and in the case of things AGI can build), while the size of the discontinuities comes from the fact that (a) general intelligence is a big deal (!), and (b) the first AGIs will probably have a lot of it compared to humans (broadly for the same reason the first planes had higher carrying capacities than any bird—why would birds be able to compete with a machine optimized for high carrying capacity?).
  - If you think the model is more complicated then that, then I’m confused. Is the Paul/Jotto position that this background view doesn’t predict hard takeoff, that it has too low a prior to be a top contender-view, or that it’s contradicted by observation? If I don’t think that GPT-3 or AlphaGo are AGIs (and therefore think the ‘AGI gets invented in the future’ discontinuity is still in play), then what specific predictions about narrow AI do you think that view should entail? Do you think it requires secret arcane knowledge of AGI in order to not think GPT-3 is an AGI?
  - Note that in all of this, I’m not claiming to know all of EY’s models; and I do think he has a lot more concentrated probability than me on a bunch of questions about how AGI works, and I think these nudge his timelines more toward the present. But this seems totally unnecessary to invoke in order to put a ton of mass on hard takeoff; and on timelines, I don’t read him as claiming much more than “my gestalt impression of the field as a whole, which strikes me as a weak source but still a better one than the explicitly invalid arguments I keep seeing EAs throw my way, gives me a vague hunch that AGI is probably a lot less than 30 years away”.
  - “Probably well before 2050” is not a ton of information, and his criticisms of longer timelines have mainly been of the form ‘it’s hard to concentrate mass a lot and your arguments are invalid ways to do so’, not ‘I have secret info that definitively proves it will happen in the next 10 years’.
- Meanwhile, Eliezer thinks Paul must be basing his models of AI on a mysterious secret insight into the nature of cognition—at least if Paul isn’t making a simple error like ‘assuming blank maps correspond to blank territory’ or ‘assuming surface trends trump deep regularities’—models which, if correct, should be powerful enough to predict lots of other cognition-y things well (since the Paul model explicitly claims that intelligence is way more predictable and homogeneous than Eliezer thinks it is), and is not the sort of claim that has a high prior.
  - This is my current epistemic state re Paul’s view as well. I could understand Paul’s view if he thought we were 3 years away from AGI and had found a trendline proving this; but thinking that we’re decades away and recent trendlines will continue exactly on schedule for decades strikes me as bizarre, even more so if this involves a transition to AGI. Paul may have arguments for this that I haven’t heard (‘smarter AI is more profitable and people want profit’ is obviously not compelling to me on its own), but the ones I have heard seemed very weak to me.
What links here?
- Jotto999's comment on Yudkowsky and Christiano discuss “Takeoff Speeds” by Eliezer Yudkowsky (30 Jun 2022 4:05 UTC; 0 points)
- paulfchristiano 1 Jun 2022 21:15 UTC
  55 points
  0
  Parent
  I think this is an unreasonable characterization of the situation and my position, especially the claim:
  Eliezer hasn’t seen a big list of prediction successes from Paul about this thing Paul claims to be unusually good at (whereas, again, EY makes no claim of being unusually good at timing arbitrary narrow-AI advances)
  I responded to a long thread of Eliezer trash-talking me in particular (here), including making apparent claims about how this is not the kind of methodology that makes good forecasts. He writes:
  It just seems very clear to me that the sort of person who is taken in by this essay is the same sort of person who gets taken in by Hanson’s arguments in 2008 and gets caught flatfooted by AlphaGo and GPT-3 and AlphaFold 2 [… the kind of person who is] going “Huh?” when AlphaGo or GPT-3 debuts^[1]
  He also writes posts like this one. Saying “the trick that never works” sure seems like it’s making a claim that something has a worse track record than whatever Eliezer is doing.
  Overall it looks to me that Eliezer is saying, not once but many times, that he is better at predicting things than other people and that this should be taken as a reason to dismiss various kinds of argument.
  I’m not claiming to be exceptional at making predictions. I’m claiming that Eliezer is mediocre at it and overconfident about it.
  I’m glad we were able to make one bet, which will give one single bit of evidence for Eliezer’s position if he wins and a a measly 1/8th of a bit against if he loses. But I felt that Eliezer was unwilling to state almost any concrete predictions about anything (even one that was just “Paul is overconfident about X”). In light of that, I think Eliezer probably shouldn’t be ragging so hard on how other people are “caught flatfooted” (in contrast with his superior intuition).
  If you want to say “X doesn’t work for forecasting” (and have it actually mean something rather than being mood affiliation) I think you basically need to finish the sentence by saying ”...as well as Y”. And if you want anyone to take that seriously you should be willing to stick your neck out on some X vs Y comparisons rather than just saying “how about the proponents of X go make some forecasts that I will cherry-pick and deride later.”
  1. ^
    This is particularly frustrating to me because in 2013 I already expected beating the best human go players in the next few years based on the trend extrapolation in this document. As far as I know Eliezer appears to have had a smaller probability on human-level go performance soon (believing it would require some new insight instead of just extrapolating the curve out to expert performance in 2017-2018), and to have been more confident that the match would be either 0-5 or 5-0.
    
    Likewise, I’m pretty confident that I had a higher probability on GPT-3 than Eliezer did. All of his public statements about language modeling suggests skepticism for getting GPT-3 like competencies at this point in the tech tree or producing it by stacking more layers, whereas in internal OpenAI discussions I was giving 25%+ probabilities for LM scaling working this well (admittedly not as high as some other even truer believers in stack more layers).
    
    I’m not presenting these as explicit predictions that should be particularly convincing to others. But I do hope it explains why I disagree with Eliezer’s implications, and at least makes it plausible that his track record claim is backwards.
  - lc 1 Jun 2022 22:51 UTC
    1 point
    0
    Parent
    You’re not wrong, and I’m not saying you shouldn’t have replied in your current position, but the youtube drama isn’t increasing my respect for either you or Eliezer.
    - paulfchristiano 2 Jun 2022 16:02 UTC
      17 points
      0
      Parent
      Yeah, I think I should probably stay out of this kind of interaction if I’m going to feel compelled to respond like this. Not that maximizing respect is the only goal, but I don’t think I’m accomplishing much else.
      I’m also going to edit the the phrases “shouldn’t talk quite as much shit” and “full of himself,” I just shouldn’t have expressed that idea in that way. (Sorry Eliezer.)
      - Zack_M_Davis 3 Jun 2022 4:12 UTC
        34 points
        0
        Parent
        I think the YouTube drama is serving an important function. Yudkowsky routinely positions himself in the role of a religious leader who is (in his own words) “always right”.
        
        (I think “role of a religious leader” is an apt description of what’s going on sociologically, even if no supernatural claims are being made; that’s why the “rightful caliph” language sticks.)
        
        I used to find the hyper-arrogant act charming and harmless back in 2008, because, back in 2008, he actually was right about almost everything I could check myself. (The Sequences were very good.)
        
        For reasons that are beyond the scope of this comment, I no longer think the hyper-arrogant act is harmless; it intimidates many of his faithful students (who genuinely learned a lot from him) into deferring to their tribal leader even when he’s obviously full of shit.
        
        If he can’t actually live up to his marketing bluster, it’s important for our collective sanity that people with reputation and standing call bullshit on the act, so that citizens of the Caliphate remember that they have the right and the responsibility to think things through for themselves. I think that’s a more dignified way to confront the hazards that face us in the future—and I suspect that’s what the Yudkowsky of 2008 would want us to do. (He wrote then of being “not sure that human beings realistically can trust and think at the same time.”) If present-day Yudkowsky (who complains that “too many people think it’s unvirtuous to shut up and listen to [him]”) disagrees, all the more reason not to trust him anymore.
        lc 4 Jun 2022 3:47 UTC
        2 points
        0
        Parent
        This is correct.
- Rob Bensinger 26 May 2022 0:19 UTC
  25 points
  0
  Parent
  Elizabeth van Nostrand comments in private chat:
  Can everyone agree that:
  there are many forms of prediction, of which narrow, precise forecasting of the kind found on prediction markets is only one
  narrow forecasting is only viable for a small subset of problems, and often the most important problems aren’t amenable to narrow forecasting
  narrow forecasting is much harder to fake than the other kinds. Making vague predictions and taking credit for whatever happens to happen is a misallocation of truthseeking credit.
  It is possible to have valuable models without being good at narrow predictions- black swans is a useful concept but it’s very annoying how the media give Nassim Taleb credit everytime something unexpected happens.
  It is possible to have models that are true but not narrow-predictive enough to be valuable [added: you can have a strong, correct model that a stock is overpriced, but unless you have a model for when it will correct it’s ~impossible to make money off that information]
  ?
  I like this addition, and endorse 1-5!
- Rob Bensinger 25 May 2022 0:48 UTC
  12 points
  0
  Parent
  But still without being transparent about his own forecasts, preventing a fair comparison.
  I think it’s a fair comparison, in that we can do at least a weak subjective-Bayesian update on the information—it’s useful and not cherry-picked, at least insofar as we can compare the AGI/TAI construct Eliezer was talking about in December, to the things Metaculus is making predictions about.
  I agree that it’s way harder to do a Bayesian update on data points like ‘EY predicted AGI well before 2050, then Metaculus updated from 2052 to 2035’ when we don’t have a full EY probability distribution over years.
  I mostly just respond by making a smaller subjective update and then going on with my day, rather than treating this as revelatory. I’m better off with the information in hand, but it’s a very small update in the grand scheme of things. Almost all of my knowledge is built out of small updates in the first place, rather than huge revelatory ones.
  If I understand your views, Jotto, three big claims you’re making are:
  1. It’s rude to be as harsh to other futurists as Eliezer was toward Metaculus, and if you’re going to be that harsh then at minimum you should clearly be sticking your neck out as much as the people you’re criticizing. (Analogy: it would be rude, and harmful to pro-quantified-forecasting norms, to loudly criticize Matt Yglesias for having an off year without at minimum having made a similar number of similarly risky, easy-to-resolve public predictions.)
  2. Metaculus-style forecasting is the gold standard for reasoning about the physical world, and is the only game in town when it comes to ‘remotely reasonable methods to try to predict anything about future technology’. Specifically:
    Anyone who claims to know anything relevant to the future should have an account on Metaculus (or a similar site), and people should overwhelmingly base their beliefs about the future on (a) what Metaculus says, and (b) what the people with the highest Metaculus scores say...
    … rather than basing their beliefs on their own inside-view models of anything, personal attempts to do explicit quantified Bayesian updates in response to not-fully-quantitative data (e.g., ‘how surprised would my gut be if something like CYC turned out to be more important than ML to future AI progress, a la Hanson’s claims? in how many worlds do I expect to see that, compared to worlds where I don’t see it?’) , or attempts to shift their implicit strength of belief about things without routing through explicit Bayesian calculations.
  3. If you aren’t a top Metaculus forecaster and aren’t just repeating the current Metaculus consensus using the reasoning ‘X is my belief state because Metaculus thinks it’, then you should shut up rather than poisoning the epistemic commons with your unvalidated inside-view models, hard-to-quickly-quantitatively-evaluate claims about reality, etc.
  (Correct me if I’m misstating any of your views.)
  I don’t have a strong view on whether folks should be friendlier/nicer in general—there are obvious benefits to letting people be blunt, but also obvious costs. Seems hard to resolve. I think it’s healthy that the EA Forum and LW have chosen different tradeoff points here, so we can test the effects of different norms and attract people who favor different tradeoffs. (Though I think there should be more ‘cultural exchange’ between the EA Forum and LW.)
  The more specific question ‘has Eliezer stuck his neck out enough?’ seems to me to turn on 2. Likewise, 3 depends on the truth of 2.
  I think 2 is false—Metaculus strikes me as a good tool to have in the toolbox, and a really cool resource overall, but I don’t see it as a replacement for inside-view reasoning, building your own models of the world, or doing implicit updating and intuition-honing.
  Nor do I think that only the top n% of EAs or rationalists should try to do their own model-building like this; I think nearly every EA and every rationalist should do it, just trying to guard against the obvious pitfalls—and learning via experience, to some degree, where those pitfalls tend to be for them personally.
  Quoting another recent thing I wrote on Twitter:
  At the borderlands of EA and non-EA, I find that the main argument I tend to want to cite is Bayes:
  ‘Yep, A seems possible. But if not-A were true instead, what would you expect to see differently? How well does not-A retrodict the data, compared to A?’
  And relatedly, ‘What are the future predictions of A versus not-A, and how soon can we get data that provides nontrivial evidence for one side versus the other?’ But that’s a more standard part of the non-EA college-educated person’s toolbox.
  And there’s a sense in which almost all of the cognitive resources available to a human look like retrodiction, rather than prediction.
  If you hear a new Q and only trust your pre-registered predictions, then that means your whole lifetime of past knowledge is useless to you.
  We have in fact adopted the norm “give disproportionate weight to explicit written-down predictions”, to guard against hindsight bias and lying.
  But it’s still the case that almost all the cognitive work is being done at any time by “how does this fit my past experience?”.
  I guess there’s another, subtler reason we give extra weight to predictions: there’s a social norm against acknowledging gaps in individual ability.
  If you only discuss observables and objective facts, never priors, then it’s easier to just-not-talk-about individuals’ judgment.
  Whatever the rationale, it’s essential that we in fact get better at retrodiction (i.e., reasoning about the things we already know), because we can’t do without it. We need to be able to talk about our knowledge, and we need deliberate practice at manipulating it.
  The big mistake isn’t “give more weight to pre-registered predictions”; it’s ”… and then make it taboo to say that you’re basing any conclusions on anything else”.
  Predictions are the gold standard, but man cannot live on gold alone.
  To explain more of my view, here’s a thing I wrote in response to some Qs on the 18th:
  I think there are two different topics here:
  1. Should we talk a bunch publicly about the fastest path to AGI?
  2. To what extent should we try to explicate and quantify our *other* predictions, and publish those predictions?
  I think the answer to 1 is an obvious “no”.
  But ‘what is your AGI timeline?’, ‘when do you expect a Gato-like thing to be developed?’, etc. seems basically fine to me, because it doesn’t say much about how you expect AGI to be developed. (Especially if your timeline is long.)
  The Metaculus folks criticizing EY for saying ‘Metaculus updated toward my view’ apparently didn’t realize that he did make a public prediction proving this: (link)
  He didn’t make a Gato-specific public prediction, but it’s also not apparent to me that EY’s making a strong claim like ‘I predicted a system exactly like Gato would be built exactly now’; he’s just saying it’s the broad sort of thing his intuitive models of AI progress allow.
  Translating an intuitive, unstable, preverbal sense of ‘which events are likelier to happen when?’ into a bunch of quantified predictions, without falling victim to issues like framing and salience effects, seems pretty hard to me.
  EY is on the record as saying that it’s hard to get much mileage out of thinking about timelines, and that it’s even harder if you try to switch away from your brain’s native format for representing the probabilities (emotional surprise, concrete anticipations, etc.).
  I could easily imagine that there’s some individual variation about how people best do tech forecasting, and I also think it’s reasonable for folks to disagree about the best norms here. So, I think 2 is a more complicated Q than 1, and I don’t have a strong view on it. [...]
  - Rob Bensinger 25 May 2022 1:08 UTC
    8 points
    0
    Parent
    I guess “top Metaculus forecaster” is a transparently bad metric, because spending more time on Metaculus tends to raise your score? Is there a ‘Metaculus score corrected for how much you use the site’ leaderboard?
    - Alex Lawsen 27 May 2022 5:59 UTC
      6 points
      0
      Parent
      Yes, https://metaculusextras.com/points_per_question
      
      It has its own problems in terms of judging ability. But it does exist.
      - Rob Bensinger 27 May 2022 21:50 UTC
        2 points
        0
        Parent
        Thanks! :)
      - Jotto999 28 May 2022 0:28 UTC
        1 point
        0
        Parent
        This is good in some ways but also very misleading. This selects against people who also place a lot of forecasts on lots of questions, and also against people who place forecasts on questions that have already been open for a long time, and who don’t have time to later update on most of them.
        I’d say it’s a very good way to measure performance within a tournament, but in the broader jungle of questions it misses an awful lot.
        E.g. I have predictions on 1,114 questions, and the majority were never updated, and had negligible energy put into them.
        Sometimes for fun I used to place my first (and only) forecast on questions that were just about to close. I liked it because this made it easier to compare my performance on distribution questions, versus the community, because the final summary would only show that for the final snapshot. But of course, if you do this then you will get very few points per question. But if I look at my results on those, it’s normal for me to slightly outperform the community median.
        This isn’t captured by my average points per question across all questions, where I underperform (partly because I never updated on most of those questions, and partly because a lot of it is amusingly obscure stuff I put little effort into.) Though, that’s not to suggest I’m particular great either (I’m not), but I digress.
        If we’re trying to predict a forecaster’s insight on “the next” given discrete prediction, then a more useful metric would be the forecaster’s log score versus the community’s log score on the same questions, at the time they placed those forecast. Naturally this isn’t a good way to score tournaments, where people should update often, and focus on high-effort per question. But if we’re trying to estimate their judgment from the broader jungle of Metaculus questions, then that would be much more informative than a points average per question.
- Eli Tyre 1 Jun 2022 19:07 UTC
  4 points
  0
  Parent
  “I don’t have the same inside-view models as you, and you haven’t given me an explicit argument to convince me, so I don’t believe you” is a good and normal response here.
  In most of the world this reads as kind of offensive, or as an affront, or as inciting conflict, which makes having this thought in the first place hard, which is one of the contributors to modest epistemology.
  I wish that we had better social norms for distinguishing between
  1. Claims that I am making and justifying based on legible info and reasoning. ie “Not only do I think that X is true, I think that any right thinking person who examines the evidence should come to conclude X,” and if you disagree with me about that we should debate it.
  2. Claims that I am making based on my own private or illegible info and reasoning. ie “Given my own read of the evidence, I happen to think X. But I don’t think that the arguments that I’ve offered are necessarily sufficient to convince a third party.” I’m not claiming that you should believe this, I’m merely providing you the true information that I believe it.
  I think clearly making this distinction would be helpful for giving space for people to think. I think lots of folks implicitly feel like they can’t have an opinion about something unless it is of the first type, which means they have to be prepared to defend their view from attack.
  Accordingly, they have a very high bar for letting themselves believe something, or at least to say it out loud. Which, at least, impoverishes the discourse, but might also hobble their own internal ability to reason about the world.
  On the flip side, I think sometimes people in our community come off as arrogant, because they’re making claims of the second type, and others assume that they’re making claims of the second type, without providing much supporting argument at all.
  (And sometimes, folks around here DO make claims of the first type, without providing supporting arguments. eg “I was convinced by the empty string, I don’t know strange inputs others need to be convinced” implying “I think all right thinking people would reach this conclusion, but none of you are right thinking.”)
- Jotto999 25 May 2022 2:25 UTC
  3 points
  0
  Parent
  First I commend the effort you’re putting into responding to me, and I probably can’t reciprocate as much.
  But here is a major point I suspect you are misunderstanding:
  It seems like you’re interpreting EY as claiming ‘I have a crystal ball that gives me unique power to precisely time AGI’, whereas I interpret EY as saying that one particular Metaculus estimate is wrong.
  This is neither necessary for my argument, nor at any point have I thought he’s saying he can “precisely time AGI”.
  If he thought it was going to happen earlier than the community, it would be easy to show an example distribution of his, without high precision (nor much effort). Literally just add a distribution into the box on the question page, click and drag the sliders so it’s somewhere that seems reasonable to him, and submit it. He could then screenshot it. Even just copypasting the confidence interval figures.
  Note that this doesn’t mean making the date range very narrow (confident), that’s unrelated. He can still be quite uncertain about specific times. Here’s an example of me somewhat disagreeing with the community. Of course now the community has updated to earlier, but he can still do these things, and should. Doesn’t even need to be screenshotted really, just posting it in the Metaculus thread works.
  And further, this point you make here:
  Eliezer hasn’t said he thinks he can do better than Metaculus on arbitrary questions. He’s just said he thinks Metaculus is wrong on one specific question.
  My argument doesn’t need him to necessarily be better at “arbitrary questions”. If Eliezer believes Metaculus is wrong on one specific question, he can trivially show a better answer. If he does this on a few questions and it gets properly scored, that’s a track record.
  You mentioned other things, such as how much it would transfer to broader, longer-term questions. That isn’t known and I can’t stay up late typing about this, but at the very minimum people can demonstrate they are calibrated, even if you believe there is zero knowledge transfer from narrower/shorter questions to broader/longer ones.
  Going to have to stop it there for today, but I would end this comment with a feeling: it feels like I’m mostly debating people who think they can predict when Tetlock’s findings don’t apply, and so reliably that it’s unnecessary to forecast properly nor transparently, and it seems like they don’t understand.
  - Rob Bensinger 25 May 2022 3:35 UTC
    16 points
    0
    Parent
    Note that this doesn’t mean making the date range very narrow (confident), that’s unrelated.
    Fair enough, but I was responding to a pair of tweets where you said:
    Eliezer says that nobody knows much about AI timelines. But then keeps saying “I knew [development] would happen sooner than you guys thought”. Every time he does that, he’s conning people.
    I know I’m using strong wording. But I’d say the same in any other domain.
    He should create a public Metaculus profile. Place a bunch of forecasts.
    If he beats the community by the landslide he claims, then I concede.
    If he’s mediocre, then he was conning people.
    ‘It would be convenient if Eliezer would record his prediction on Metaculus, so we know with more precision how strong of an update to make when he publicly says “my median is well before 2050” and Metaculus later updates toward a nearer-term median’ is a totally fair request, but it doesn’t bear much resemblance to ‘if you record any prediction anywhere other than Metaculus (that doesn’t have similarly good tools for representing probability distributions), you’re a con artist’. Seems way too extreme.
    Likewise, ‘prove that you’re better than Metaculus on a ton of forecasts or you’re a con artist’ seems like a wild response to ‘Metaculus was slower than me to update about a specific quantity in a single question’. So I’m trying to connect the dots, and I end up generating hypotheses like:
    Maybe Jotto is annoyed that Eliezer is confident about hard takeoff, not just that he has a nearer timelines median than Metaculus. And maybe Jotto specifically thinks that there’s no way you can rationally be confident about hard takeoff unless you think you’re better than Metaculus at timing tons of random narrow AI things.
    
    So then it follows that if you’re avoiding testing your mettle vs. Metaculus on a bunch of random narrow AI predictions, then you must not have any rational grounds for confidence in hard takeoff. And moreover this chain of reasoning is obvious, so Eliezer knows he has no grounds for confidence and is deliberately tricking us.
    Or:
    Maybe Jotto hears Eliezer criticize Paul for endorsing soft takeoff, and hears Eliezer criticize Metaculus for endorsing Ajeya-ish timelines, and Jotto concludes ‘ah, Eliezer must think he’s amazing at predicting AI-ish events in general; this should be easy to test, so since he’s avoiding publicly testing it, he must be trying to trick us’.
    In principle you could have an Eliezer-model like that and think that Eliezer has lots of nonstandard beliefs about random AI topics that make him way too confident about things like hard takeoff and yet his distributions tend to be wide, but that seems like a pretty weird combination of views to me, so I assumed that you’d also think Eliezer has relatively narrow distributions about everything.
    it feels like I’m mostly debating people who think they can predict when Tetlock’s findings don’t apply, and so reliably that it’s unnecessary to forecast properly nor transparently, and it seems like they don’t understand.
    Have you read Inadequate Equilibria, or R:AZ? (Or my distinction between ‘rationality as prothesis’ and ‘rationality as strength training’?)
    I think there’s a good amount of overlap between MIRI- and CFAR-ish views of rationality and Tetlock-ish views, but I also don’t think of Tetlock’s tips as the be-all end-all of learning things about the world, of doing science, etc., and I don’t see his findings as showing that we should give up on inside-view model-building, not-fully-explicit-and-quantified reasoning under uncertainty, or any of the suggestions in When (Not) To Use Probabilities.
    (Nor do I think Tetlock would endorse the ‘no future-related knowledge generation except via Metaculus or prediction markets’ policy you seem to be proposing. Maybe if we surveyed them we’d find out that Tetlock thinks Metaculus is 25% cooler than Eliezer does, or something? It’s not obvious to me that it matters.)
    - Rob Bensinger 25 May 2022 4:01 UTC
      12 points
      0
      Parent
      Also, I think you said on Twitter that Eliezer’s a liar unless he generates some AI prediction that lets us easily falsify his views in the near future? Which seems to require that he have very narrow confidence intervals about very near-term events in AI.
      So I continue to not understand what it is about the claims ‘the median on my AGI timeline is well before 2050’, ‘Metaculus updated away from 2050 after I publicly predicted it was well before 2050’, or ‘hard takeoff is true with very high probability’, that makes you think someone must have very narrow contra-mainstream distributions on near-term narrow-AI events or else they’re lying.
    - Jotto999 26 May 2022 1:04 UTC
      −2 points
      0
      Parent
      Some more misunderstanding:
      ‘if you record any prediction anywhere other than Metaculus (that doesn’t have similarly good tools for representing probability distributions), you’re a con artist’. Seems way too extreme.
      No, I don’t mean the distinguishing determinant of con-artist or not-con-artist trait is whether it’s recorded on Metaculus. It’s mentioned in that tweet because if you’re going to bother doing it, might as well go all the way and show a distribution.
      But even if he just posted a confidence interval, on some site other than Metaculus, that would be a huge upgrade. Because then anyone could add it to a spreadsheet scorable forecasts, and reconstruct it without too much effort.
      ‘if you record any prediction anywhere other than Metaculus (that doesn’t have similarly good tools for representing probability distributions), you’re a con artist’. Seems way too extreme.
      No, that’s not what I’m saying. The main thing is that they be scorable. But if someone is going to do it at all, then doing it on Metaculus just makes more sense—the administrative work is already taken care of, and there’s no risk of cherry-picking nor omission.
      Also, from another reply you gave:
      Also, I think you said on Twitter that Eliezer’s a liar unless he generates some AI prediction that lets us easily falsify his views in the near future? Which seems to require that he have very narrow confidence intervals about very near-term events in AI.
      I never used the term “liar”. The thing he’s doing that I think is bad is more like what a pundit does, like the guy who calls recessions, a sort of epistemic conning. “Lying” is different, at least to me.
      More importantly, no he doesn’t necessarily need to have really narrow distributions, and I don’t know why you think this. Only if he was squashed close against the “Now” side on the chart, then yes it would be “narrower”—but if that’s what Eliezer thinks, if he’s saying himself it’s earlier than x date, then on a graph that looks like it’s a bit narrower and shifted to the left, and it simply reflects what he believes.
      There’s nothing about how we score forecasters that requires him have “very narrow” confidence intervals about very near-term events in AI, in order to measure alpha. To help me understand, can you describe why you think this? Why don’t you think alpha would start being measurable with merely slightly more narrow confidence intervals than the community, and centered closer to the actual outcome?
      
      EDIT a week later: I have decided that several of your misunderstandings should be considered strawmanning, and I’ve switched from upvoting some of your comments here to downvoting them.