https://www.elilifland.com/. You can give me anonymous feedback here.
My forecast is based on historical data from Zillow. I explained my reasoning in the notes. The summary is that housing prices haven’t changed very much in Seattle since April 2019 (on the whole they’ve risen about 1%). On the other hand, prices in more expensive areas have stayed the same or declined slightly. I settled on a boring median of the price staying the same. Given how stable prices have been recently, I think most of the variation will come from the individual house and which neighborhood it’s in, with an outside chance of large Seattle-wide home value fluctuations.
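To give a feel for that decomposition, here's a minimal Monte Carlo sketch (the drift, tail, and noise parameters below are made up for illustration, not my actual numbers):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Roughly flat city-wide trend, a small chance of a large city-level move,
# plus house/neighborhood-specific noise (all as fractions of current price).
city_drift = rng.normal(0.00, 0.02, size=n)
large_move = rng.binomial(1, 0.05, size=n) * rng.normal(0.0, 0.10, size=n)
house_noise = rng.normal(0.00, 0.05, size=n)

total_change = city_drift + large_move + house_noise
print(np.round(np.percentile(total_change, [5, 25, 50, 75, 95]), 3))
```

Most of the spread here comes from the house-level term, matching the intuition above.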
My forecast is based on:
The past trend of increasing qubit counts in quantum devices (see the rough extrapolation sketch after this list)
The current leading commercially available device having 20 qubits
Google’s plans to make Sycamore, which has 54 qubits, available
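The kind of trend extrapolation I have in mind looks roughly like this (the year/qubit pairs below are illustrative placeholders, not a vetted dataset):

```python
import numpy as np

# Hypothetical (year, qubit count) data for the leading commercially
# available device -- for illustration only.
years = np.array([2016, 2017, 2018, 2019])
qubits = np.array([5, 16, 20, 20])

# Fit an exponential trend by regressing log(qubits) on year, then extrapolate.
slope, intercept = np.polyfit(years, np.log(qubits), 1)
for year in (2020, 2021):
    print(year, round(float(np.exp(intercept + slope * year)), 1))
```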
I don’t have a background in quantum computing, so there’s a chance I’m misinterpreting the question in some way, but I learned a lot doing the research for the forecast (like that there’s a lot of controversy regarding whether quantum supremacy has been achieved yet).
Amusingly, during my research I stumbled upon this Metaculus question about when a >49 qubit quantum computer would be created which resolved ambiguously due to the issue of how well-controlled the qubits are. For the purposes of this forecast I assumed it would resolve based on the raw number of qubits, without adjusting for control.
I must admit I haven’t followed the discussions you’re referring to but if I were to spend more time forecasting this question I would look into them.
I didn’t include effects of COVID in my forecast, as it looks like the Zillow Home Value Index for Seattle has remained relatively steady since March (a 2% drop). I’m skeptical that there are likely to be large effects from COVID in the future when there hasn’t been a large effect thus far.
A few reasons I could be wrong:
Zillow data is inaccurate or incomplete, or I’m interpreting it incorrectly.
COVID affects variation of individual housing prices much more than the trend of the city as a whole.
The COVID effects will be much bigger in the next half year than in the previous one. Perhaps there will be a second wave, much worse than the market is pricing in, which produces large effects.
It looks like people can change their predictions after they initially submit them. Is this history recorded somewhere, or just the current distribution?
We do store the history. You can view it by going to https://elicit.org/binary and then searching for the question, e.g. https://elicit.org/binary?binaryQuestions.search=Will%20there%20be%20more%20than%2050. Although, as noted by Oli, we currently only display predictions that haven’t been withdrawn.
Is there an option to have people “lock in” their answer? (Maybe they can still edit/delete for a short time after they submit or before a cutoff date/time)
Not planning on supporting this on our end in the near future, but could be a cool feature down the line.
Is there a way to see in one place all the predictions I’ve submitted an answer to?
As of right now, not if you make the predictions via LW. You can view questions that you’ve submitted a prediction on via Elicit at https://elicit.org/binary?binaryQuestions.hasPredicted=true if you’re logged in, and we’re working on allowing for account linking so your LW predictions would show up in the same place.
The first version of account linking will be contacting someone at Ought then us manually running a script.
Edit: the first version of account linking is ready, email elifland@ought.org with your LW username and Elicit email and I can link them.
There’s also a Metaculus question about this:
If the user is interested in getting into the top ranks, this strategy won’t be anything like enough.
I think this isn’t true empirically for a reasonable interpretation of top ranks. For example, I’m ranked 5th on questions that have resolved in the past 3 months due to predicting on almost every question.
Looking at my track record, for questions resolved in the last 3 months, evaluated at all times, here’s how my log score looks compared to the community:
Binary questions (N=19): me: -.072 vs. community: -.045
Continuous questions (N=20): me: 2.35 vs. community: 2.33
So if anything, I’ve done a bit worse than the community overall, and am in 5th by virtue of predicting on all questions. It’s likely that the predictors significantly in front of me are that far ahead in part due to having predicted on (a) questions that have resolved recently but closed before I was active and (b) a longer portion of the lifespan for questions that were open before I became active.
Edit:
I discovered that the question set changes when I evaluate at “resolve time” and filter for the past 3 months, not sure why exactly. Numbers at resolve time:
Binary questions (N=102): me: .598 vs. community: .566
Continuous questions (N=92): me: 2.95 vs. community: 2.86
I think this weakens my case substantially, though I still think a bot that just predicts the community as soon as it becomes visible and updates every day would currently be at least top 10.
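For anyone unfamiliar with where these numbers come from, here's a rough sketch of the basic binary log-score comparison (Metaculus's actual scoring rules are more involved, and the probabilities below are made up):

```python
import numpy as np

def binary_log_score(prob_yes: float, outcome: int) -> float:
    """Natural log of the probability assigned to the realized outcome."""
    p = prob_yes if outcome == 1 else 1 - prob_yes
    return float(np.log(p))

# Hypothetical resolved questions: (my probability, community probability, outcome).
questions = [(0.90, 0.85, 1), (0.30, 0.40, 0), (0.70, 0.75, 1)]

mine = np.mean([binary_log_score(p_me, o) for p_me, _, o in questions])
crowd = np.mean([binary_log_score(p_c, o) for _, p_c, o in questions])
print(f"me: {mine:.3f} vs. community: {crowd:.3f}")
```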
Anything much worse than that, yes, people could have negative overall scores—which, if they’ve predicted on a decent number of questions, is pretty strong evidence that they really suck at forecasting
I agree that this would have some effect of making things less welcoming to newcomers, but I’m curious to what extent. I have seen plenty of people with worse Brier scores than the median continue predicting on GJO rather than getting demoralized and quitting (disclaimer: survivorship bias).
Someone who is near the top of the leaderboard is both accurate and highly experienced
I think this unfortunately isn’t true right now, and just copying the community prediction would place very highly (I’m guessing if made as soon as the community prediction appeared and updated every day, easily top 3 (edit: top 10)). See my comment below for more details.
You can look at someone’s track record in detail, but we’re also planning to roll out more ways to compare people with each other.
I’m very glad to hear this. I really enjoy Metaculus but my main gripe with it has always been (as others have pointed out) a lack of way to distinguish between quality and quantity. I’m looking forward to a more comprehensive selection of metrics to help with this!
It’s very likely that when the US intelligence community reports on August 25 on their data about the origins of COVID-19, they will conclude that it was a lab leak.
Are you open to betting on this? The GJOpen community is at 9% that the report will conclude that lab leak is more likely than not; I’m at 12%.
In particular, my actual credence in lab leak is higher (~45%) but I’m guessing the most likely outcome of the report is that it’s inconclusive, and that political pressures will play a large role in the outcome.
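For concreteness, here's the arithmetic behind why a bet at intermediate odds can look positive-EV to both sides when credences differ (the stakes, odds, and counterparty credence below are hypothetical):

```python
def bet_ev(credence: float, implied_prob: float, stake: float = 100) -> float:
    """Expected value of betting `stake` on YES at odds implying `implied_prob`,
    evaluated under your own credence in YES."""
    payout_if_yes = stake * (1 - implied_prob) / implied_prob
    return credence * payout_if_yes - (1 - credence) * stake

# If my counterparty were at ~60% and we settled on 30% implied odds:
print(bet_ev(0.60, 0.30))   # counterparty's EV betting YES (positive)
print(-bet_ev(0.12, 0.30))  # my EV taking the NO side at 12% credence (also positive)
```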
My Hypermind Arising Intelligence Forecasts and Reflections
Your prior is for discontinuities throughout the entire development of a technology, so shouldn’t your prior be for discontinuity at any point during the development of AI, rather than discontinuity at or around the specific point when AI becomes AGI? It seems this would be much lower, though we could then adjust upward based on the particulars of why we think a discontinuity is more likely at AGI.
(epistemic status: exploratory)
I think more people who are into LessWrong and in high school or college should consider trying Battlecode. It’s somewhat similar to The Darwin Game, which was pretty popular on here, and I think generally the type of people who like LessWrong will both enjoy and be good at Battlecode. (edited to add: A short description of Battlecode is that you write a bot to beat other bots at a turn-based strategy game. Each unit executes its own code, so communication/coordination is often one of the most interesting parts.)
I did it with friends for 6 years (junior year of high school through the end of undergrad), and I think it at least helped me gain legible expertise in strategizing and coding quickly, but plausibly also helped me pick up skills in these areas as well as teamwork.
If any students are interested (I believe PhD students can qualify as well, though it may not be worth their time), there are still 2-3 weeks left in this year’s game, which is plenty of time. If you’re curious to learn more about my experiences with Battlecode, see the README and postmortem here.
Feel free to comment or DM me if you have any questions.
[crossposted from EA Forum]
Reflecting a little on my shortform from a few years ago, I think I wasn’t ambitious enough in trying to actually move this forward.
I want there to be an org that does “human challenge”-style RCTs across lots of important questions that are extremely hard to get at otherwise, including (top 2 are repeated from previous shortform):
Health effects of veganism
Health effects of restricting sleep
Productivity of remote vs. in-person work
Productivity effects of blocking out focused/deep work
Edited to add: I no longer think “human challenge” is really the best way to refer to this idea (see comment that convinced me); I mean to say something like “large scale RCTs of important things on volunteers who sign up on an app to randomly try or not try an intervention.” I’m open to suggestions on succinct ways to refer to this.
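To be concrete about the core analysis such an org would run, here's a minimal sketch of random assignment plus a difference-in-means estimate on simulated data (real analyses would have to deal with attrition, non-compliance, pre-registration of endpoints, etc.):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000

# Volunteers are randomly assigned to try (1) or not try (0) the intervention.
assigned = rng.binomial(1, 0.5, size=n)
# Fake outcome data with a true effect of +0.2, for illustration.
outcome = rng.normal(0, 1, size=n) + 0.2 * assigned

effect = outcome[assigned == 1].mean() - outcome[assigned == 0].mean()
se = np.sqrt(outcome[assigned == 1].var(ddof=1) / (assigned == 1).sum()
             + outcome[assigned == 0].var(ddof=1) / (assigned == 0).sum())
print(f"estimated effect: {effect:.3f} +/- {1.96 * se:.3f}")
```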
I’d be very excited about such an org existing. I think it could even grow to become an effective megaproject, pending further analysis on how much it could increase wisdom relative to power. But, I don’t think it’s a good personal fit for me to found given my current interests and skills.
However, I think I could plausibly provide some useful advice/help to anyone who is interested in founding a many-domain human-challenge org. If you are interested in founding such an org or know someone who might be and want my advice, let me know. (I will also be linking this shortform to some people who might be able to help set this up.)
--
Some further inspiration I’m drawing on to be excited about this org:
Freakonomics’ RCT on measuring the effects of big life changes like quitting your job or breaking up with your partner. This makes me optimistic about the feasibility of getting lots of people to sign up.
Holden’s note on doing these type of experiments with digital people. He mentions some difficulties with running these types of RCTs today, but I think an org specializing in them could help.
Votes/considerations on why this is a good or bad idea are also appreciated!
Thanks, I agree with this and it’s probably not good branding anyway.
I was thinking the “challenge” was just doing the intervention (e.g. being vegan), but agree that the framing is confusing since it refers to something different in the clinical context. I will edit my shortforms to reflect this updated view.
Impactful Forecasting Prize for forecast writeups on curated Metaculus questions
Given the success of this experiment, we should propose a modified version of futarchy where laws are similarly written letter by letter!
What are your thoughts on This can’t go on?
Yeah I’ve been sporadically making progress on a personal forecasting retrospective, will include reflections and updated forecasts if/when I get around to finishing that.
Overall agree that progress was very surprising and I’ll be thinking about how it affects my big picture views on AI risk and timelines; a few relatively minor nitpicks/clarifications below.
For instance, superforecaster Eli Lifland posted predictions for these forecasts on his blog.
I’m not a superforecaster (TM) though I think some now use the phrase to describe any forecasters with good ~generalist track records?
While he notes that the Hypermind interface limited his ability to provide wide intervals on some questions, he doesn’t make that complaint for the MATH 2022 forecast and posted the following prediction, for which the true answer of 50.3% was even more of an outlier than Hypermind’s aggregate:
[image]
The image in the post is for another question; below is my prediction for MATH, though it’s not really more flattering. I do think my prediction was quite poor.
I didn’t run up to the maximum standard deviation here, but I probably would have given more weight to larger values if I had been able to forecast a mixture of components like on Metaculus. The resolution of 50.3% would very likely (90%) still have been above my 95th percentile though.
Hypermind’s interface has some limitations that prevent outputting arbitrary probability distributions. In particular, in some cases there is an artificial limit on the possible standard deviations, which could lead credible intervals to be too narrow.
I think this maybe (40% for my forecast) would have flipped the MMLU forecast to be inside the 90% credible interval, at least for mine and perhaps for the crowd.
In my notes on the MMLU forecast I wrote “Why is the max SD so low???”
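To illustrate what I mean by a mixture of components, here's a sketch comparing a single capped-SD component to a two-component mixture with a fatter right tail (all numbers made up, not my actual forecast):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# One normal component with a capped standard deviation.
single = rng.normal(20, 5, size=n)
# 80/20 mixture: main guess plus a wide component for surprising outcomes.
mixture = np.where(rng.random(n) < 0.8,
                   rng.normal(20, 5, size=n),
                   rng.normal(40, 15, size=n))

for name, samples in [("single", single), ("mixture", mixture)]:
    print(name, "95th percentile:", round(float(np.percentile(samples, 95)), 1))
```

The mixture pushes the 95th percentile far higher, which is the kind of weight on larger values I would have wanted here.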
Steelmanning might be particularly useful in cases where we have reason to believe those who have engaged most with the arguments are biased toward one side of the debate.
As described in But Have They Engaged with the Arguments?, perhaps a reason many who dismiss AI risk haven’t engaged much with the arguments is the selection effect of engaging more if the first arguments one hears seem true. Therefore it might be useful to steelman arguments against AI risk made by generally reasonable people, arguments that might seem off due to lack of engagement with existing counterarguments, in order to extract potentially relevant insights (though perhaps an alternative is funding more reasonable skeptics to engage with the arguments much more deeply?).
I think it’s >1% likely that one of the first few surveys Rohin conducts would result in a fraction of >0.5.
Evidence from When Will AI Exceed Human Performance?, in the form of median survey responses of researchers who published at ICML and NIPS in 2015:
5% chance given to Human Level Machine Intelligence (HLMI) having an extremely bad long run impact (e.g. human extinction)
Does Stuart Russell’s argument for why highly advanced AI might pose a risk point at an important problem? 39% say at least important, 70% at least moderately important.
But on the other hand, only 8.4% said working on this problem now is more valuable than other problems in the field. 28% said it is as valuable as other problems.
47% agreed that society should prioritize “AI Safety Research” more than it currently was.
These seem like fairly safe lower bounds compared to the population of researchers Rohin would evaluate, since concern regarding safety has increased since 2015 and the survey included all AI researchers rather than only those whose work is related to AGI.
These responses are more directly related to the answer to Question 3 (“Does X agree that there is at least one concern such that we have not yet solved it and we should not build superintelligent AGI until we do solve it?”) than Question 2 (“Does X broadly understand the main concerns of the safety community?”). I feel very uncertain about the percentage that would pass Question 2, but think it is more likely to be the “bottleneck” than Question 3.
Given these considerations, I increased the probability before 2023 to 10%, with 8% below the lower bound. I moved the median | not never up to 2035 as a higher probability pretty soon also means a sooner median. I decreased the probability of “never” to 20%, since the “not enough people update on it / consensus building takes forever / the population I chose just doesn’t pay attention to safety for some reason” condition seems not as likely.
I also added an extra bin to ensure that the probability continues to decrease on the right side of the distribution.
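Roughly, the structure of the resulting distribution looks like this (the lognormal shape is only loosely tuned to the numbers above and is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

p_never = 0.20
never = rng.random(n) < p_never
# Conditional on "not never": a skewed distribution of years with median ~2035.
years = 2021 + rng.lognormal(mean=np.log(14), sigma=1.7, size=n)

print("P(never):", round(never.mean(), 3))
print("P(before 2023):", round(((~never) & (years < 2023)).mean(), 3))
print("median year | not never:", int(np.median(years[~never])))
```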
My snapshot
Note: I’m interning at Ought and thus am ineligible for prizes.