Thus, due to no one’s intent, insufficiently justified concerns about current open-source AI are propagated to governance orgs, which recommend banning open source based on this research.
The recommendation that current open-source models should be banned is not present in the policy paper being discussed, AFAICT. The paper’s recommendations are pictured below:
Edited to add: there is a specific footnote that says “Note that we do not claim that existing models are already too risky. We also do not make any predictions about how risky the next generation of models will be. Our claim is that developers need to assess the risks and be willing to not open-source a model if the risks outweigh the benefits” on page 31.
Overall I agree that progress was very surprising, and I’ll be thinking about how it affects my big-picture views on AI risk and timelines; a few relatively minor nitpicks/clarifications below.
For instance, superforecaster Eli Lifland posted predictions for these forecasts on his blog.
I’m not a superforecaster (TM), though I think some now use the phrase to describe any forecaster with a good ~generalist track record?
While he notes that the Hypermind interface limited his ability to provide wide intervals on some questions, he doesn’t make that complaint for the MATH 2022 forecast and posted the following prediction, for which the true answer of 50.3% was even more of an outlier than Hypermind’s aggregate:
[image]
The image in the post is for another question; below is my prediction for MATH, though it’s not really more flattering. I do think my prediction was quite poor.
I didn’t run up to the maximum standard deviation here, but I probably would have given more weight to larger values if I had been able to forecast a mixture of components like on Metaculus. The resolution of 50.3% would very likely (90%) still have been above my 95th percentile though.
Hypermind’s interface has some limitations that prevent outputting arbitrary probability distributions. In particular, in some cases there is an artificial limit on the possible standard deviations, which could lead credible intervals to be too narrow.
I think this maybe (40% for my forecast) would have flipped the MMLU forecast to be inside the 90% credible interval, at least for mine and perhaps for the crowd.
In my notes on the MMLU forecast I wrote “Why is the max SD so low???”
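To illustrate the two points above (a capped standard deviation, and not being able to forecast a mixture of components), here is a minimal sketch; the means, SDs, and weights below are invented for illustration and are not the forecasts discussed:

```python
import numpy as np
from scipy import stats

# Toy forecast of MATH accuracy (percentage points). All parameters are assumptions.
single = stats.norm(loc=15, scale=6)  # single component with a capped SD

# Mixture: mostly the same central view, plus a minority "progress surprises us" component.
components = [stats.norm(loc=15, scale=6), stats.norm(loc=35, scale=15)]
weights = [0.8, 0.2]

def mixture_quantile(q, components, weights):
    """Approximate quantile of the mixture by inverting its CDF on a grid."""
    grid = np.linspace(0, 100, 10_001)
    cdf = sum(w * c.cdf(grid) for w, c in zip(weights, components))
    return float(grid[np.searchsorted(cdf, q)])

print("95th percentile, single capped-SD normal:", round(float(single.ppf(0.95)), 1))          # ~24.9
print("95th percentile, two-component mixture:  ", round(mixture_quantile(0.95, components, weights), 1))  # ~45
# The mixture puts far more weight on large values, though as noted above a
# resolution of 50.3% could still land above the 95th percentile.
```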
I appreciate this post, in particular the thoughts on an AI pause.
I believe that a very good RSP (of the kind I’ve been advocating for) could cut risk dramatically if implemented effectively, perhaps a 10x reduction. In particular, I think we will probably have stronger signs of dangerous capabilities before something catastrophic happens, and that realistic requirements for protective measures can probably lead to us either managing that risk or pausing when our protective measures are more clearly inadequate. This is a big enough risk reduction that my primary concern is about whether developers will actually adopt good RSPs and implement them effectively.
The 10x reduction claim seems wild to me. I think that a lot of the variance in outcomes of AI is due to differing underlying difficulty, and it’s somewhat unlikely that alignment difficulty happens to fall within the range where the amount of effort we would put into the problem in normal-ish circumstances is what makes the difference.
So I don’t see how even very good RSPs could come anywhere close to a 10x reduction in risk, when it seems like, even if we assume the evals work ~perfectly, they would likely at most lead to a few years’ pause (I’m guessing you’re not assuming that every lab in the world will adopt RSPs, though it’s unclear; and even if every lab implements them, presumably some will make mistakes in evals and/or protective measures). Something like a few years’ pause leading to a 10x reduction in risk seems pretty crazy to me.
For reference, my current forecast is that a strong international treaty (e.g. this draft but with much more work put into it) would reduce the risk of AI catastrophe from ~60% to ~50% in worlds where it comes into force, due to the considerations around alignment difficulty above as well as things like the practical difficulty of enforcing treaties. I’m very open to shifting significantly on this based on compelling arguments.
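As a quick sanity check on the magnitudes (the baseline numbers are the rough estimates above; here I treat a “10x reduction” as dividing the catastrophe probability by 10):

```python
baseline = 0.60       # my rough unconditional risk estimate from above
with_treaty = 0.50    # my rough estimate conditional on a strong treaty coming into force

print(round(baseline / with_treaty, 2))  # 60% -> 50% is only a ~1.2x reduction
print(baseline / 10)                     # a 10x reduction would instead mean ~6% residual risk
```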
I agree much of the community (including me) was wrong or directionally wrong in the past about the level of AI regulation and how quickly it would come.
Regarding the recommendations made in the post for going forward given that there will be some regulation, I feel confused in a few ways.
Can you provide examples of interventions that meet your bar for not being done by default? It’s hard to understand the takeaways from your post because the negative examples are made much more concrete than the proposed positive ones.
You argue that we perhaps shouldn’t invest as much in preventing deceptive alignment because “regulators will likely adapt, adjusting policy as the difficulty of the problem becomes clearer”.
If we are assuming that regulators will adapt and adjust regarding deception, can you provide examples of problems that policymakers will not be able to solve themselves, and why they will be less likely to notice and deal with them than deception?
You say “we should question how plausible it is that society will fail to adequately address such an integral part of the problem”. What things aren’t integral parts of the problem but should still be worked on?
I feel we would need much better evidence of things being handled competently to invest significantly less into integral parts of the problem.
You say: ‘Of course, it may still be true that AI deception is an extremely hard problem that reliably resists almost all attempted solutions in any “normal” regulatory regime, even as concrete evidence continues to accumulate about its difficulty—although I consider that claim unproven, to say the least’
If we expect some problems in AI risk to be solved by default, mostly by people outside the community, it feels to me like one takeaway would be that we should shift resources to the portions of the problem that we expect to be the hardest.
To me, intuitively, deceptive alignment might be one of the hardest parts of the problem as we scale to very superhuman systems, even if we condition on having time to build model organisms of misalignment and experiment with them for a few years. So I feel confused about why you treat the claim’s being “unproven” as a dismissal; of course it’s unproven, but you would need to argue that in worlds where the AI risk problem is fairly hard, there’s not much chance of it being very hard.
As someone who is relatively optimistic about concrete evidence of deceptive alignment increasing substantially before a potential takeover, I think I still put significantly lower probability on it than you do due to the possibility of fairly fast takeoff.
I feel like this post is to some extent counting our chickens before they hatch (tbc I agree with the directional update, as I said above). I’m not an expert on what’s going on here, but I can imagine any of the following happening (non-exhaustive list) that would make the current path to potentially sensible regulation in the US and internationally harder:
The EO doesn’t lead to as many resources dedicated to AI-x-risk-reducing things as we might hope. I haven’t read it myself, just the fact sheet and Zvi’s summary, but Zvi says “If you were hoping for or worried about potential direct or more substantive action, then the opposite applies – there is very little here in the way of concrete action, only the foundation for potential future action.”
A Republican President comes into power in the US and reverses a lot of the effects of the EO.
Rishi Sunak gets voted out in the UK (my sense is that this is likely) and the new Prime Minister is much less gung-ho on AI risk.
I don’t have strong views on the value of AI advocacy, but this post seems overconfident in calling it out as being basically not useful based on recent shifts.
It seems likely that much stronger regulations will be important; e.g., the model reporting threshold in the EO was set relatively high, and many in the AI risk community have voiced support for an international pause if it were politically feasible, which the EO is far from.
The public still doesn’t consider AI risk to be very important. <1% of the American public considers it the most important problem to deal with. So to the extent that raising that number was good before, it still seems pretty good now, even if slightly worse.
Meanwhile Rationality A-Z is just super long. I think anyone who’s a longterm member of LessWrong or the alignment community should read the whole thing sooner or later – it covers a lot of different subtle errors and philosophical confusions that are likely to come up (both in AI alignment and in other difficult challenges)
My current guess is that the meme “every alignment person needs to read the Sequences / Rationality A-Z” is net harmful. They seem to have been valuable for some people, but I think many people can contribute to reducing AI x-risk without reading them. I think the current AI risk community overrates them because its members are strongly selected for having liked them.
Some anecdotal evidence in favor of my view:
To the extent you think I’m promising for reducing AI x-risk and have good epistemics, I haven’t read most of the Sequences. (I have liked some of Eliezer’s other writing, like Intelligence Explosion Microeconomics.)
I’ve been moving some of my most talented friends toward work on reducing AI x-risk and similarly have found that, while I think they all have great epistemics, there’s mixed reception to rationalist-style writing; e.g. one is trialing at a top alignment org and doesn’t like HPMOR, while another likes HPMOR, ACX, etc.
I think 356 or more people in the population are needed for there to be a >5% chance of 2+ deaths in a 2-month span from that population.
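For reference, a minimal sketch of the kind of calculation behind this, assuming an independent per-person death probability of roughly 0.1% per 2-month span (that rate is my assumption here, not necessarily the one actually used; a different rate shifts the threshold):

```python
def prob_two_or_more_deaths(n, p):
    """P(at least 2 deaths among n people), treating deaths as independent Bernoulli(p)."""
    p0 = (1 - p) ** n                   # no deaths
    p1 = n * p * (1 - p) ** (n - 1)     # exactly one death
    return 1 - p0 - p1

p_death_2mo = 0.001  # assumed per-person probability of death over a 2-month span
n = 1
while prob_two_or_more_deaths(n, p_death_2mo) <= 0.05:
    n += 1
print(n)  # smallest population with a >5% chance of 2+ deaths; 356 under this assumed rate
```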
I’d be curious to see how well The alignment problem from a deep learning perspective and Without specific countermeasures… would do.
The word “overconfident” seems overloaded. Here are some things I think that people sometimes mean when they say someone is overconfident:
They gave a binary probability that is too far from 50% (I believe this is the original meaning)
They overestimated a binary probability (e.g. they said 20% when it should be 1%)
Their estimate is arrogant (e.g. they say there’s a 40% chance their startup fails when it should be 95%), or maybe they give an arrogant vibe
They seem too unwilling to change their mind in response to arguments (maybe their credal resilience is too high)
They gave a probability distribution that seems wrong in some way (e.g. “50% AGI by 2030 is so overconfident, I think it should be 10%”)
This one is pernicious in that any probability distribution assigns very low probabilities to some ranges, so being specific here seems important.
Their binary estimate or probability distribution seems too different from some sort of base rate, reference class, or expert(s) that they should defer to.
How much does this overloading matter? I’m not sure, but one worry is that it allows people to score cheap rhetorical points by claiming someone else is overconfident when in practice they might mean something like “your probability distribution is wrong in some way”. Beware of accusing someone of overconfidence without being more specific about what you mean.
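To make the first two meanings above concrete (a probability too far from 50%, versus an overestimated probability), here is a toy sketch with invented numbers using the log score, under which both kinds of overconfidence hurt but for different reasons:

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_log_score(forecast_p, true_p, n=100_000):
    """Average log score from forecasting `forecast_p` for events that occur with probability `true_p`."""
    outcomes = rng.random(n) < true_p
    return float(np.mean(np.where(outcomes, np.log(forecast_p), np.log(1 - forecast_p))))

true_p = 0.2  # the "right" probability in this toy setup
for p in [0.01, 0.2, 0.5, 0.9]:
    print(f"forecast {p:4}: avg log score {avg_log_score(p, true_p):.3f}")

# Forecasting 0.9 is overconfident in the first sense (too far from 50%) and the second (an overestimate);
# forecasting 0.01 is too extreme in the first sense but is an underestimate, so not the second.
```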