https://www.elilifland.com/. You can give me anonymous feedback here.
Eli’s review of “Is power-seeking AI an existential risk?”
Scenario Forecasting Workshop: Materials and Learnings
Prize idea: Transmit MIRI and Eliezer’s worldviews
Samotsvety’s AI risk forecasts
Discussing how to align Transformative AI if it’s developed very soon
Impactful Forecasting Prize for forecast writeups on curated Metaculus questions
Personal forecasting retrospective: 2020-2022
[Question] Forecasting thread: How does AI risk level vary based on timelines?
Overall agree that progress was very surprising and I’ll be thinking about how it affects my big picture views on AI risk and timelines; a few relatively minor nitpicks/clarifications below.
For instance, superforecaster Eli Lifland posted predictions for these forecasts on his blog.
I’m not a superforecaster (TM) though I think some now use the phrase to describe any forecasters with good ~generalist track records?
While he notes that the Hypermind interface limited his ability to provide wide intervals on some questions, he doesn’t make that complaint for the MATH 2022 forecast and posted the following prediction, for which the true answer of 50.3% was even more of an outlier than Hypermind’s aggregate:
[image]
The image in the post is for another question; below is my prediction for MATH, though it’s not really more flattering. I do think my prediction was quite poor.
I didn’t run up to the maximum standard deviation here, but I probably would have given more weight to larger values if I had been able to forecast a mixture of components like on Metaculus. The resolution of 50.3% would very likely (90%) still have been above my 95th percentile though.
Hypermind’s interface has some limitations that prevent outputting arbitrary probability distributions. In particular, in some cases there is an artificial limit on the possible standard deviations, which could lead credible intervals to be too narrow.
I think this maybe (40% for my forecast) would have flipped the MMLU forecast to be inside the 90% credible interval, at least for mine and perhaps for the crowd.
In my notes on the MMLU forecast I wrote “Why is the max SD so low???”
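To make the interface limitation concrete, here is a minimal sketch (with hypothetical numbers, not my actual Hypermind inputs) of how capping the standard deviation of a normal forecast narrows the 90% credible interval enough to exclude a resolution like 50.3%:

```python
from statistics import NormalDist

# Hypothetical illustration: a normal forecast for MATH accuracy (%) whose
# standard deviation is capped by the interface. The mean of 20% and the
# SD values are made up for illustration, not my actual forecast.
resolution = 50.3  # the true 2022 MATH accuracy (%)
mean = 20.0        # hypothetical forecast mean (%)

for sd in (8.0, 20.0):  # a capped SD vs. a wider SD the interface might disallow
    dist = NormalDist(mean, sd)
    lo, hi = dist.inv_cdf(0.05), dist.inv_cdf(0.95)  # 90% credible interval
    covers = lo <= resolution <= hi
    print(f"sd={sd}: 90% CI = ({lo:.1f}, {hi:.1f}); covers {resolution}? {covers}")
```

With the narrow SD the 95th percentile sits far below the resolution, while the wider SD (which a component-mixture interface like Metaculus’s makes easy to express) would have covered it.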
Forecasting future gains due to post-training enhancements
I agree much of the community (including me) was wrong or directionally wrong in the past about the level of AI regulation and how quickly it would come.
Regarding the recommendations made in the post for going forward given that there will be some regulation, I feel confused in a few ways.
Can you provide examples of interventions that meet your bar for not being done by default? It’s hard to understand the takeaways from your post because the negative examples are made much more concrete than the proposed positive ones.
You argue that we perhaps shouldn’t invest as much in preventing deceptive alignment because “regulators will likely adapt, adjusting policy as the difficulty of the problem becomes clearer”.
If we are assuming that regulators will adapt and adjust regarding deception, can you provide examples of problems that policymakers will not be able to solve themselves, and why they will be less likely to notice and deal with those than with deception?
You say “we should question how plausible it is that society will fail to adequately address such an integral part of the problem”. What things aren’t integral parts of the problem yet should still be worked on?
I feel we would need much better evidence of things being handled competently to invest significantly less into integral parts of the problem.
You say: ‘Of course, it may still be true that AI deception is an extremely hard problem that reliably resists almost all attempted solutions in any “normal” regulatory regime, even as concrete evidence continues to accumulate about its difficulty—although I consider that claim unproven, to say the least’
If we expect some problems in AI risk to be solved by default mostly by people outside the community, it feels to me like one takeaway would be that we should shift resources to portions of the problem that we expect to be the hardest
To me, intuitively, deceptive alignment might be one of the hardest parts of the problem as we scale to very superhuman systems, even if we condition on having time to build model organisms of misalignment and experiment with them for a few years. So I feel confused about why you treat “unproven” as a dismissal of a high level of difficulty; of course it’s unproven, but you would need to argue that in worlds where the AI risk problem is fairly hard, there’s not much of a chance of it being very hard.
As someone who is relatively optimistic about concrete evidence of deceptive alignment increasing substantially before a potential takeover, I think I still put significantly lower probability on it than you do due to the possibility of fairly fast takeoff.
I feel like this post is to some extent counting our chickens before they hatch (to be clear, I agree with the directional update, as I said above). I’m not an expert on what’s going on here, but I can imagine any of the following happening (non-exhaustive list) that would make the current path to potentially sensible regulation in the US and internationally harder:
The EO doesn’t lead to as many resources dedicated to AI-x-risk-reducing things as we might hope. I haven’t read it myself, just the fact sheet and Zvi’s summary but Zvi says “If you were hoping for or worried about potential direct or more substantive action, then the opposite applies – there is very little here in the way of concrete action, only the foundation for potential future action.”
A Republican President comes to power in the US and reverses a lot of the effects of the EO
Rishi Sunak gets voted out in the UK (my sense is that this is likely) and the new Prime Minister is much less gung-ho on AI risk
I don’t have strong views on the value of AI advocacy, but this post seems overconfident in calling it out as being basically not useful based on recent shifts.
It seems likely that much stronger regulations will be important, e.g. the model reporting threshold in the EO was set relatively high and many in the AI risk community have voiced support for an international pause if it were politically feasible, which the EO is far from.
The public still doesn’t consider AI risk to be very important. <1% of the American public considers it the most important problem to deal with. So to the extent that raising that number was good before, it still seems pretty good now, even if slightly worse.
My Hypermind Arising Intelligence Forecasts and Reflections
I appreciate this post, in particular the thoughts on an AI pause.
I believe that a very good RSP (of the kind I’ve been advocating for) could cut risk dramatically if implemented effectively, perhaps a 10x reduction. In particular, I think we will probably have stronger signs of dangerous capabilities before something catastrophic happens, and that realistic requirements for protective measures can probably lead to us either managing that risk or pausing when our protective measures are more clearly inadequate. This is a big enough risk reduction that my primary concern is about whether developers will actually adopt good RSPs and implement them effectively.
The 10x reduction claim seems wild to me. I think that a lot of the variance in outcomes of AI is due to differing underlying difficulty, and it’s somewhat unlikely that alignment difficulty is within the range of effort that we would put into the problem in normal-ish circumstances.
So I don’t see how even very good RSPs could come anywhere close to a 10x reduction in risk, when it seems like even if we assume the evals work ~perfectly they would likely at most lead to a few years’ pause (I’m guessing you’re not assuming that every lab in the world will adopt RSPs, though it’s unclear; and even if every lab implements them, presumably some will make mistakes in evals and/or protective measures). Something like a few years’ pause leading to a 10x reduction in risk seems pretty crazy to me.
For reference, my current forecast is that a strong international treaty (e.g. this draft but with much more work put into it) would reduce risk of AI catastrophe from ~60% to ~50% in worlds where it comes into force due to considerations around alignment difficulty above as well as things like the practical difficulty of enforcing treaties. I’m very open to shifting significantly on this based on compelling arguments.
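For concreteness, here is the arithmetic behind why the 10x claim strikes me as wild next to my treaty estimate, as a quick sketch using only the numbers quoted in this thread (the 60% baseline is my own catastrophe estimate; the 10x figure is the RSP claim I’m responding to):

```python
# Compare claimed risk reductions as multiplicative factors.
baseline = 0.60            # my current estimate of AI catastrophe risk
rsp_claimed_factor = 10    # "perhaps a 10x reduction" from very good RSPs
treaty_residual = 0.50     # my estimate of risk with a strong international treaty

rsp_residual = baseline / rsp_claimed_factor          # risk implied by the 10x claim
treaty_factor = baseline / treaty_residual            # factor implied by my treaty estimate

print(f"10x RSP claim implies residual risk of {rsp_residual:.0%}")
print(f"Strong treaty: {baseline:.0%} -> {treaty_residual:.0%}, a {treaty_factor:.2f}x reduction")
```

So the 10x claim for RSPs implies roughly a 60% → 6% move, versus the ~1.2x factor implied by my own treaty estimate.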
Meanwhile Rationality A-Z is just super long. I think anyone who’s a longterm member of LessWrong or the alignment community should read the whole thing sooner or later – it covers a lot of different subtle errors and philosophical confusions that are likely to come up (both in AI alignment and in other difficult challenges)
My current guess is that the meme “every alignment person needs to read the Sequences / Rationality A-Z” is net harmful. They seem to have been valuable for some people but I think many people can contribute to reducing AI x-risk without reading them. I think the current AI risk community overrates them because they are selected strongly to have liked them.
Some anecdotal evidence in favor of my view:
To the extent you think I’m promising for reducing AI x-risk and have good epistemics, I haven’t read most of the Sequences. (I have liked some of Eliezer’s other writing, like Intelligence Explosion Microeconomics.)
I’ve been moving some of my most talented friends toward work on reducing AI x-risk and similarly have found that while I think all have great epistemics, there’s mixed reception to rationalist-style writing. e.g. one is trialing at a top alignment org and doesn’t like HPMOR, while another likes HPMOR, ACX, etc.
Among existing alignment research agendas/projects, Superalignment has the highest expected value
Discussion on utilizing AI for alignment
I’d be curious to see how well The alignment problem from a deep learning perspective and Without specific countermeasures… would do.
Good point. For myself:
Background (see also https://www.elilifland.com/): I did some research on adversarial robustness of NLP models while in undergrad. I then worked at Ought as a software/research engineer for 1.5 years, was briefly a longtermist forecasting entrepreneur, and have been thinking independently about alignment strategy among other things for the past 2 months.
Research tastes: I’m not great at understanding and working on super mathy stuff, so I mostly avoided giving opinions on these. I enjoy toy programming puzzles/competitions but got bored of engineering large/complex systems which is part of why I left Ought. I’m generally excited about some level of automating alignment research.
Who I’ve interacted with:
A ton: Ought
~3-10 conversations: Conjecture (vast majority being “Simulacra Theory” team), Team Shard
~1-2 conversations with some team members: ARC, CAIS, CHAI, CLR, Encultured, Externalized Reasoning Oversight, MIRI, OpenAI, John Wentworth, Truthful AI / Owain Evans
(speaking for just myself, not Thomas but I think it’s likely he’d endorse most of this)
I agree it would be great to include many of these academic groups; the exclusion wasn’t out of any sort of malice. Personally I don’t know very much about what most of these groups are doing or their motivations; if any of them want to submit brief write-ups I’d be happy to add them! :)
edit: lol, Thomas responded with a similar tone while I was typing
The recommendation that current open-source models should be banned is not present in the policy paper being discussed, AFAICT. The paper’s recommendations are pictured below:
Edited to add: there is a specific footnote on page 31 that says “Note that we do not claim that existing models are already too risky. We also do not make any predictions about how risky the next generation of models will be. Our claim is that developers need to assess the risks and be willing to not open-source a model if the risks outweigh the benefits”.