The authors have emphasized repeatedly that AI 2027 was and is faster than their mode scenario, which makes doing this kind of evaluation annoying,
We’ve said that it was faster than our median, not our mode. I think it was close to most of our modes at the time of publication; most of our modes were around 2027-2028.
But the evaluation itself seems useful either way, in terms of checking in on how things are going relative to the trajectory that was our best guess conditional on the AGI timelines depicted.
“Slower takeoff should be correlated with ‘harder’ alignment (in terms of cognitive labor requirements) because slower takeoff implies returns to cognitive labor in capabilities R&D are relatively lower and we should expect this means that alignment returns to cognitive labor are relatively lower (due to common causes like ‘small experiments and theory don’t generalize well and it is hard to work around this’). For the same reasons, faster takeoff should be correlated with ‘easier’ alignment.”
Yes, that is what I’m saying. In general a lot of prosaic alignment activities seem pretty correlated with capabilities in terms of their effectiveness.
some reasons for anti-correlation, e.g., worlds where there is a small simple core to intelligence which can be found substantially from first principles make alignment harder, in practice there is an epistemic correlation among humans between absolute alignment difficulty (in terms of cognitive labor requirements) and slower takeoff.
Good points.
I don’t really understand why this should extremize my probabilities
For the “Does aligned DAI suffice?” section, as I understand it you define an alignment labor requirement, then you combine that with your uncertainty over takeoff speed to see if the alignment labor requirement would be met.
I guess I’m making a claim that if you first added uncertainty over the alignment labor requirement and then added the correlation, the latter change would extremize the probability.
This is because slower takeoff corresponds to better outcomes, while harder alignment corresponds to worse outcomes, so making them correlated results in more clustering toward worlds with median easiness. This means that if you think the alignment requirement is easy to meet, the probability of success goes up, and vice versa. This is glossing a bit but I think it’s probably right.
Seems like diminishing returns to capabilities R&D should be at least somewhat correlated with diminishing returns to safety R&D, which I believe should extremize your probability (because e.g. if before you were counting on worlds with slow takeoff and low alignment requirements, these become less likely; and the inverse if you’re optimistic).
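To make the extremization claim concrete, here’s a minimal Monte Carlo sketch using a toy bivariate-lognormal model of my own construction (the variable names and parameter values are illustrative assumptions, not anything from the actual model):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_success(mean_log_gap, rho, n=1_000_000):
    """P(available alignment labor >= required labor) in a toy lognormal model.

    mean_log_gap: mean of log(available) - log(required); positive means the
        median world has enough labor, negative means it doesn't.
    rho: correlation between log(available labor) (slower takeoff -> more labor)
        and log(required labor) (harder alignment -> more labor needed).
    """
    cov = [[1.0, rho], [rho, 1.0]]
    log_avail, log_req = rng.multivariate_normal([mean_log_gap, 0.0], cov, n).T
    return (log_avail >= log_req).mean()

for mean_log_gap in (+0.5, -0.5):   # optimistic vs. pessimistic median world
    for rho in (0.0, 0.6):          # uncorrelated vs. correlated
        print(mean_log_gap, rho, round(p_success(mean_log_gap, rho), 3))

# Positive correlation shrinks the variance of the (log) gap between available
# and required labor, which pushes the success probability further toward 1
# when mean_log_gap > 0 and further toward 0 when mean_log_gap < 0 -- i.e.,
# it extremizes the probability.
```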
I agree about what is more evidence in my view, but that could be consistent with current AIs and the pace of their advancement being more compelling to the average reader, particularly people who strongly prefer empirical evidence to conceptual arguments.
Not sure whether Collier was referring to it being more compelling in her view, readers’, or both.
edit: also of course current AIs and the pace of advancement are very relevant evidence for whether superhuman AGIs will arrive soon. And I think often people (imo wrongly in this case, but still) round off “won’t happen for 10-20+ years” to “we don’t need to worry about it now.”
Okay, it sounds like our disagreement basically boils down to the value of the forecasts as well as the value of the scenario format (does that seem right?), which I don’t think is something we’ll come to agreement on.
Thanks again for writing this up! I hope you’re right about timelines being much longer and 2027 being insane (as I mentioned, it’s faster than my median has ever been, but I think it’s plausible enough to take seriously).
edit: I’d also be curious for you to specify what you mean by academic? The scenario itself seems like a very unusual format for academia. I think it would have seemed more serious and academic-y if we had ditched the scenario format.
Thanks for writing this up, glad to see the engagement! I’ve only skimmed and have not run this by any other AI 2027 authors, but a few thoughts on particular sections:
My predictions for AI by the end of 2027
I agree with most but not all of these in the median case; AI 2027 was roughly my 80th percentile aggressiveness prediction at the time.
Edited to add, I feel like I should list the ones that I have <50% on explicitly:
AI still can’t tell novel funny jokes, write clever prose, generate great business ideas, invent new in-demand products, or generate important scientific breakthroughs, except by accident.
I disagree re: novel funny jokes, seems plausible that this bar has already been passed. I agree with the rest except maybe clever prose, depending on the operationalization.
LLMs are broadly acknowledged to be plateauing, and there is a broader discussion about what kind of AI will have to replace it.
Disagree but not super confident.
Most breakthroughs in AI are not a result of directly increasing the general intelligence/”IQ” of the model, e.g. advances in memory, reasoning or agency. AI can stay on task much longer than before without supervision, especially for well-specified, simple tasks. Especially since AI coding platforms will have gotten better at tool use and allowing AI to manually test the thing they’re working on. By the end of 2027, AI can beat a wide variety of video games it hasn’t played before.
I disagree with the first clause, but I’m not sure what you mean because advances in reasoning and agency seem to me like examples of increases in general intelligence. Especially staying on task for longer without supervision. Are you saying that these reasoning and agency advances will mostly come from scaffolding rather than the underlying model getting smarter? That I disagree with.
There is more public discussion on e.g. Hacker News about AI code rot and the downsides of using AI. People have been burned by relying too much on AI. But I think non-coders running businesses will still be hyped about AI in 2027.
Disagree on the first two sentences.
AI still can’t drive a damned car well enough that if I bought a car I wouldn’t have to.
I don’t follow self-driving stuff much, but this might depend on location? Seems like good self-driving cars are getting rolled out in limited areas at the moment.
As you touch on later in your post, it’s plausible that we made a mistake by focusing on 2027 in particular:
But I do worry about what happens in 2028, when everyone realizes none of the doomsday stuff predicted in 2025 actually came true, or even came close. Then the AI alignment project as a whole may risk being taken as seriously as the 2012 apocalypse theory was in 2013. The last thing you want is to be seen as crackpots.
I think this is a very reasonable concern, and we probably should have done better in our initial release at making our uncertainty about timelines clear (and/or taking the time to rewrite and push back to a later time frame, e.g. once Daniel’s median changed to 2028). We are hoping to do better on this in future releases, including via just having scenarios be further out, and perhaps better communicating our timelines distributions.
Also:
Listening to several of the authors discuss the AI 2027 predictions after they were published leads me to believe they don’t intuitively believe their own estimates.
What do you mean by this? My guess is that it’s related to the communication issues on timelines?
The Takeoff Forecast is Based on Guesswork
Agree.
The Presentation was Misleading
Nothing wrong with guesswork, of course, if it’s all you’ve got! But I would have felt a lot better if the front page of the document had said “AI 2027: Our best guess about what AI progress might look like, formulated by using math to combine our arbitrary intuitions about what might happen.”
But instead it claims to be based on “trend extrapolations, wargames, expert feedback, experience at OpenAI, and previous forecasting successes”, and links to 193 pages of data/theory/evidence.
They never outright stated it wasn’t based on vibes, of course, and if you dig into the document, that’s what you find out.
I very much understand this take and understand where you’re coming from because it’s a complaint I’ve had regarding some previous timelines/takeoff forecasts.
Probably some of our disagreement is very tied-in to the object-level disagreements about the usefulness of doing this sort of forecasting; I personally think that although the timelines and takeoff forecasts clearly involved a ton of guesswork, they are still some of the best forecasts out there, and we need to base our timelines and takeoff forecasts on something in the absence of good data.
But still, since we both agree that the forecasts rely on lots of guesswork, even if we disagree on their usefulness, we might be able to have some common ground when discussing whether the presentation was misleading in this respect. I’ll share a few thoughts from my perspective below:
I think it’s a very tricky problem to communicate that we think that AI 2027 and its associated background research is some of the best stuff out there, but is still relying on tons of guesswork because there’s simply not enough empirical data to forecast when AGI will arrive, how fast takeoff will be, and what effects it will have precisely. It’s very plausible that we messed up in some ways, including in the direction that you posit.
Keep in mind that we have to optimize for a bunch of different audiences; I’d guess that in each direction (i.e. taking the forecast too seriously vs. not seriously enough), many people came away with conclusions too far in that direction, from my perspective. This also means that some others have advertised our work in a way that seems like overselling to me, though others have IMO undersold it.
As you say, we tried to take care to not overclaim regarding the forecast, in terms of the level of vibes it was based on. We also explicitly disclaimed our uncertainty in several places, e.g. in the expandables “Why our uncertainty increases substantially beyond 2026” and “Our uncertainty continues to increase.” as well as “Why is it valuable?” right below the foreword.
Should we have had something stronger in the foreword or otherwise more prominent on the frontpage? Yeah, perhaps; we iterated on the language a bunch to try to make it convey all of (a) that we put quite a lot of work into it, (b) that we think it’s state-of-the-art or close on most dimensions and represents substantial intellectual progress, but also (c) giving the right impression about our uncertainty level and (d) not overclaiming regarding the methodology. But we might have messed up these tradeoffs.
You proposed “AI 2027: Our best guess about what AI progress might look like, formulated by using math to combine our arbitrary intuitions about what might happen.” This seems pretty reasonable to me, except, as you might guess, I take issue with the connotation of arbitrary. In particular, I think there’s reason to trust our intuitions regarding guesswork given that we’ve put more thinking time into this sort of thing than all but a few people in the world, our guesswork was also sometimes informed by surveys (which were still very non-robust, to be clear, but I think improving upon previous work in terms of connecting surveys to takeoff estimates), and we have a track record to at least some extent. So I agree with arbitrary in some sense, in that we can’t ground out our intuitions in solid data, but my guess is that it gives the wrong connotation in terms of what weight the guesswork should be given relative to other forms of evidence.
I’d also not emphasize math if we’re discussing the scenario as opposed to timelines or takeoff speeds in particular.
My best guess is for the timelines and takeoff forecast, we should have had a stronger disclaimer or otherwise made more clear in the summary that they are based on lots of guesswork. I also agree that the summaries at the top had pretty substantial room for improvement.
I’m curious what you would think of something like this disclaimer in the timelines forecast summary (and a corresponding one in takeoff): Disclaimer: This forecast relies substantially on intuitive judgment, and involves high levels of uncertainty. Unfortunately, we believe that incorporating intuitive judgment is necessary to forecast timelines to highly advanced AIs, since there simply isn’t enough evidence to extrapolate conclusively.
I’ve been considering adding something like this but haven’t quite gotten to it for various reasons; potentially I should prioritize it more highly.
We’re also working on updates to these models and will aim to do better at communicating in the future! And will take into account suggestions.
I think this might have happened because it’s clear to us that we can’t make these sorts of forecasts without tons of guesswork, and we didn’t have much slack in terms of the time spent thinking about how these supplements would read to others; I perhaps made a similar mistake to one that I have previously criticized others for.
(I had edited to add this paragraph in, but I’m going to actually strike it out for now because I’m not sure I’m doing a good job accurately representing what happened and it seems important to do so precisely, but I’ll still leave it up because I don’t want to feel like I’m censoring something that I already had in a version of the comment.)
Potentially important context is that our median expectation was that AI 2027 would do much worse than it did, so we were mostly spending time trying to increase the expected readership (while of course following other constraints like properly disclaiming uncertainty). I think we potentially should have spent a larger fraction of our time thinking “if this got a ton of readership then what would happen”, and to be clear we did spend time thinking about this, but I think it might be important context to note that we did not expect AI 2027 to get so many readers, so a lot of our headspace was around increasing readership.
Linking to some other comments I’ve written that are relevant to this: here, here
In terms of general intelligence including long-horizon agency, reliability, etc., do we think AIs are yet, for example, as autonomously good as the worst professionals? My instinct is no for many of them, even though the AIs might be better at the majority of sub-tasks and are very helpful as collaborators rather than fully replacing someone. But I’m uncertain; it might depend on the operationalization and profession, and for some professions the answer seems clearly yes.[1][2] It also seems harder to reason about the literally least capable professional than about something like the 10th percentile.
If the answer is no and we’re looking at the ability to fully autonomously replace humans, this would mean the village idiot → Einstein claim might technically not be falsified. The spirit of the claim might be though, e.g. in terms of the claimed implications.
[1] There’s also a question of whether we should include physical abilities; if so, then the answer would clearly be no for those professions or tasks.
[2] One profession for which it seems likely that the AIs are better than the least capable humans is therapy. Also teaching/tutoring. In general this seems true for professions that can be done via remote work and don’t require heavy computer use or long-horizon agency.
I’d be excited for people (with aid of LLMs) to go back and grade how various past predictions from MIRI folks are doing, plus ideally others who disagreed. I just read back through part of https://www.lesswrong.com/posts/vwLxd6hhFvPbvKmBH/yudkowsky-and-christiano-discuss-takeoff-speeds and my quick take is that Paul looks mildly better than Eliezer due to predicting larger impacts/revenue/investment pre-AGI (which we appear to be on track for and to some extent already seeing) and predicting a smoother increase in coding abilities, but it’s hard to say, in part because Eliezer mostly didn’t want to make confident predictions; also I think Paul was wrong about Nvidia, but that felt like an aside.
edit: oh also there’s the IMO bet, I didn’t get to that part on my partial re-read, that one goes to Eliezer.
IEM and the Yudkowsky-Hanson debate also seem like potentially useful sources to look through, as well as things that I’m probably forgetting or unaware of.
Eli was saying something like “90%-horizons of 100 years sound about right for Superhuman Coder level performance”
To be clear, this is on (a theoretical extrapolated version of) METR HCAST, not the real world distribution of software engineering projects.
Also to remind others of the definition of superhuman coder, it’s a pretty high bar:
Superhuman coder (SC): An AI system for which the company could run with 5% of their compute budget 30x as many agents as they have human research engineers, each of which is on average accomplishing coding tasks involved in AI research (e.g. experiment implementation but not ideation/prioritization) at 30x the speed (i.e. the tasks take them 30x less time, not necessarily that they write or “think” at 30x the speed of humans) of the company’s best engineer. This includes being able to accomplish tasks that are in any human researchers’ area of expertise.
Yeah, this is the primary argument pushing me toward thinking there shouldn’t be a finite-time singularity; as I mentioned, I’m not confident. It does feel pretty crazy that a limits-of-intelligence ASI would have a (very large) time horizon at which it has 0.00001% reliability though, which I think is unavoidable if we accept the trend.
I think how things behave might depend to some extent on how you define an achieved time horizon; if there is a cost/speed requirement, then it becomes more plausible that longer horizon lengths would either have ~the same or lower reliability / success rate as smaller ones, once the AI surpasses humans in long-horizon agency. Similar to how if we created a version of HCAST but flipped based on AI times, then at a fixed speed budget human “reliability” might increase at higher time horizons, because our advantage is in long horizon agency and not speed.
In general things seem potentially sensitive to definitional choices and I don’t feel like I’ve got things fully figured out in terms of what the behavior in the limit should be.
Thanks for writing this up! I actually mostly agree with everything you say about how much evidence the historical data points provide for a superexponential-given-no-automation trend. I think I place a bit more weight on it than you but I think we’re close enough that it’s not worth getting into.
The reason we have a superexponential option isn’t primarily because of the existing empirical data, it’s because we think the underlying curve is plausibly superexponential for conceptual reasons (in our original timelines forecast we had equal weight on superexponential and exponential, though after more thinking I’m considering giving more weight to superexponential). I think the current empirical evidence doesn’t distinguish much between the two hypotheses.
In our latest published model we had an option for it being exponential up until a certain point, then becoming superexponential afterward. Though this seems fairly ad hoc so we might remove that in our next version.
The main thing I disagree with is your skepticism of the meta-skills argument, which is driving much of my credence. It just seems extremely unintuitive to me to think that it would take as much effective compute to go from 1 million years to 10 million years as it takes to go from 1 hour to 10 hours, so it seems like we mainly have a difference in intuitions here. I agree it would be nice to make the argument more carefully; I won’t take the time to try to do that right now, but will spew some more intuitions.
“We might expect progress in chess ability to be superexponential, as AIs start to figure out the meta-skills (such as tactical ability) required to fully understand how chess pieces can interact. That is, we would expect it to require more new skills to go from an ELO of 2400 to 2500, than it does to go from an ELO of 3400 to 3500.”
I don’t think this analogy as stated makes sense. My impression is that going from 3400 to 3500 is likely starting to bump up against the limits of how good you can be at chess, or a weaker claim is that it is very superhuman. While we’re talking about “just” reaching the level of a top human.
To my mind an AI that can do tasks that take top humans 1 million years feels like it’s essentially top human level. And the same for 10 million years, but very slightly better. So I’d think the equivalent of this jump is more like 2799.99 to 2799.991 ELO (2800 is roughly top human ELO). While the earlier 1 to 10 hour jump would be more like a 100 point jump or something.
I guess I’m just restating my intuition that at higher levels the jump represents a smaller difference in skills. I’m not sure how to further convey that. It personally feels like when I do a 1 hour vs. 10 hour programming task, the latter often but not always involves significantly more high-level planning, investigating subtle bugs, consistently error correcting, etc. Whereas if I imagine spending 1 million years on a coding task, there aren’t really any new agency skills needed to get to 10 million years; I already have the ability to consistently make steady progress on a very difficult problem.
My understanding is that AI 2027’s forecast is heavily driven by putting substantial weight on such a superexponential fit, in which case my claim may call into question the reliability of this forecast. However, I have not dug into AI 2027’s forecast, and am happy to be corrected on this point. My primary concern is with the specific claim I am making rather than how it relates to any particular aggregated forecast.
You can see a sensitivity analysis for our latest published model here, though we’re working on a new one which might change things (the benchmarks and gaps model relies a lot less on the time horizon extrapolation which is why the difference is much smaller; also “superexponential immediately” is more aggressive than “becomes superexponential at some point” would be, due to the transition to superexponential that I mentioned above):
We’re working on updating our timelines in a bunch of ways, and I’ve thought a bit more about this.
My current best guess, which isn’t confident, is that if we took a version of the METR HCAST suite which didn’t have any ambiguities or bugs, then the AGIs would have infinite time horizons. And that we should discuss this theoretical task suite rather than the literal HCAST in our timelines forecast, and therefore we should expect a finite-time singularity. If we kept the ambiguities/bugs instead, we’d have an asymptote and a non-infinite value, as would humans with infinite time. In the paragraph you’re quoting, I think that the main thing driving non-infinite values is that longer tasks are more likely to have ambiguities/bugs that make them unsolvable with full reliability.
I overall agree that things seem to be going slower than AI 2027 (and my median was longer when it came out).
However, as mentioned in the caption, the green curve is a simplified version of our original timelines model. Apologies about that; I think it’s reasonable to judge us based on it.
FWIW though, the central superexponential Mar 2027 trajectory from our original model certainly is not strongly contradicted by GPT-5, both with and without an AI R&D speedup interpolation issue fixed.
The original model, filtered for superexponential (pre-AI-R&D-automation) trajectories that reach superhuman coder in 2027:
With AI R&D speedup bug fixed, also filtered for superexponential pre-AI-R&D-automation (backcast looks much better, GPT-5 prediction slightly worse):
Either way, we’re now working on a much improved model which will likely have an interactive web app that provides an improvement over this static graph: e.g., you’ll be able to try various parameter settings and see what time horizon trajectories they generate and how consistent they are with future data points.
Note also that the above trajectories are from the original model, not the May update model, which we unfortunately aren’t taking the time to create for various reasons; we think it would likely look a little worse in terms of the GPT-5 fit, but it might depend on how you filter for which trajectories count as superexponential.
I (AI 2027 co-author) mostly agree (haven’t run this by any other authors).
I agree regarding the GPT-5 release being something to pay attention to.
Whenever I put specific metrics in the text it was supposed to refer to the end of the time period (so “mid 2025” would be end of August since that’s 2/3 through the year), but I never made that clear anywhere; my bad, and I think the middle of the period is a natural interpretation as well. You can see that on the side panel on the right the date starts at Aug 2025.
But either way, waiting until the end of Aug probably won’t change the directional update.
What these results do provide is evidence that, indeed, the suggested AI 2027 timelines are unlikely to be met; this should perhaps nudge us towards slightly longer timelines.
I agree that these results should nudge us toward slightly longer timelines, but I think bolding that AI 2027 timelines are unlikely to be met seems like it’s taking too much from these few early data points, if you’re implying that these play a big factor in that conclusion (though I might have agreed with your assessment even before these data points, depending on what you mean by unlikely).
As you allude to, the AI 2027 teams’ current median timelines are all longer than 2027 to varying extents (roughly, 2028-2033). We’re currently working on an update to our teams’ timelines forecasts and will share an update on our views soon.
If the consensus is that the original timeline was too aggressive, another idea for community effort is rewriting the original plot with a longer-term horizon, say an “AI 2030” narrative, which will be published here and then analyzed in real-time, month by month.
We’re in the early stages of working on something like this. It wouldn’t just be the original plot stretched out; it would be our best guess predictions conditional on longer timelines.
In this Epoch paper appendix https://arxiv.org/pdf/2403.05812#page=12.3 they report efficiency improvements across 1.5+ years of time:
(a) is faster than your Mamba paper example but still much slower than 3-4x/year. (b) and (c) are at ~4x, though (c) isn’t much longer than a year. And these are basically not taking into account post-training efficiency gains iiuc.
We’re not working with many data points but it seems like these provide an existence proof that gains can compound across at least 3 years.
Would love to see some updated data collection on this, I think we could get more evidence on your hypothesis.
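For intuition on what different compounding rates imply, here’s a trivial sketch (the gains and spans below are placeholder numbers, not the actual figures from the Epoch appendix):

```python
# Annualized compute-efficiency gain implied by a total gain over a span of
# years, assuming the gains compound smoothly. Numbers are placeholders.
def annualized_rate(total_gain, years):
    return total_gain ** (1 / years)

print(annualized_rate(8, 1.5))   # 8x over 1.5 years  -> 4.0x/year
print(annualized_rate(10, 3))    # 10x over 3 years   -> ~2.15x/year
print(4 ** 3)                    # 4x/year compounding for 3 years -> 64x total
```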
It seems more important whether humans can figure out how to evaluate alignment in 2028 rather than whether they can make human level aligned AGIs (though of course that’s instrumentally useful and correlated). In particular, the AIs need to prevent humans from discovering the method by which the AIs evaluate alignment. This seems probably doable for ASIs but may be a significant constraint esp. for only somewhat superhuman AIs if they’ve e.g. solved mech interp and applied it themselves but need to hide this for a long time.
Recent and forecasted rates of software and hardware progress
The timelines model didn’t get nearly as many reviews as the scenario. We shared the timelines writeup with all of the people who we shared the later drafts of the scenario with, but I think almost none of them looked at the timelines writeup.
We also asked a few people to specifically review the timelines forecasts, most notably a few FutureSearch forecasters who we then added as a final author. However, we mainly wanted them to estimate the parameter values and didn’t specifically ask them for feedback on the underlying modeling choices (though they did form some opinions; for example, they liked benchmarks and gaps much more than time horizon extension; also btw the superexponential plays a much smaller role in benchmarks and gaps). No one brought up the criticisms that titotal did.
In general the timelines model certainly got way less effort than the scenario, probably about 5% as much effort. Our main focus was the scenario as we think that it’s a much higher value add.
I’ve been pretty surprised by how much quality-weighted criticism has focused on the timelines model relative to the scenario, and wish that it was more tilted toward the scenario (and also toward the takeoff model, which IMO is more important than the timelines model but has gotten much less attention). To be clear, I’m still very glad that these critiques exist if the alternative is that they didn’t exist and nothing replaced them.
I’ll say various facts as best as I can recall and allow you and others to decide how bad/deceptive the time horizon prediction graph was.
The prediction on the graph was formed by extrapolating a superexponential with a 15% decay. This was set to roughly get SC at the right time, based on an estimate for what time horizon is needed for SC that is similar to my median in the timelines forecast. This is essentially a simplified version of our time horizon extension model that doesn’t account for AI R&D automation. Or another way to view this is that we crudely accounted for AI R&D automation by raising the decay.
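For concreteness, here’s a rough sketch of the kind of extrapolation I mean, under my reading that “a 15% decay” means each successive doubling of the time horizon takes 15% less calendar time than the previous one (the starting horizon and doubling time below are placeholders, not the values used for the graph):

```python
# Superexponential time-horizon extrapolation: each successive doubling of the
# horizon takes (1 - decay) times as long as the previous doubling.
# Starting values are illustrative placeholders.
def horizon_trajectory(h0_minutes, first_doubling_months, decay, n_doublings):
    t, horizon, dt = 0.0, h0_minutes, first_doubling_months
    points = [(t, horizon)]
    for _ in range(n_doublings):
        t += dt
        horizon *= 2
        dt *= 1 - decay   # with decay = 0.15, each doubling is 15% faster
        points.append((t, horizon))
    return points

# Because the doubling times form a geometric series, the horizon diverges in
# finite time: total calendar time < first_doubling_months / decay.
for t, h in horizon_trajectory(h0_minutes=30, first_doubling_months=6,
                               decay=0.15, n_doublings=10):
    print(f"{t:5.1f} months: horizon ~{h:,.0f} min")
```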
This was not intended to represent our overall median forecast as that is later, but instead to represent roughly the trajectory that happens in AI 2027.
As is shown in titotal’s post, the graph is barely in distribution for the trajectories of our timelines model which reach SC in Mar 2027; it’s certainly not central. We did not check this before the AI 2027 release.
Why didn’t we use a central trajectory from our timelines model rather than the simplified version? This was on my TODO list, but I ran out of time. As you can imagine, we were working right up until a deadline and didn’t get to many TODOs that would have been great to have. But very likely I should have prioritized it more highly, so this is my mistake.
Or we should have more clearly labeled that the graph was not generated via the timelines model.
If we had the correct graph, then the new model releases would have been a bit above our predicted trend, rather than right on it. So it should be a very slight update toward the plausibility of shorter timelines than AI 2027.
I was only referring to our AI timelines mode; in this case it’s defined as the most likely year in which superhuman coder arrives.
In general the concept of mode for most of the scenario decisions seems not well defined, since e.g. for non-naturally-numeric choices it depends on how you define the categories and what past events you condition on (for the timelines mode we’re conditioning on the starting point, but in other cases one might condition on all events thus far).
I would personally describe our process as some mixture of sampling what intuitively feels most likely at each point (which might e.g. correspond to the mode of a natural categorical breakdown or of a distribution conditional on all events thus far, but we mostly didn’t explicitly calculate this), while also optimizing for making things not too degenerate and overall intuitively feel like a plausible trajectory (because by default doing mode every time would look unlike what we actually expect in some sense, because in the real world there will be many surprises).
As an example of how much definitions matter here, if we just conditioned on the previous conditions for each month and sampled what big algorithmic improvements might happen treating this as a categorical variable which enumerated many possible improvements, we might never end up with any specific algorithmic improvements or end up with them quite late in the game. But if we instead assume that we think overall probably some will come before superhuman coder and then pick what we think are the most likely ones even though any individual one may be <50% this quickly (though not totally clear in this case) and <<50% in any individual month, then we end up with neuralese recurrence and shared memory bank right before SC.
Perhaps a simpler example of how categorization matters is that if we break down possible AIs’ goals very granularly, then we have the most probability on AIs being very well aligned, relative to any very specific misaligned goal. But we overall have more probability on misalignment in this scenario, so we first make that high-level choice, then we choose one of the most likely specific misaligned goals.