My AI Vibes are Shifting
I think vibes-wise I am a bit less worried about AI than I was a couple of years ago. Perhaps (vibe-wise) my P(doom) has gone from ~5% to more like 1%.[1]
Happy to discuss in the comments. I may be very wrong. I wrote this up in about 30 minutes.
Note I still think that AI is probably a very serious issue, but one to focus on and understand rather than necessarily one to push to slow in the next 2 years. I find this very hard to predict, so I am not making strong claims.
My current model has two kinds of AI risk:
AI risk to any civilisation roughly like ours
Our specific AI risk, given where we were in 2015 and where we are right now.
Perhaps civilisations almost always end up on paths they strongly don’t endorse due to AI. Perhaps AI risk is vastly overrated. That would be a consideration in the first bucket. Yudkowskian arguments feel more over here.
Perhaps we are making the situation much worse (or better) by actions in the last 5 and next 3 years. That would be the second bucket. It seems much less important than the first, unless the first is like 50/50.
Civilisational AI risk considerations and their direction (in some rough order of importance):
More risk. I find it credible that this is a game we only get to play once (over some time period). AGI might lock in a bad outcome. The time period seems important here. I think if that time period is 1000 years, that’s probably fine—if we screw up 1000 years of AI development, with many choices, that sort of feels like our fault. If the time period is 10 years, then that feels like we should stop building AI now. I don’t think we are competent enough to manage 10 years the first time.
Unclear risk. The median "do AI right" time period seems to be about 6 years. A while back Rob and I combined a set of different timelines for AI. Currently the median is 2030 and the 90th percentile is 2045. If we date the start of the current AI era to ~2024, shortly after the ChatGPT launch, then about half the probability mass falls within the first 6 years. That's longer than 3 years, but shorter than I'd like. On the other hand, 20 years to do this well seems better. I hope it's at the longer end. (A rough sketch of this arithmetic follows this list.)
More risk. We live in a more geopolitically unstable time than we have for the last 20 years. China can credibly challenge US hegemony in a way it couldn't previously. AI development would, I predict, have been a clearly more US/European thing 20 years ago. That seems better, since there seem to be some things that Western companies obviously won't do[2] - see the viciousness of TikTok's algorithm.
Less risk. Humans are naturally (and understandably) apocalyptic as a species. Many times we have thought we were likely to see the end of humanity, and almost every time our reasoning has been specious. In particular, we see problems but not solutions.
Uncertain risk. AI tools were trained on text, which seems to align them far more to human desires than one might expect. Compare this to training them primarily on the Civilisation games: what would that AI be like? That said, it now seems that if you even slightly misalign them during training, e.g. training them to write buggy code rather than clean code, they can become misaligned in many other ways too. Perhaps giving them a strong sense of alignment also gives them a strong sense of misalignment. Likewise, future tools may not be trained on text; they might be trained primarily in RL environments.
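To make the timeline arithmetic above concrete, here is a minimal sketch. It assumes a lognormal shape for "years until AGI" (my assumption; the combined forecast only gives a median and a 90th percentile), and the 2024 start year and fitted parameters are illustrative rather than the actual aggregation.

```python
import math

# A minimal sketch of the timeline arithmetic above. Assumes a lognormal
# shape for "years until AGI"; the 2024 start year and fitted parameters
# are illustrative, not the actual combined forecast.

START_YEAR = 2024
median_years = 2030 - START_YEAR   # 6 years to the median estimate
p90_years = 2045 - START_YEAR      # 21 years to the 90th-percentile estimate

mu = math.log(median_years)        # lognormal median = exp(mu)
z90 = 1.2816                       # standard-normal 90th percentile
sigma = (math.log(p90_years) - mu) / z90

def mass_within(years: float) -> float:
    """P(AGI arrives within `years` of the start) under the fitted lognormal."""
    z = (math.log(years) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

for horizon in (3, 6, 10, 20):
    print(f"within {horizon:>2} years of {START_YEAR}: {mass_within(horizon):.0%}")
# roughly: 24% within 3 years, 50% within 6, ~70% within 10, ~89% within 20
```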
More local considerations and their direction (in some rough order of importance):
Less risk. AI is progressing fast, but there is still a huge amount of ground to cover. Median AGI timeline vibes seem to be moving backwards. This increases the chance of a substantial time for regulation while AI grows. It decreases the chance that AI will just be 50% of the economy before governance gets its shoes on.
Uncertain risk. AI infrastructure seems really expensive. I need to actually do the math here (and I haven’t! hence this is uncertain) but do we really expect growth on trend given the cost of this buildout in both chips and energy? Can someone really careful please look at this?
Less risk. AI revenue seems more spread. While OpenAI and Anthropic are becoming far more valuable, it looks less likely that they will be $100T companies to the exclusion of everything else. Google, Meta, Tesla, the US Government, the Chinese Government, DeepSeek, Perplexity, TikTok, Cloudflare, Atlassian, Thinking Machines. The more companies there are that can credibly exist and take shares of the pie at the same time, the more optimistic I am about governance sitting between them and ensuring a single actor doesn't deploy world-destroying AGI. Companies are already very powerful, but why doesn't Tesla have an army? Why doesn't Apple own ~any land where it controls the labour laws? The East India Company has fallen and no company has ever taken its crown[3]. Governance is very powerful.
Less risk. AI revenues more spread part 2. The more revenue is likely to spread between multiple companies, the harder it is to justify extremely high expenditure on data centers which will be required to train even more powerful models. I think someone argued that this will make OpenAI/Anthropic focus on ever greater training runs so they can find the thing that they can take a large share of, but I think this has to be a negative update in terms of risk—they could have far more revenue right now!
Less risk. The public is coming to understand and dislike AI. Whether it's artists, teachers, or people who don't like tech bros, many powerful forces are coming to find AI distasteful. I think these people will (often for bad reasons) jump on any reason to slow or stymie AI. The AI moratorium was blocked. People cheer when Tesla gets large fines. I don't think AI is going to be popular with the median American (though with the median Chinese person, perhaps?).
More risk. Specific geopolitical conflict[4]. If China seems likely to pull ahead geopolitically, the US may pull out all the stops. That might involve injecting huge amounts of capital and building into AI. Currently China doesn't seem to want to race, but it is much better at building energy infrastructure and is working on its own chip production. Let's see.
What do you think I am wrong about here? What considerations am I missing? What should I focus more attention on?
- ^
I guess I am building up to some kind of more robust calculation, but this is kind of the information/provocation phase.
- ^
You might argue that China seems not to want to race or to put AI in charge of key processes, and I'd agree. But given we would have had the West building AI regardless, this seems to make things less bad than they could have been, rather than actively better.
- ^
Did FTX try? Like what was the Bahamas like in 10 years in the FTX success world?
- ^
I may be double counting here but there feels like something different about the general geopolitical instability and specifically how US/China might react.
I mean… I really hope this isn't the list of considerations you consider most load-bearing for your estimates of AI risk. The biggest determinants are of course how hard it is to control systems much smarter than us, and whether that will happen any time soon. It seems like the answer to the first one is quite solidly "very hard" (though not like arbitrarily hard) and the answer to the second one seems very likely "yes, quite soon, sometime in the coming decades, unless something else seriously derails humanity".
After that, the questions are “can humanity coordinate to have more time, or somehow get much better at controlling systems smarter than us?”. The answer to the former is “probably not, but it’s really hard to say” and the answer to the latter is IMO “very likely not, but it’s still worth a shot”.
Another relevant crux/consideration would be: “will it turn out to be the case that controlling AI systems scales very smoothly, i.e. you can use a slightly superhuman system to control an even smarter system, etc. in an inductive fashion?”
My current answer to that is “no”, and it seems quite hard to learn more before it becomes a load-bearing guess. Digging into that requires building pretty detailed models.
I think you basically don’t talk about any of these in your post above? I also don’t know where your 5% comes from, so I don’t really know how to interface with it. Like do you actually mean that with 95% probability we will end up with an aligned superintelligence that will allow us to genuinely capture most of our cosmic endowment? If so, I have no idea why you believe that. Maybe you mean that with 90% probability AI will not really matter and never reach superhuman capabilities? That seems very unlikely to me, but seems fine to write about.
I would be interested to know how you think things are going to go in the 95-99% of non-doom worlds. Do you expect AI to look like "ChatGPT but bigger, broader, and better" in the sense of being mostly abstracted and boxed away into individual use cases/situations? Do you expect AIs to be ~100% in command but just basically aligned and helpful?
These are vibes, not predictions.
But in the other worlds I expect governance to sit between many different AI actors and ensure that no single actor controls everything. And then to tax them to pay for this function.
Why doesn’t SpaceX run a country?
I mean… this still sounds like total human disempowerment to me? Just because the world is split up between 5 different AI systems doesn’t mean anything good is happening? What does “a single actor controls everything” have to do with AI existential risk? You can just have 4 or 40 or 40 billion AI systems control everything and this is just the same.
This seems a little bit like a homunculus sitting behind the eyes: the governance makes the AIs aligned and helpful, but why is the governance itself basically aligned and helpful? I am particularly concerned about the permanent loss of labor strikes and open rebellion as negotiation options for the non-governance people.
Do you think governance is currently misaligned? It seems fine to me.
How do you explain the news? Why do MM predictors keep missing negative surprises there?
I think current governments are kept in check, which scales differently than being aligned when the capabilities of the government are increased.
SpaceX doesn’t run a country because rockets+rocket building engineers+money cannot perform all the functions of labour, capital, and government and there’s no smooth pathway to them expanding that far. Increasing company scale is costly and often decreases efficiency; since they don’t have a monopoly on force, they have to maintain cost efficiency and can’t expand into all the functions of government.
An AGI has the important properties of labour and capital and government (i.e. no "Lump of Labour", so it doesn't devalue the more of it there is; it can be produced at scale by more labour; and it can organize itself without external coordination or limitations). I expect any AGI which has these properties to very rapidly outscale all humans, regardless of starting conditions, since the AGI won't suffer from the same inefficiencies of scale or shortages of staff.
I don’t expect AGIs to respect human laws and tax codes once they have the capability to just kill us.
That seems more probable in a world where AI companies can bring all the required tools in house. But what if they have large supply chains for minerals and robotics, rent factory space, and employ contractors to do the 0.0001% of work they can't?
At that point I still expect it to be hard for them to control bits of land without being governed, which I expect to be good for AI risk.
I think that AI companies being governed (in general) is marginally better than them not being governed at all, but I also expect that the AI governance that occurs will look more like “AI companies have to pay X tax and heed Y planning system” which still leads to AI(s) eating ~100% of the economy, while not being aligned to human values, and then the first coalition (which might be a singleton AI, or might not be) which is capable of killing off the rest and advancing its own aims will just do that, regulations be damned. I don’t expect that humans will be part of the winning coalition that gets a stake in the future.
Thanks for publishing this!
My main disagreement is about a missing consideration: shrinking time to get alignment right. Despite us finding out that frontier models are less misaligned by default than most here would have predicted[1], the bigger problem to me is that we have made barely any progress toward crossing the remaining alignment gap. As a concrete example: LLMs will in conversation display a great understanding of and agreement with human values, but in agentic settings (the Claude 4 system card's blackmail examples) act quite differently. More importantly, on the research side: to my knowledge, there has been neither a recognized breakthrough nor generally recognized smooth progress towards actually getting values into LLMs.
Similarly, at least for me a top consideration that AFAICT is not in your list: the geopolitical move towards right-wing populism (particularly in the USA) seems to reduce the chances of sensible governance quite severely.
This seems basically true to me if we are comparing against early 2025 vibes, but not against e.g. 2023 vibes (“I think vibes-wise I am a bit less worried about AI than I was a couple of years ago”). Hard to provide evidence for this, but I’d gesture at the relatively smooth progress between the release of ChatGPT and now, which I’d summarize as “AI is not hitting a wall, at the very most a little speedbump”.
This is an interesting angle, and feels important. The baseline prior should imo be: governing more entities with near 100% effectiveness is harder than governing fewer. While I agree that conditional on having lots of companies it is likelier that some governance structure exists, it seems that the primary question is whether we get a close to zero miss rate for “deploying dangerous AGI”. And that seems much harder to do when you have 20 to 30 companies that are in a race dynamic, rather than 3. Having said that, I agree with your other point about AI infrastructure becoming really expensive and that the exact implications are poorly understood.
I think about two-thirds of this perceived effect is due to LLMs not having much in the way of goals at all, rather than them having human-compatible goals.
As for shrinking time to get alignment right, my worst-case scenario is that someone makes a breakthrough in AGI capabilities research and the breakthrough is algorithmic, not achieved by concentrating resources, as the AI-2027 forecast assumes.
However, even this case can provide a bit of hope. Recall that GPT-3 was trained using just about 3e23 FLOP and ~300B tokens. If it were OpenBrain who trained a thousand GPT-3-scale models with the breakthrough, using different parts of the training data, then they might even be able to run a Cannell-like experiment and determine the models' true goals, alignment or misalignment...
This is not a really careful look, but: The world has managed extremely fast (well, trains and highways fast, not FOOM-fast) large-scale transformations of the planet before. Mostly this requires that 1) the cost is worth the benefit to those spending and 2) we get out of our own way and let it happen. I don’t think money or fundamental feasibility will be the limiters here.
Also, consider that training is now, or is becoming, a minority of compute. More and more is going towards inference—aka that which generates revenue. If building inference compute is profitable and becoming more profitable, then it doesn’t really matter how little of the value is captured by the likes of OpenAI. It’s worth building, so it’ll get built. And some of it will go towards training and research, in ever-increasing absolute amounts.
Even if many of the companies building data centers die out because of a slump of some kind, the data centers themselves, and the energy to power them, will still exist. Plausibly the second buyers then get the infrastructural benefits at a much lower price—kinda like the fiber optic buildout of the 1990s and early 2000s. AKA “AI slump wipes out the leaders” might mean “all of a sudden there’s huge amounts of compute available at much lower cost.”
I think this is a question on which we should spend lots of time actually thinking and writing. I’m not sure my approximations will be good at guessing the final result.
Please give takes on
https://www.lesswrong.com/posts/evYne4Xx7L9J96BHW/video-and-transcript-of-talk-on-can-goodness-compete
https://www.lesswrong.com/posts/4hCca952hGKH8Bynt/nina-panickssery-s-shortform?commentId=quPNTp46CRMMJoamB (my comment)
Action conditional or action unconditional? If people update “my world saving plan might work if I keep going hard” from this, does that change your view? What about if this makes alignment-concerned people stand down, how does that change your view?
I see I have 4 votes, with neutral karma overall. I should hope that the downvotes thought this wasn’t worth reading, as opposed to that they disagreed.
Since you imply you want feedback, I’ll give it.
First, to answer your question: the big thing you're missing is the technical arguments for the difficulty of alignment. That's where most of the variance lies. All of the factors you list are small potatoes if it just turns out that alignment is quite hard.
The other big factor is the overall societal dynamics. You casually mention "then we shouldn't build it". The big question is, if we SHOULD stop, COULD we stop? I think there's only maybe a 10% chance we could if we should. The incentives are just too strong and the coordination problem is unsolved. And the people in power on this are looking quite incompetent to deal with it. They'll probably start taking it seriously at some point, but whether that's soon enough to help is a big question. You could look at my "Whether governments will control AGI is important and neglected" if you wanted a little more of my logic on that.
It’s a problem for logic based on careful observations and data. Vibes-based arguments are actively unhelpful. It’s been fine to just guess at most other stuff in history, but not this.
I'm being kind of judgmental because I think having good estimates of AGI risk is quite important for our collective odds of survival and/or flourishing, and LW is, among other things, the best place on earth to get good guesses on that. I apologize for being a little harsh.
If you titled this “some factors maybe in AI risk” or “some changes that have shifted my p(doom)” or something and left out the p(doom) I’d have upvoted because you have some interesting observations.
As it is, I did very much think this is anti-worth reading in the context of LW. I couldn’t decide between normal and big downvotes.
I think this is polluting the community epistemics by making predictions based on vibes. Then you deny they’re predictions in the comments. P(doom) is very much a prediction and you shouldn’t make it publicly if you aren’t really trying. Or maybe that’s what twitter or reddit are for?
It would be pretty rare that 30 minutes of work would be worth reading on LW compared to the many excellent high effort posts here. I do appreciate you putting that caveat. The exception would be an expert or unique insight or unique idea. Just voicing non-expert quick takes isn’t really what LW is for IMO. At most it would be a quick take.
Or to put it this way: LW is for the opposite of vibes-based opinions.
I think considerations are an important input into decision making, and if you downvote anyone who writes clear considerations without conforming to your extremely high standards, then you will tax disagreement.
Perhaps you are very confident that you are only taxing bad takes and not just contrary ones, but I am not as confident as you are.
Overall, I think this is poor behaviour from a truth-seeking community. I don’t expect every critic to be complimented to high heaven (as sometimes happens on the EA forum) but I think that this seems like a bad equilibrium for a post that is (in my view) fine and presented in the way this community requests (transparent and with a list of considerations).
As for the title:
This in particular seems like a dodge. The actual title, "My AI Vibes are Shifting", is hardly confident or declarative. Are you sure you would actually upvote if I had titled it as you suggest?
I went back and reread it. Because you did mark that p(doom) as vibes-based and said you weren't making strong predictions near the top, I removed my small downvote.
I said I'd have upvoted if you removed the prediction. The prediction is the problem here because it appears to be based on bad logic—vibes instead of gears.
I have never downvoted something that disagrees with my stance if it tries to come to grips with the central problem of how difficult alignment is.
Nor have I downvoted pieces that scope themselves to address only part of the question and don't make a P(doom) prediction.
I have frequently complained on new authors’ behalf that the LW community has downvoted unfairly.
I think vibes which are actually gestalts of seeing a lot of mechanisms are potentially okay, but then I expect to see the vibes be responsive to evidence. Predictors who consistently get things right are likely to be pretty good at getting an accurate vibe, but then in order to export their view to others, I want to see them disassemble their vibe into parts. A major flaw in prediction markets is that they don't demand you share the reasoning, or even have any particular reasoning. They allow being right for arbitrarily wrong reasons, which generalizes poorly.
I upvoted. While I disagree with most of the reasoning, it seems relatively clear to me that going against community opinion is the main reason for the downvotes. Consider this: if an author well known for his work in forecasting had preregistered that he was going to write a bunch of not fully fleshed out arguments in favor of or against updating in a particular direction, most people here would be encouraging of publishing it. I don't think there has ever been a consistent standard of "only publish highly thought out arguments" here, and we should not engage in isolated demands for rigor, even if the topic is somewhat dicey.
What I expect is another series of algorithmic breakthroughs (e.g. neuralese) which rapidly increases the AIs’ capabilities if not outright FOOMs them into the ASI. These breakthroughs would likely make mankind obsolete.
When do you expect this to happen by?
I don't know. As I discussed with Kokotajlo, he recently claimed that "we should have some credence on new breakthroughs e.g. neuralese, online learning, whatever. Maybe like 8%/yr?", but I doubt that it will be 8%/year. Denote the probability that the breakthrough wasn't discovered as of time t by P(t). Then one of the models is dP/dt = −PNc, where N is the effective progress rate. This rate is likely proportional to the number of researchers hired and to their progress multipliers, since new architectures and training methods can be tested cheaply (e.g. on GPT-2 or GPT-3) but need the ideas and the coding.
The number of researchers and coders was estimated in the AI-2027 security forecast to increase exponentially until the intelligence explosion (which the scenario's authors assumed to start in March 2027 with superhuman coders). What I don't understand how to estimate is the constant c, which symbolises the difficulty[1] of discovering the breakthrough. If, say, c were 200 per million human-years, then 5K human-years would likely be enough and the explosion would likely start in 3 years. Hell, if c were 8%/yr for a company with 1K humans, then the company would need 12.5K human-years, shifting the timelines to at most 5-6 years from Dec 2024…
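To make that concrete, here is a minimal sketch of the hazard model dP/dt = −PNc. The starting headcount, growth rate, and value of c are purely illustrative assumptions (I picked the "8%/yr at a 1K-person company" case), not numbers from the AI-2027 forecast.

```python
import math

# A minimal sketch of the hazard model dP/dt = -P * N(t) * c described above.
# All numbers here (starting headcount, growth rate, and c) are purely
# illustrative assumptions, not values from the AI-2027 forecast.

c = 1 / 12_500      # hazard per effective human-year (the "8%/yr at 1K humans" case)
N0 = 1_000          # assumed effective researchers at the end of 2024
growth = 0.5        # assumed exponential growth rate of N(t) per year

def p_no_breakthrough(years: float) -> float:
    """P(no breakthrough within `years`) = exp(-c * integral_0^t N(s) ds)."""
    effective_human_years = N0 * (math.exp(growth * years) - 1.0) / growth
    return math.exp(-c * effective_human_years)

for t in (1, 2, 3, 5):
    print(f"after {t} yr: P(no breakthrough) ≈ {p_no_breakthrough(t):.2f}")
# under these assumptions: ~0.90 after 1 yr, ~0.76 after 2, ~0.57 after 3, ~0.17 after 5
```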
EDIT: Kokotajlo promised to write a blog post with a detailed explanation of the models.
The worst-case scenario is that diffusion models are already a breakthrough.
You estimate c by looking at how many breakthroughs we've had in AI per person-year so far. That's where the 8% per year comes from. It seems low to me given the large influx of people working on AI, but I'm sure Daniel's math makes sense given his estimate of breakthroughs to date.
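Roughly, that estimation looks like the back-of-the-envelope sketch below. The breakthrough count and cumulative person-years are made-up illustrative numbers chosen to land near 8%/yr, not Daniel's actual inputs.

```python
# A back-of-the-envelope sketch of that estimation. The breakthrough count and
# cumulative person-years below are made-up illustrative numbers chosen to land
# near 8%/yr -- not the actual inputs behind the estimate.

breakthroughs_so_far = 4          # counting a handful of paradigm-level advances (my rough guess)
cumulative_person_years = 50_000  # assumed effective frontier-research person-years to date

c = breakthroughs_so_far / cumulative_person_years   # hazard per effective person-year
lab_size = 1_000                                     # assumed researchers at one frontier lab
print(f"c ≈ {c:.1e} per person-year -> ~{c * lab_size:.0%}/yr at a {lab_size}-person lab")
# -> c ≈ 8.0e-05 per person-year -> ~8%/yr at a 1000-person lab
```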