Contact me at x.com/testdrivenzen
Alex Amadori
my view of how elites will act when informed about AI x-risk is based on actual examples that happened
If you are referring to the founding of OpenAI and Anthropic, you should look at how they happened. Sam Altman is a career entrepreneur, whose job is to create businesses. Dario Amodei was an AI researcher, whose job was to research AI. Inertia, my default predictor for individuals, is a very good predictor here, good enough that I don’t feel the need to update based on other heuristics.
In democratic institutions, things tend to happen quite differently. From the post:
Normally, parts of a country’s executive branch are responsible for international negotiations around urgent issues concerning national and global security. … Branches of the government are generally not in the business of independently taking bold positions and then pursuing those positions to their logical ends. Instead, their stances and actions are mostly shaped by prevailing social currents.
Basically, decisions are much more diffuse, and this should favor caution about x-risks if people are properly informed about them.
As we point out in the post, our budget is very small compared to major political campaigns, and would still be at least an order of magnitude smaller at $50M / year.
More than half the post is about how funding is the main bottleneck; I would like to hear a more detailed counterargument to this point.
I think the attitude toward even just the EU AI Act on Capitol Hill is mostly derision
Keep in mind that “leading by example” does not necessarily work by impressing people on the Hill; the effect can also come through fostering popular support, or simply from providing the initial impetus (big institutions can struggle with initiative) and then having the US take over negotiations later.
We’ve also written about how ASI prevention could happen through more distributed coalitions at asi-prevention.com if you’re interested.
Similarly to Andrea, I don’t expect us getting funded more heavily to significantly change funding or tactics in AI industry lobbying. Concretely, the scale of AI industry lobbying is already in the ~$100Ms, and potentially a lot more if you include broader marketing and influence operations, not just disclosed campaign and lobbying spend.
We could scale up 2 OOMs from where we currently are, and I don’t think that’s nearly true of AI industry lobbying, as that would take them to at least ~$10B.
When it comes to tactics… as it says in the section “an asymmetric war”, it won’t be very effective for them to try to make the issue divisive. Instead it’s more likely there will be a lot of FUDding (https://en.wikipedia.org/wiki/Fear,_uncertainty,_and_doubt), but this is already happening, and I expect it to be their main strategy regardless of whether we receive funding.
The conflict-theoretic approach is to acknowledge that AI’s nature as an amplifier of power will always make it an irresistible temptation to elites.
I think this is not true specifically for:
ASI
AI that can rapidly self-improve
Or at least, whether it’s true is highly contingent on whether elites have been informed about x-risks. The whole thing with x-risk is that AI will not empower you; it will do the opposite: get out of your hands and probably kill you. This plays out at multiple layers:
You fail to control ASI and it kills you
Someone else fails to control their ASI and it kills both them and you
Your geopolitical adversary goes to war with you before you manage to build ASI, because they don’t want to run the risk of you getting there
We went into more detail in our geopolitics modeling paper, ai-scenarios.com
Signatures are very valuable right now, but as mentioned in the post, we’ve already introduced (and achieved success in pursuing) more ambitious goals for lawmaker outreach.
More generally, goals have indeed been modified in the past so that they would more closely track what we really cared about, so I’m fairly optimistic about this.
Hi Michael. Naturally a lot of resources will go to the US. For example, in the public awareness section of this post, we assumed that 60% of ad spend would go to the US, and that the largest policy headcount scaling would happen in the US.
That said, an international prohibition will definitely need a coalition with more than just one country supporting the agreement, and more than one country championing it will make it go further, faster.
A group of 2 to 3 middle powers who are extremely motivated to get an international prohibition on ASI could go a long way towards getting it done, even if most of it ends up being about motivating the US or leading by example. And to maximize the chances to produce these champions, we need to put our eggs in as many baskets as possible.
Our estimate of a 10% chance comes from the following: two of us authors independently formulated our own guesses, and both of these guesses ended up being close to 10% for $50M and 30% for $500M.
In the footnote, we list the conjunctive steps we considered; we’re interested in readers’ guesses of the total probability after they’ve considered these!
More detailed answer:
I can’t speak to how Andrea did it, but personally: I considered the major barriers to the desired outcome before producing the guess, and assigned a probability of passing each barrier conditional on having passed the previous ones.
After doing that for a bit, though, I started worrying about doing the calculation too formally, since I was adding a lot of conjunctions (when humans do estimates like this, the probability tends to go down arbitrarily as they add more and more barriers), so instead I just went with what my gut said after having done this exercise. This actually corrected me downwards a bit compared to what I would have answered if you had asked me for a gut estimate before this process.
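To make the mechanics concrete, here is a minimal sketch of the kind of conjunctive calculation described above. The barrier names and probabilities below are hypothetical placeholders for illustration only, not our actual estimates; the point is just to show how stacking conditional steps drags the product down.

```python
# Minimal sketch of a conjunctive probability estimate.
# Barrier names and probabilities are hypothetical placeholders, not actual estimates.
barriers = {
    "public awareness reaches critical mass": 0.5,
    "a champion government takes up the issue": 0.5,
    "an international agreement is negotiated": 0.6,
    "the agreement is enforced and holds": 0.7,
}

p_success = 1.0
for barrier, p_conditional in barriers.items():
    # Each probability is conditional on all previous barriers having been passed.
    p_success *= p_conditional
    print(f"after '{barrier}': cumulative probability = {p_success:.3f}")
```

Note how every extra barrier mechanically multiplies the estimate down, regardless of how well calibrated each individual conditional probability is, which is why I sanity-checked the formal chain against a gut estimate rather than trusting it outright.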
I take your point that the post title can communicate more credibility than you’d assign to this process. Personally, I think the title is still ok. We’re talking about a world model that includes phenomena that are chaotic and not well modeled, like public opinion.
Any model of these phenomena will have the final answer depend fully on gut guesses, even if there’s some superstructure helping people with coherence etc.
EDIT: Either way, we changed the title so that we don’t risk being clickbaity!
Preventing extinction from ASI on a $50M yearly budget
I think this comment is making too many simplifying assumptions that will shatter on contact with the real world.
From the point of view of any entity pursuing an ASI project, in a world with no global ban, you will always want to deploy too early and risk destroying the universe.
If you’re allowed to run an ASI project, so are others. What does this mean for you?
For one thing...
The point of an ASI ban/pause is to create the time to reduce this gap
You can never know how big this gap is. Perhaps, in a world with much more advanced epistemology, you can get a usable estimate ahead of deployment; perhaps, in a world with much better coordination and strategy, you can gather enough information about it from smaller experiments and deployments without destroying the world.
But these worlds are very different from ours. We can write about them for fun, or as an intellectual exercise, but we should never forget that they are fantasy worlds. Any conclusion that starts from assuming we’re in one of these worlds simply does not apply to ours, and we should not mistake this fanfiction for predictions.
From your point of view, you never know how far away you are from building safe ASI, and you should place an unreasonable (in terms of risk) amount of probability on the outcome that if someone else builds it using your state of the art approach, everyone dies.
Do you place absolute trust in all other entities capable of developing ASI to not try? Of course not. So you’re going to cut corners.
And secondly...
Building ASI that is safe from your point of view is not just a technical problem. Other entities will have other views. In most cases, if a small group of people (compared to all 8 billion people on earth) gets an ASI that they are satisfied with, most of the world will not endorse the result.
You can see this concretely when US AI people talk about the need to defeat China, or people from one lab talk about the need to defeat another lab. So again, you will naturally cut corners, and then everyone dies.
ControlAI 2025 Impact Report: our progress toward an international ban on ASI
generally just dressing up everything in vibes without making any arguments.
Citing the second card on the page you linked, which you can see by scrolling down once:
Deepfakes can steal your face, your voice, and your identity.
They are often used to create sexually abusive material, commit fraud, and harass individuals.
Anyone with internet access can make a deepfake of whoever they want.
All they need is one photo of you or a 10 second voice clip.

This page is part of a public campaign, so it’s not written in LessWrong English. My attempt to translate:
Deepfakes can greatly facilitate identity theft and scams compared to what could be done previously.
Deepfakes can be used to make porn that features people who didn’t give their consent (and just to be clear, the majority of people consider this extremely morally abhorrent and a moral priority, especially if there were no x-risk or if they are not aware of x-risks)
It is so easy to make deepfakes that it’s only a matter of time until they become ubiquitous, once models that can output deepfakes are made publicly available.
You are vulnerable even if you’re not a public figure / don’t post a lot of content online.

The page does go on to make a few more arguments that I don’t have time to go through now. These arguments are clearly spelled out near the top of the page.
The latter would involve discussion of considerations like: sometimes lab leaders need to change their minds. To what extent are disparities in their statements and actions evidence of deceptiveness versus changing their minds? Etc. More generally, I think of good critiques as trying to identify standards of behavior that should be met, and comparing people or organizations to those standards, rather than just throwing accusations at them.
This sounds to me like a very weak excuse. If you change your mind on something this important (for example, you are now confident alignment is easy despite expecting RSI in a couple of years and doing your best to accelerate it), you had better say so very clearly and very publicly.
This is what a company / leadership / CEO etc. would do if they had a somewhat strong deontology. Just from observing this lack of candor, one should consider Anthropic to be impossible to coordinate with, which is a) not what you want from a frontier AI company and b) clearly justifies calling it “untrustworthy”, unless we want to be nitpicky with language to a degree that IMO is clearly unnecessary.
In this situation, since they lied in the past, we’re way past the point where we outsiders (including people who work at Anthropic but can’t read the mind of leadership!) can evaluate whether we disagree with Anthropic about critical matters like how difficult alignment is, and try to change their mind (or at least not work for them) if we think they are wrong.
We’re in a situation where Anthropic should be considered adversarial. Even if tomorrow they released a statement that they now think alignment is easy, I can’t take that statement at face value. Maybe they think alignment is easy; maybe they decided it’s impossible to coordinate with anyone and they will lower risk by an epsilon if they are the first to get DSA; maybe they think it’s better if everyone dies than to let China win the race; maybe they just think it’s fun to build ASI and lied to themselves to justify doing it; who knows.
Against an adversarial opponent, we use POSIWID (“the purpose of a system is what it does”). We assume there is a hidden goal and try to guess the simplest hidden goal from actions while ignoring statements. What is the simplest hidden goal we can guess? Anthropic doesn’t want to be regulated and wants to stay in the lead. I can predict that they will keep taking actions that make regulation harder and keep them in the lead.
control granularity might be higher, and permanence might be higher
That’s pretty much it, just to an extreme extent. Given that ASI could be extremely powerful and that it’s really hard to predict what it could do, I recommend thinking of it as:
The control granularity is basically infinite (the state can read your mind, persuade you to take arbitrary actions, predict what you’ll do long in advance, etc.)
The permanence is basically infinite (e.g. until the heat death of the universe)
I keep trying to map this onto Canada’s situation and getting stuck. We’re mid-negotiation with the US on trade and political capital is finite. How does leadership spend it on ASI risks most Canadians aren’t thinking about?
IMO the key here is “most Canadians aren’t thinking about”: this can be changed through awareness campaigns. Most people aren’t aware that AI companies are shooting for ASI, and wouldn’t like it if they knew.
First, timing … If US or Chinese leadership believes ASI is 2-5 years out, they’ll absorb enormous economic costs for a shot at decisive advantage
I think this is reasonable, which is why we include the more extreme measures, including recognition of the right to self-defense. I personally would be surprised if we could throw this together so quickly that none of the conditional deterrence measures ever need to be activated...

In darker timelines, I think the more extreme economic measures could slow down the superpower AI programs and give time for middle powers to get more serious with their military deterrence, which has a good chance of being effective imo.
The US has extensive tools for pressuring middle powers to defect. The proposal assumes coalition members absorb retaliation costs collectively, but the US can apply pressure bilaterally in ways that make early defection attractive. China has its own methods, but I can only speak confidently about America.
This is a good point. We didn’t have time to address this in the first version of the proposal, but there are potentially some mitigations that can be implemented here, like very heavy penalties for defecting from the agreement.
At the end of the day, though… you simply must get deep buy-in about the x-risks of ASI among middle powers (including softer ones, like the possibility of a permanent US and China singleton).
More superficial motivations could be easy to break, but I think it would be difficult to tempt a country where the relevant decision makers think the best-case scenario for ASI is for their state to be completely dismantled by a US singleton (effectively if not literally).
How middle powers may prevent the development of artificial superintelligence
Modeling the geopolitics of AI development
Very well written horror story! Props :)
In my opinion, it is extremely overdetermined that we all die without a global ASI ban. I don’t have time to flesh out my full view right now, but to summarize, there are at least these 3 layers to it:
We have no idea how to build safe ASI; TAIS (technical AI safety) projects are not nearly on track to enable us to build safe ASI by the time we build ASI.
Even if you are more optimistic with respect to the feasibility of TAIS, there will be extreme pressure to cut corners on safety in a world without a global ASI ban (+ what Connor Leahy says in his post You can only build safe ASI if ASI is globally banned)
Powerful actors in the world are not going to wait around as you build your safe ASI and perform a pivotal act; they will take control of your ASI project, or declare war on you preemptively before you can get a DSA, or something else in that flavor.
While you could try to forward chain from the present and predict whether spreading awareness has specific negative or positive consequences, I think that backtracking from the desired future state (a global ASI ban) makes the decision much more obvious.
My reasoning goes: I want a robust global ASI ban. The only plan that I’ve ever seen for how to achieve a robust ASI ban involves mass awareness of x-risk from ASI.
The decision to pursue mass awareness is not about calculating the probabilities of various specific events and seeing if the result is a net positive expectation; it’s a matter of fulfilling an absolute requirement for achieving a global ASI ban, which is the only state the world can be in where I’m sure ASI won’t kill everyone.