Contact me at x.com/testdrivenzen
Alex Amadori
FWIW, I think this is a problem for decades in the future. I agree that an ASI ban doesn’t solve the problem indefinitely, but we’ll have extra decades to figure out what to do.
Alex Amadori’s Shortform
I’m glad that my colleagues at ControlAI, Andrea and Gabe, and I managed to publish our posts (The Three Filters: Why Almost Every Plan to Survive ASI Fails Miserably, and Anthropic did not call for a pause) just a few hours before Dario Amodei published his essay Policy on the AI Exponential, because they turned out to be quite topical.
Copy-pasting my response thread on X, which looks at the essay through the lens of The Three Filters: Why Almost Every Plan to Survive ASI Fails Miserably:
The essay does not address catastrophic risks, and it fails all three checks for a plan to address them.
Development, not deployment, of powerful AI needs to be restricted at a global level if we are to survive ASI.
The essay gestures vaguely at loss of control and seizures of power, but that’s not enough. And it doesn’t address the risk of war at all.
What do you think will happen if, as Dario suggests, a coalition of democracies bands together to take the lead on ASI? We will still be in a situation where everyone has to cut corners on safety to win the race, and we will likely end up with an ASI that kills everyone.
And whoever is losing still has a reason to attack preemptively with all their military might before the coalition gets a decisive advantage.
With respect to my colleagues Andrea and Gabe’s post Anthropic did not call for a pause, here’s a good comment from Nate Soares that I think highlights how AI companies are heavily hedging to play both sides of the PR game:
In contrast with the last Anthropic blog post, Dario’s new one is back to softpedaling: five big subsections about “positive impact” and “securing leadership by democracies”, with one throwaway line on “loss of control of AI systems” buried deep.
If we’re talking about the differential effect of a given lab joining the race, then they could have a positive effect, if we know they have good intentions to benefit humanity.
FWIW, I think this mostly just adds fuel to the fire, because the government will take over the project regardless of the company’s intentions.
For this to be different, the “insight” would have to accelerate progress to ASI so much that the company can build ASI in a very short time while staying under the radar, including having very few employees and using little compute.
If you buy my arguments in the section “Why Technical AI safety agendas do not address this problem”, the appearance of such an insight would actually be extremely bad: AI safety is always more bottlenecked on humans than capabilities are, and this company has very few humans!
Aside from whether the claims you are citing are true or well-calibrated, I want to point something out.
I actually think that your example scenarios here illustrate exactly the type of scenario that I wanted to disarm with this post. Or, if not to disarm them, at least to give people the tools they need to disarm them.
It is very possible one lab/country gains a decisive advantage before any others. The approach to ASI is likely to be chaotic and fraught with disagreement. It very well may not be obvious what is happening to other powers until a decisive advantage is gained. If the lab/country with a decisive advantage succeeds at technical alignment you may end up in a world which bypasses many of these concerns.
This very specific scenario, which I think could technically happen but is extremely unlikely, falls to the third filter, “nightmare singleton”: the winning AI company has to create a singleton in order to actually bypass the first and second filters.
You may believe this can be a good outcome, but you are still fundamentally trusting that the singleton established by a private company will be good for you / most people, which I don’t consider a satisfying solution to the third filter.
(The reason I think this scenario is unlikely is that, for this to work, the AI company would’ve had to get a really, really big advantage over everyone else: how did they do this without cutting too many corners on safety and failing at the second filter? And of course, there’s the issue of the government taking over the project.)
It took the Soviets years of espionage to steal atomic secrets. If a lab is approaching ASI, one can expect that the pre-ASI AI will be heavily woven into their security architecture for securing model weights which may effectively prevent nation states from stealing them.
I could spend time arguing why I think this scenario is unlikely, but I think this would miss the main point of the post: the scenario doesn’t address the fact that the winning AI company still needs to create a singleton, so we still fail at the third filter.
The Three Filters: Why Almost Every Plan to Survive ASI Fails Miserably
At the risk of sounding repetitive, the claim at the top of the thread, which is what most people will see, is “ControlAI leadership has been quite consistently deceptive and engaged in strategic conflation of AI existential risk”.
Do you disbelieve that this happened as described? That this kind of request does not implicate a person’s honesty in a way that isn’t trivially undone? Some other objection?
From my point of view, this event truly does not matter. I wasn’t there during the call, and I haven’t asked any of the involved parties for their side. I only have access to public information.
Given public information, the weight you guys attach to this story is completely unreasonable. Look at what I know:
There is a single event, which happened in 2023,
where Gabe asked you guys, in private, to moderate two comments he thought were “unreasonably mean”. As far as I know, this “asking” extended only to him asking you to do so on a call, in private.
It is unclear to me whether he even asked you to delete the comments altogether, or asked for the tone to be adjusted, or anything else. I have access to very few details about the story.
Given that the details are inaccessible to me, I can’t really predict what I would think given all the information.
And this is weighed against the following publicly available information: since 2023, Gabe has done a shit ton of work to get as many people as possible to speak about x-risk, and has done so himself; a lot of this work is publicly verifiable.
Do you really think that a reasonable third party who looks at Gabe’s track record would agree with your characterization of Gabe as “pushing people to be coy about x-risk” if given more information about what privately happened on the call? This seems ridiculous.
I can’t even imagine what kind of publicly verifiable horror story you’d have to tell them for them to update this way about Gabe.
There is a recurring motte-and-bailey pattern around these accusations. It always starts with accusations of very egregious high-level things like “trying to get people to be coy around x-risks” or “being quite consistently deceptive and engaged in strategic conflation of AI existential risk with other issues to win political battles”.
These always appear in the higher-level threads or comments, which more people will see. Taken by themselves, these are ridiculous claims, as Andrea, Gabe and Connor’s track record shows.
When pushed on this, habryka retreats to much less egregious claims, like “we once ran a campaign to ban deepfakes” and “there was once a disagreement between him and Gabe on how to moderate comments on a LessWrong post”. But the initial accusation at the top of the thread is about coyness around x-risks, not about any other “bad things” that habryka might accuse them of having done.
Because the accusations are quite vague, it’s quite hard to refute them, despite a very strong track record of honesty, and it becomes possible for people to handwave connections between the smaller claims and the accusations.
“Worth a shot” is the type of conclusion that is best applied to things that have positive-skewed outcomes, but seems to be missing a mood when applied to things that could cause big positive or negative effects.
In my opinion, it is extremely overdetermined that we all die without a global ASI ban. I don’t have time to flesh out my full view rn, but to summarize, there are at least these 3 layers to it:
We have no idea how to build safe ASI; TAIS projects are not nearly on track to enable us to build safe ASI by the time we build ASI.
Even if you are more optimistic wrt the feasibility of TAIS, there will be extreme pressure to cut corners on safety in a world without a global ASI ban (+ what Connor Leahy says in his post You can only build safe ASI if ASI is globally banned)
Powerful actors in the world are not going to wait around as you build your safe ASI and perform a pivotal act; they will take control of your ASI project, or declare war on you preemptively before you can get a DSA, or something else in that flavor.
While you could try to forward chain from the present and predict whether spreading awareness has specific negative or positive consequences, I think that backtracking from the desired future state (a global ASI ban) makes the decision much more obvious.
My reasoning goes: I want a robust global ASI ban. The only plan that I’ve ever seen about how to achieve a robust ASI ban involves mass awareness of x-risk from ASI.
The decision to pursue mass awareness is not about calculating the probs of various specific events and seeing if it results in a net positive expectation; it’s a matter of fulfilling an absolute requirement for achieving a global ASI ban, which is the only state the world can be in where I’m sure ASI won’t kill everyone.
my view of how elites will act when informed about AI x-risk is based on actual examples that happened
If you are referring to the founding of OpenAI and Anthropic, you should look at how it happened. Sam Altman is a career entrepreneur, whose job is to create businesses. Dario Amodei was an AI researcher, whose job was to research AI. Inertia, my default predictor for individuals, is a very good predictor here, good enough that I don’t feel the need to update on other heuristics.
In democratic institutions, things tend to happen quite differently. From the post:
Normally, parts of a country’s executive branch are responsible for international negotiations around urgent issues concerning national and global security. … Branches of the government are generally not in the business of independently taking bold positions and then pursuing those positions to their logical ends. Instead, their stances and actions are mostly shaped by prevailing social currents.
Basically, decisions are much more diffuse, and this should favor caution about x-risks if people are properly informed about x-risks.
As we point out in the post, our budget is very small compared to major political campaigns, and would still be at least an order of magnitude smaller at $50M / year.
More than half the post is about how funding is the main bottleneck; I would like to hear a more detailed counterargument to this point.
I think the attitude toward even just the EU AI Act on Capitol Hill is mostly derision
Keep in mind that “leading by example” doesn’t necessarily work by impressing people on the Hill; the effect can also come through fostering popular support, or simply through providing the initial impetus (big institutions can struggle with initiative) and then having the US take over negotiations later.
We’ve also written about how ASI prevention could happen through more distributed coalitions at asi-prevention.com if you’re interested.
Like Andrea, I don’t expect us getting more funding to significantly change the funding or tactics of AI industry lobbying. Concretely, the scale of AI industry lobbying is already ~$100Ms, and potentially a lot more if you include broader marketing and influence operations, not just disclosed campaign and lobbying spend.
We could scale up 2 OOMs from where we currently are, and I don’t think that’s nearly true of AI industry lobbying, as that would take them to at least ~$10B.
When it comes to tactics… as it says in the section “an asymmetric war”, it won’t be very effective for them to try to make the issue divisive. Instead it’s more likely there will be a lot of FUDding (https://en.wikipedia.org/wiki/Fear,_uncertainty,_and_doubt), but this is already happening, and I expect it to be their main strategy regardless of whether we receive funding.
The conflict-theoretic approach is to acknowledge that AI’s nature as an amplifier of power will always make it an irresistible temptation to elites.
I think this is not true specifically for:
ASI
AI that can rapidly self-improve
Or at least whether it’s true is highly contingent on whether elites have been informed about x-risks. The whole thing with x-risk is that AI will not empower you, it will do the opposite: get out of your hands and probably kill you. At multiple layers:
You fail to control ASI and it kills you
Someone else fails to control their ASI and it kills both them and you
Your geopolitical adversary goes to war with you before you manage to build ASI because they don’t want to run the risk of you getting there
We went into more detail in our geopolitics modeling paper, ai-scenarios.com
Signatures are very valuable rn, but as mentioned in the post, we’ve already introduced (and succeeded in pursuing) more ambitious goals for lawmaker outreach.
More generally, goals have indeed been modified in the past so that they would more closely track what we really cared about, so I’m fairly optimistic about this.
Hi Michael. Naturally, a lot of resources will go to the US. For example, in the public awareness section of this post, we assumed that 60% of ad spend would go to the US, and that the largest policy headcount scaling would happen in the US.
That said, an international prohibition will definitely need a coalition with more than one country supporting the agreement, and more than one country championing it will make it go further, faster.
A group of 2 to 3 middle powers who are extremely motivated to get an international prohibition on ASI could go a long way towards getting it done, even if most of it ends up being about motivating the US or leading by example. And to maximize the chances to produce these champions, we need to put our eggs in as many baskets as possible.
Our estimate of a 10% chance comes from the following: two of us authors independently formulated our own guesses, and both guesses ended up close to 10% for $50M and 30% for $500M.
In the footnote, we list the conjunctive steps we considered; we’re interested in the readers’ guesses of the total probability after they considered these!
More detailed answer:
I can’t speak to how Andrea did it, but personally: I considered what were the major barriers to the desired outcome before producing the guess, and assigned probabilities of passing each barrier conditional on having passed the previous one.
After doing that for a little while, though, I started worrying about doing the calculation too formally, since I was adding a lot of conjunctions (when humans do estimates like this, the probability tends to go down arbitrarily just by adding more and more barriers), so instead I just went with what my gut said after having done this. This actually corrected me downwards a bit compared to what I’d have answered if you had asked me for a gut estimate before this process.
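To make the conjunction worry concrete, here is a minimal sketch of the chained estimate described above: multiply P(pass barrier k | passed barriers 1..k-1) over all barriers. The barrier probabilities below are hypothetical placeholders, not the numbers behind our 10% / 30% guesses. The point is only that splitting the same ground into more barriers, each of which "feels likely" on its own, can drag the product down without any new information.

```python
from math import prod


def conjunctive_estimate(conditional_probs):
    """Chain P(pass barrier k | passed barriers 1..k-1) into one probability."""
    for p in conditional_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return prod(conditional_probs)


# HYPOTHETICAL numbers for illustration only; NOT the authors' estimates.
coarse = [0.5, 0.4, 0.5]  # three broad barriers

# The same three barriers, each split into two sub-barriers. Each sub-barrier
# gets a middling-high number on its own, but the pairs no longer multiply
# back to the original barrier's probability (0.6 * 0.6 = 0.36 < 0.5, etc.).
fine = [0.6, 0.6, 0.55, 0.55, 0.6, 0.6]

print(f"coarse: {conjunctive_estimate(coarse):.3f}")  # 0.100
print(f"fine:   {conjunctive_estimate(fine):.3f}")    # 0.039
```

This is the drift that made me fall back on a gut estimate informed by the breakdown, rather than trusting the raw product.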
I take your point that the post title can communicate more credibility than you’d assign to this process. Personally, I think the title is still ok. We’re talking about a world model that includes phenomena that are chaotic and not well modeled, like public opinion.
Any model of these phenomena will have the final answer depend fully on gut guesses, even if there’s some superstructure helping people with coherence etc.
EDIT: Either way, we changed the title so that we don’t risk being clickbaity!
Preventing extinction from ASI on a $50M yearly budget
I think this comment is making too many simplifying assumptions that will shatter on contact with the real world.
From the point of view of any entity pursuing an ASI project, in a world with no global ban, you will always want to deploy too early and risk destroying the universe.
If you’re allowed to run an ASI project, so are others. What does this mean for you?
For one thing...
The point of an ASI ban/pause is to create the time to reduce this gap
You can never know how big this gap is. Perhaps, in a world with much more advanced epistemology, you can get a usable estimate ahead of deployment; perhaps, in a world with much better coordination and strategy, you can gather enough information about it from smaller experiments and deployments without destroying the world.
But these worlds are very different from ours. We can write about them for fun, or as an intellectual exercise, but we should never forget that they are fantasy worlds. Any conclusion that starts from assuming we’re in one of these worlds simply does not apply to ours, and we should not confuse this fanfiction for predictions.
From your point of view, you never know how far away you are from building safe ASI, and you should place an unreasonable (in terms of risk) amount of probability on the outcome that if someone else builds it using your state of the art approach, everyone dies.
Do you place absolute trust in all other entities capable of developing ASI to not try? Of course not. So you’re going to cut corners.
And secondly...
Building ASI that is safe from your point of view is not just a technical problem. Other entities will have other views. In most cases, if a small group of people (compared to all 8 billion people on earth) gets an ASI that they are satisfied with, most of the world will not endorse the result.
You can see this concretely when US AI people talk about the need to defeat China, or people from one lab talk about the need to defeat another lab. So again, you will naturally cut corners, and then everyone dies.
Responding really quickly; I only have time to give a couple of thoughts instead of a well-thought-out answer.
Well, for one, he missed “the ability of internally deployed models to follow Agent-4’s path from AI-2027”.
He also completely fails to address filters one and three from the post The Three Filters: Why Almost Every Plan to Survive ASI Fails Miserably, which are:
If any adversarial country can’t get assurance that you are not building ASI, they will start an existential war with you.
Even if Dario’s plan succeeds on its own terms, it is about creating a singleton, and I don’t fucking trust him to create a singleton (nor do I trust any group of countries to do so, in the current state of the world)