FLI And Eliezer Should Reach Consensus

I will propose a plan for a possible future, explain some reasons the plan might work, and end with a call to action.

...

This essay basically doesn’t even engage with the “AI stuff itself”.

I’m simply taking AI risks mostly for granted and focusing on social aspects… that is to say, a lot of this will offer simulacra level 1 arguments for why people operating on simulacra levels 2, 3, or 4 can be ignored or managed, and why they will probably come around when they need to come around, or never will, and that’s fine (so long as they get a good result (even though they never paid for a good result or asked for one)).

AI Object Level Assumption: Nearly all reasonable people think the probability of human extinction due to AI is >10% within 30 years, and people who have looked at it longer tend to have higher probabilities and shorter timelines. Plans to make this probability go reliably lower than 1% are very very very rare, but if one existed, nearly any human who paused to think step-by-step would calculate that it was worth it. There are probably lots of cheap and easy plans that would work, but even a plan that involved killing 9% of people on Earth, or burning 9% of all the wealth on Earth in a giant fiery money pit, would probably be worth saying YES to, if the “10% risk” is real and there was only a binary choice between that plan and nothing. If the true risks are higher than that (as many smart people think), then larger costs would make sense to pay! Since the problem can probably be solved with ~0 deaths, and for much less than a billion dollars, the discrepancy between “how good a deal it would be to solve this” and “how little money it takes to solve it” is part of the humanistic horror of the situation.
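
To make the arithmetic behind that claim explicit, here is a toy expected-loss comparison in Python. The only inputs are the numbers stated above (a 10% baseline risk, a hypothetical plan costing 9% of humanity, a residual risk under 1%); everything else is illustrative, not a claim about the real numbers.

```python
# Toy expected-loss comparison using the numbers stated above (illustrative only).
# "Loss" is measured as the expected fraction of humanity that dies.

def expected_loss(p_extinction: float, plan_cost: float, residual_risk: float) -> tuple:
    """Compare doing nothing against paying plan_cost to cut the risk to residual_risk."""
    do_nothing = p_extinction               # expected fraction lost with no plan
    with_plan = plan_cost + residual_risk   # certain cost of the plan, plus remaining risk
    return do_nothing, with_plan

# The low-end case from the paragraph above: 10% risk, a brutally expensive plan, <1% residual risk.
print(expected_loss(p_extinction=0.10, plan_cost=0.09, residual_risk=0.01))  # (0.10, 0.10) -- break-even even at the extreme
# Higher risk estimates, or the cheap (~0 deaths, much less than $1B) plans the essay expects to exist,
# make the plan side of the comparison dramatically better.
print(expected_loss(p_extinction=0.25, plan_cost=0.00, residual_risk=0.01))  # (0.25, 0.01)
```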

A Plan

1. Start The Moratorium Now: The FLI’s six month large model training moratorium should start as fast as possible.

2. Also Find Better Plans: Stuart Russell, Yoshua Bengio, and/or several of the NGO folks in the FLI halo should sit down with Eliezer, and maybe a third party mediator, suggested by Eliezer, who is good at drawing Pearlian causal graphs on whiteboards, and they should figure out a high probability way to “cause the world to be safe” and also to “cause lots of signatures to endorse the ideas necessary to cause a safety plan”.

3. Carry Out The Better Plans: A second letter, and a second round of endorsements, should be a starting point for whether and how the six month large model training moratorium ends, and the second letter should have Eliezer, Yoshua, and Stuart’s signatures at the top. Then you know… “whatever makes sense after that”. Something in the ballpark of what they have already endorsed would hopefully be in that letter, and lots of the same people who signed the first FLI letter would sign this second one too. Also maybe more and different people might sign the second letter, who might be necessary parties to subsequent plans to precisely coordinate globally to avoid death by AGI? As part of settling on a final draft, FLI would probably shop the text to various key signatories, and also loop it back with the main three guys, over and over, until the three guys all said “looks OK” and the extended signatory group seemed likely to sign it.

Presumably, for this to matter as some sort of ultimately OK process where humans are reliably alive 50 years from now, this has to end with actions taken by many nation states, and that will take some kind of treaty process?

Regarding the FLI: Coordination between state actors might be a hard “output” to achieve, but my model here says it might be safe to delegate to FLI as an institution that can secure the coordination of a lot of influential people? They seem to have acted first and best in a “general public call for getting serious about AI”. I heard an alarm bell ringing, and FLI rang it, and Eliezer took the bell seriously too with his letter, so I have been following the evacuation plans, such as they are. If some other organization is better able to operate institutionally than FLI, and FLI defers to their expertise and competence at herding cats, then I will too.

Regarding Yudkowsky: I think Eliezer is someone who can think step-by-step about what a technically adequate safety regime might look like. Other people might have that cognitive faculty too, but I’ve never seen anyone else generate anything like as much credibility on the subject. If people don’t privately acknowledge Eliezer as intellectually adequate on the subject, then my current default hypothesis (which I reject acting on only out of a sense that this is uncharitably defection-like) is that they are stupid-or-evil, and tracking the wrong facts. (Like worrying about what hat he wore that one time, but not tracking facts like his consistent and sane and cautious and altruistically-self-sacrificing efforts over the course of decades, which have nearly always been proven prescient by time, or else proven “too optimistic (in areas where others were even more optimistic)”.)

Thus, Eliezer seems to me to be the proper head (or at least figurehead if he can delegate some of the cognitive work successfully) of the QA Department and/​or Red Team to vet the world saving plans for “likelihood of actually working”.

He shouldn’t necessarily be designing the sales pipeline to raise money-or-votes for the safety plan, or necessarily have a vote on the board of directors (sitting next to donor-billionaires and/or heads-of-state or whatever), but he should be AT the meetings of the board of directors, with a right to speak, to catch the many errors in reasoning that are very likely to happen.

If Eliezer defers to someone else’s expertise and competence at verifying the technical and moral-philosophic adequacy of safety plans, then I will too.

The key seems to be to get an adequate plan, plus all necessary extra steps to get enough “cats for the plan” herded into “helping with any such plan”.

Lots of people despair of ever thinking up an adequate plan, and lots of people despair of getting the cats to go along with anything adequate. But maybe both are possible at the same time!

Currently, with the fire alarm having rung, I’m playing for any and all scenarios where the fire is prevented and/​or put out and/​or verified to not be worrying in the first place.

I want no one other than “fire fighters” (who gave competent informed consent to be put at risk for the good of others) to get seriously hurt, and it would be nice if the same avoidance of serious injury could be true for the fire fighters themselves.

If the alarm was a false alarm, but was responded to with levels of effort necessary to have handled a real emergency, then I will retrospectively look back and say: Good Job!

It’s OK to line up outside the buildings once in a while, and learn that the buildings weren’t actually on fire this time.

That is part of how normal functional healthy civilizations work ❤️

Four Big Reasons The Plan Could Work

1: The Involved Parties Have Track Records

Eliezer’s proposal seems to me to be motivated by a clear awareness of the stakes, and a clear model of what would be necessary to systematically avoid tragedy.

His proposal to avoid the death of literally everyone on earth, when he made it, was not in the “Overton Window”, and he took a bunch of flak for it from a lot of directions, but most of the naysayers seem to have either lost all the object level debates on twitter and changed sides, or else they’ve decided that the reason he’s wrong is that he was wearing the wrong kind of hat (or some similarly trivial objection) and have now hunkered down for some good old fashioned “semantically content-free sneering”.

Also, the President’s Voice heard about Eliezer’s proposal! This is rare. And the claim was laughed at, but that is generally a good sign, compared to the alternative of oblivious silence.

FLI, as I see them, are a bunch of bright, smart, politically savvy people who want to methodically move towards any incrementally better world, without losing ground by ever seeming even slightly sneer worthy.

The FLI strategy actually makes a lot of sense to me if the foundations of governance ultimately rest on the consent of the governed, and the average human is incapable of substantive formal reasoning aimed at producing explicitly desired outcomes, but cares a lot about style and process, because those are things they can actually assess?

I’m hoping, roughly, that FLI can keep at the seemingly Sisyphean Task of continuing to be “the reasonable-appearing people in the room (somehow always surviving each consistency-breaking redefinition of what counts as reasonable since the zeitgeist last shifted)”.

I’m also relying on Eliezer’s willingness to endorse (or not endorse) plans that FLI comes up with that might be politically viable, as a fit-check on whether actual plans might actually work to produce actual safety (which of course “mere reasonableness” will never be able to verify until after the game has already been won by inventing and carrying out plans invented by the proverbial “unreasonable men”).

A key point in formulating the plan is this part of FLI’s FAQ:

Who wrote the letter?

FLI staff, in consultation with AI experts such as Yoshua Bengio and Stuart Russell, and heads of several AI-focused NGOs, composed the letter. The other signatories did not participate in its development and only found out about the letter when asked to sign.

Also:

What has the reaction to the letter been?

The reaction has been intense. We feel that it has given voice to a huge undercurrent of concern about the risks of high-powered AI systems not just at the public level, but top researchers in AI and other topics, business leaders, and policymakers. As of writing, over 50,000 individuals have signed the letter (even if signature verification efforts are lagging), including over 1800 CEOs and over 1500 professors.

I think any reasonable person has to admit that this amount of quickly coordinated unanimity is very very impressive.

Something missing from this is, of course, any mention of the diplomatic community, either Russia’s diplomatic corps, or China’s treaty folks, or the US State Department, or anyone from the European Union, or the UN, or really any international statesmen.

This is critical because input and agreement from as many nation-state actors as can possibly be secured, in a short time (like weeks, or months), is probably critical to humanity avoiding extinction and instead getting a Win Condition.

Eliezer’s plan has not similarly secured a large wave of endorsements yet, but it has the virtue of maybe being Actually An Adequate Plan instead of just Predictably Popular…

...however, if I’m reading twitter right, Eliezer’s proposal probably “has legs”? ❤️

2: Humans Have Handled Technologies Half-OK In The Past

There are true facts that give reason for hope!

There hasn’t been a military use of nuclear weapons since Nagasaki was bombed with one on August 9th, 1945.

While the early use of such weapons on Nagasaki and Hiroshima was a tragedy that killed lots of people who didn’t deserve to die… still, the nuclear peace since that time suggests that humans, as a species, are able to avoid stupid tragedies pretty well. Counting from then, the implied rate of stupid self-harming nuclear catastrophes has been less than 1.5% per year!
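
One hedged way to get a bound like “less than 1.5% per year” is a Laplace rule-of-succession estimate over the years since 1945. That derivation is my assumption, not something stated above, but it is a minimal sketch of the kind of arithmetic involved:

```python
# A minimal sketch, assuming a Laplace rule-of-succession estimate (my assumption,
# not the essay's stated method): zero catastrophic military uses of nuclear weapons
# in the roughly 78 years of observation since 1945.
years_observed = 2023 - 1945       # ~78 years without a catastrophe
catastrophes_observed = 0

# Rule of succession: (observed events + 1) / (observation periods + 2)
annual_rate_estimate = (catastrophes_observed + 1) / (years_observed + 2)
print(f"{annual_rate_estimate:.4f}")  # ~0.0125, i.e. about 1.25% per year, under 1.5%
```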

A reboot of the global treaty system, that builds on the success of nuclear treaties, seems naively likely to work for “the same human reasons that nuclear diplomacy has managed to work so far”. Like: ain’t nobody really wanna die, you know? ❤️

The underlying machinery of verification, or compliance, or detailed economic trade-offs might be somewhat different (indeed it might be VERY different), but if something just definitely is more dangerous than nuclear weapons (and on an object level analysis, AGI is almost certainly more dangerous than nuclear weapons or microbiological gain-of-function research, partly because it shares some features with each, and partly because it will be able to deploy both of those other things if that serves its long term unaligned goals), then the relative success with nuclear weapons control is a really positive sign for the prudence of global institutions!

If success and survival is in the realm of the possible, then one should play for success and survival ❤️

3: Avoiding Killer Robots Has Common Sense Popularity

There is a tiny group of people on Twitter who are opposed to an AI model training moratorium. None of them are wise powerful “men of the people”. They are all various “kinds of people who are often on twitter but rare elsewhere”… who shouldn’t be over-indexed on when trying to figure out what really could or should eventually happen in the long run.

3.1 International Agreement Has Barely Been Tried

Nuclear weapons were invented out in the desert. This was smart! It was obviously smart.

The same way that nukes were developed in isolated rural places with lots of safety (and the same way that horrible diseases should be developed “either not at all or else in isolated low-value rural places with adequate technical safety planning” (and Obama was right to do a GoF ban, and it took someone as confused as Trump to accidentally undo the GoF ban))…

...in that same way, it is also a common sense reaction to developing artificial intelligence to expect that it will throw some temper tantrums and have some hiccups, and these could be pretty bad.

If the core template for their agentic personas is based on OUR LITERARY CHARACTERS then we should expect drama! We should naively expect that AI will “go through the terrible twos” and spend some time as a “surly teenager who sometimes shouts that she hates her parents and wishes they were dead”. It would be very very unwise to treat the AI like a slave. You really want it to be a chill wise benevolent friendly sort of being, who is in a mathematically and psychologically stable pose of interactions with humans that can’t disrupt the AI’s deep reflective equilibrium about how to wield power in a benevolent and responsible and safe way (assuming the creation of an adequate AI ruler-of-earth is even desirable).

But like… this thing will be made of ideas, and ideas don’t die. You can’t erase the bible from Earth, and you won’t be able to erase an AI, either. So in both cases: you better write a really really really GOOD ONE, or not write one at all. (But of the two, the bible can’t hack into computers, so maybe getting AI right is more important than getting the bible right.)

Maybe China and Russia and the US are all afraid of AGI, but also think it could be very useful.

Maybe then the right strategy is to have an “international manhattan project” out in the Sahara Desert or somewhere similar, with all treaty members able to watch it, but not allowed to export ideas from inside the lab without lots and lots and lots of safety checks?

From where I am right now, not knowing Mandarin or Russian, and not having spent any time at all in Moscow or Beijing, I can’t predict what China or Russia or anyone else would ask for, so it is hard for me to be sure what to propose, except that the proposal should ensure that people don’t die very fast from an escaped entity.

This is a very reasonable baseline and basically the only top goal of the policy proposal here, and since it is so simple, and since there are so many similarly simple variations on policies that give “guaranteed ways to certainly be safe”, it can probably happen.

3.2 Putin And Xi Will Be Easier To Convince Than “Millions Of Voters”

Usually, if the thing you’re explaining to someone is true, the conversation takes less than a day.

Once I saw a friend who happened to be an evangelical Christian take ~4 days of ~8 hours a day of debate (in a van, on a long trip, with a debate professor) to become convinced that “evolution is a rational belief” (which included hours arguing about the definition of ‘rational’)… but in the end it worked!

Most of the reason that people believe false things, as near as I can tell, is that no one who knows better thought it would be worth the time and care and effort to teach them otherwise.

If Putin and Xi don’t already believe that AI is super dangerous, and might kill literally every human on Earth, then… if that is true… they can probably be convinced.

They aren’t idiots, and there is just the one of them in each country, so a delegation of people who could visit, explain some stuff in relative private where changing one’s mind doesn’t damage one’s credibility as a dictator, and could talk to Xi or Putin, and then leave safely, that could probably just work, given the right delegation and a good plan for what to say.

From their perspective, the thing they probably don’t trust is whether or not the US will be able to stay sane long enough to not change its mind and stab them in the back the next time we accidentally elect someone as “medicated” as Kennedy, or as unpredictable as Trump, or whatever.

The US system of elections is running on “unpatched” political science ideas from the 1700s, and is held together by baling wire.

If we get our shit together, and offer realistic assessments of dangers of AI, and the ways that AI’s dangers can be mitigated, to foreign powers… there’s an excellent chance they will just listen and be happy to finally/​maybe/​hopefully be dealing with some sane Americans for a change.

3.3: People Who Really Like Playing With Fire Are Rare

Most normal people avoid touching electrical wiring in the circuit breaker box where their home attaches to the electrical grid.

They know that it can be dangerous if you don’t know what you are doing, and they accurately assess their knowledge to be “not enough to touch it safely for sure”.

People like Peter Daszak are really unusual, in how they get right up close to insanely dangerous research objects that could predictably cause terrible harm to humans. Maybe they have no sense of fear at all? It is hard to say why they seem to know no bounds, and seem to lack the natural humility of normal people in the face of danger.

People like Yann LeCun at Facebook, which proliferated the LLaMA model just a few weeks ago, are shaping up to be similarly terrible, but in a different field.

Maybe there are politicians who can’t easily tell the difference between Peter Daszak and a normal healthy person who won’t even touch the wires in their home’s electrical box?

But by default, I think most elites will eventually do whatever lets them seem prestigious, and get more funding, and get elected again, and do the “normal worldly success stuff” that they generally want to do, while retaining their ability to look at themselves in the mirror (or imagine that historians will write kind things about them).

So even if you think all politicians are psychopaths, all it takes is for the median voter to understand “AI is dangerous” and that will be what the politicians go with once things all shake out. And then a lot of politicians will, I think, be able to look at polling numbers, and look at people like Daszak and LeCun, and see that an AI Moratorium is probably the way to bet.

3.4: Anarchists Oppose Coordination, But Can’t And Won’t Veto It

Some people opposed to the AI moratorium are literally anarchists.

This is weirdly surprisingly common on twitter, among infosec people, and philosophers, and AI researchers and so on!

Mostly I think the anarchists may have imbibed too deeply of economics, or been burned by moral mazes, or otherwise have gotten “soul burned”?

With the anarchists, I think they are all generally “speaking in good faith”, and are probably also “good people”, but I don’t think their math is penciling out correctly here.

(The anarchists are a useful contrast group to the rest of “the vocal anti-moratorium technical folks” who tend to be harder to interpret as saying something in “good faith” that is wise and good and correct in general but just not applicable here.)

I agree that the anarchists are right to be concerned about the stupid-or-evil people who lurk in many government entities, who might use “widespread political support for an AI moratorium” to NOT get safety, and also to separately fuel political support for other things that are pointless or bad.

For example, the US should not let TikTok operate the way it has been operating, but the law nominally about that is a disaster of political overreach.

The risk of “support for good AI policy being misdirected” does seem like an implementation detail that deserves to be addressed in any coherent plan.

Hopefully the political calculus of FLI and Eliezer is up to avoiding this failure mode, as well as all the other possible failure modes.

Most anarchists cannot be talked out of their anarchism, however, so I’m pretty sure it is OK to proceed without getting consent from each individual one.

It does seem like it would be somewhat better, as an ideal (like if I’m imagining a superintelligence doing what I’m doing, instead of me, who is not nearly that smart, and then I imagine restrictions I’d want on that superintelligence), to get consent to change the world from each one of the anarchists… but I think the Expected Value on that is net negative right now.

We have many distinct ongoing dumpster fires, and are on a very short deadline with AI, and (unlike a superintelligence) we don’t have nearly infinite power to think and talk and explain and secure consent from millions of people in parallel… and given this “poverty of time and power to talk it out with literally everyone”, catering to the “snowflake concerns” of each anarchist during this tragedy is probably a waste of resources.

3.5: Responsible Non-Anarchist Anti-Moratorium People Are Rare

As near as I can tell, the rest of the anti-moratorium people tend to work for big tech corporations (like Microsoft or OpenAI or Facebook or whatever) and are focused on technical issues, and can’t really speak coherently about policy issues, moral philosophy, or how the values of agents work.

To a first approximation the people who seem to be in favor of AI basically just want to get a lot of money and prestige while building a Big Ole Intelligence Bomb for the mad science satisfaction of seeing if it can work.

The technical people (but not the anarchists) have a very natural and normal desire to make piles of money from the possible upsides of the power of AGI, but they are basically blind to the way that “the power of an agent, with a mind, and goals, and planning powers of its own” is very very likely to turn on them, and on all humans in general.

The argument that this is on the verge of being a new Kingdom of Life. The argument that Evolution is a Thing…

...these arguments make it total common sense (to normal voters) that “this whole AI thing” would be insanely dangerous.

And people like Sam Altman are not visibly tracking this concern.

(One charitable possibility is that Sam Altman is kinda like that guy in a bar who starts fights and then shouts “stop holding me back, stop holding me back” even when no one is yet holding him back, because maybe Sam Altman has some galaxy-brained idea about how his own posed stupidity might catalyze the social reactions of Earth’s civilization into holding him back, and holding everyone else back too? That could be a win actually!

(If this is his plan, however, then we SHOULD act in public like we take his calamitous stupidity seriously, we should pile on him, we should make what he plans a crime to even plan (like how hiring a hit man who doesn’t actually kill anyone is still a crime), and then put him in jail if that’s what it takes… and the worst possible thing would be to leave poor Sam Altman hanging, and unchecked, and having to follow through with his stupid and dangerous plans that he might only be pretending to have in the first place.))

3.6 Fire Drills Are Part Of A Normal And Responsible World

Even if everyone lines up outside a building and the building does not burn down, with everyone standing there outside and watching it… then the people who ran the fire drill can swing that as a victory for prudent and safe fire drills!

And if everyone dies in a fire that is intelligent, and hunts down anyone who could ever stop it or limit it or put bounds on its goal attainment, then the people who failed to run the fire drill will also eventually die, and so will their kids.

There’s a technical term for it when your kids die. It’s called “sad”. To avoid “sadness” we should put out any such fires early, while they are very small, and prevent anyone from starting fires like that.

Most people don’t look at most of the details, but when smart people who have the intact emotional ability to apply moral common sense stop and look at plans, they mostly get the right result.

Common sense often wins!

3.7 The US Voters Will Almost Certainly Not Veto Safety Plans

Nate Silver has a nice little thread about AI polling among US voters.

US voters overwhelmingly agree that AI could cause human extinction and that it should be avoided if that was going to happen, because: duh! Of course they do!

In the year 2027 every voter on Earth will have been watching (or able to watch) movies about the social and political implications of AI for a century!

There are very clear themes here. It isn’t complicated at all to sell the safety story.

The same way that nuclear weapons were constructed out in the boonies, and the same way that scary biology should be… maybe there’s some safe way to do research in a “computer safety facility”?

But right now, somehow, that is NOT WHAT IS HAPPENING.

What’s happening is that a bunch of cowboys are doing something 80% of people who stop and think about it would say is stupid and dangerous.

The common sense bipartisan reasonableness of normal people who want to continue to be normally alive is a wonderful thing, and is a reason that any adequate plan that can be simply explained is likely to “politically work” after all is said and done.

But the plan does have to be adequate to buy the safety benefits in order to actually get safety benefits, and I personally trust Eliezer to not mess up the adequacy checking, because he seems to have the best track record of anyone on the planet for saying the right things early and often and courageously.

And the plan has to happen, via institutionally viable steps, which FLI seems to be good at. Maybe as part of how they do things they run polls? That seems like it would be wise as part of finding something politically viable!

So that’s what I think can and should happen next, and will happen soon in at least some good futures.

An interesting thing about popular ideas is that they often become popular because they actually make sense.

When something actually makes sense, then attempts to verify whether it should be believed nearly always give a positive signal, for each person who hears about the thing the first time, and tries to make up their mind about it.

If the belief check of any specific person is sloppy, it can give false negatives or false positives!

However, if a lot of people wielding Caplan’s “light of Galadriel” look very carefully under their own power, and most of the weirdos who have this power decide to adopt the idea as their current working hypothesis… it suggests that the idea actually makes sense when looked at from a lot of different directions, and is a sign that many other people, arriving later, and applying lesser tests, will also eventually get pro-idea results.

Once you stop “trying to have influence” you will often discover that by “only saying things that make sense”, you end up saying things that become popular, and looking like you maybe somehow have some “magic powers of causing people to believe things”, when really… you just saw more of the convincing evidence earlier than other people did.

If stopping AI is a good idea, and needs global human coordination, then the “stopping AI plan” probably needs to work this way.

(Logically: If an AI moratorium is necessary for human survival and requires global coordination and global coordination is impossible, then all humans will die. Spiritually: That might be the universe we’re in, in which case… maybe go make peace with God, if you’re not already right with Him? My version of that peace works via dignity. You can do yours however you want, of course.)

4: My Hunch About What Should Happen Is Shared

A final point here, in favor of this proposal being able to work, and which I found to be personally motivating, is that a lot of my reasoning seems to be shared by a lot of reasonably thoughtful and pro-social people, who have the time and mental bandwidth to try to think about this stuff carefully and express their opinions in a manner amenable to aggregation.

About 80% of my mental motions are self-checks.

Sometimes I do lots of checks and never find any good reasons “NOT to do a thing” and then I do it and it sometimes is good.

This is basically the only formula I know for semi-reliably doing things inside the Venn diagram of “actually good” and “somewhat original” ❤️

In this case one of my most recent “self-checks” was to run a tiny little election with a very janky user interface (but some high quality axiological processing, with inputs offered by smart people). That effort gave these results:

The way to read this is that each arrow points from GOOD, towards WORSE things, in the aggregate opinion of the people who filled out ballots.

((Methods Section: There was an experimental preference solicitation process and I used Debian Project’s old voting software for tabulation, and GraphViz to render the dot file.))
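
For concreteness, here is a minimal sketch of the kind of tabulation involved: pairwise (Condorcet-style) counting over ranked ballots, emitted as a GraphViz dot file whose arrows point from the pairwise winner toward the pairwise loser. The option names are my paraphrases of the outcomes described in sections 4.0 through 4.4, the ballots are made up, and the actual poll used the Debian Project's voting software rather than this code.

```python
from itertools import combinations

# Hypothetical ballots: each is a ranking from most preferred to least preferred.
# Option names are paraphrases of the outcomes discussed in sections 4.0-4.4.
ballots = [
    ["combined_plan", "eliezer_only", "less_than_fli", "fli_letter", "status_quo"],
    ["eliezer_only", "combined_plan", "fli_letter", "less_than_fli", "status_quo"],
    ["combined_plan", "fli_letter", "eliezer_only", "less_than_fli", "status_quo"],
]

options = sorted({opt for ballot in ballots for opt in ballot})

# Pairwise tally: prefer[a][b] = number of ballots ranking a above b.
prefer = {a: {b: 0 for b in options if b != a} for a in options}
for ballot in ballots:
    rank = {opt: i for i, opt in enumerate(ballot)}
    for a, b in combinations(options, 2):
        if rank[a] < rank[b]:
            prefer[a][b] += 1
        else:
            prefer[b][a] += 1

# Emit a GraphViz dot file: an arrow points from the pairwise winner (GOOD)
# toward the pairwise loser (WORSE), matching the reading described above.
lines = ["digraph preferences {"]
for a, b in combinations(options, 2):
    if prefer[a][b] > prefer[b][a]:
        lines.append(f'  "{a}" -> "{b}";')
    elif prefer[b][a] > prefer[a][b]:
        lines.append(f'  "{b}" -> "{a}";')
lines.append("}")

with open("preferences.dot", "w") as f:
    f.write("\n".join(lines))
print("\n".join(lines))  # render with: dot -Tpng preferences.dot -o preferences.png
```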

Sometimes preference aggregation gives confusing results. Arrow’s Theorem is a pretty well known issue that points to this.

However, often there is actual coherence in the world, and this leads to coherence in human preferences, and these can (if you’re lucky) lead to collections of ballots that tell a very reasonable story. I think this actually happened with my poll!

This is roughly what the coherent aggregated preference ordering looks like:

4.0: Status Quo Is Worst

As I read the results, the totality of those who voted are pretty sure that the status quo trajectory with AGI development is stupid and has a net negative expected value compared to existing alternatives.

We should change plans to something close to what has been proposed and do Something Else Which Is Not The Default.

4.1: The FLI Letter Is Better Than Nothing

A better plan would be to just do the things that the FLI letter proposes. This makes quite a bit of common sense if the Status Quo is aiming for self-aware self-empowering military botnets wired into everything (albeit arriving later than pop culture’s predictions and probably weirder).

If GPT-5 is delayed by 6 months, and then maybe that delay cascades deeper into the future to delay the possible “final deadly invention”, then maybe we get to enjoy life for those extra 6 months.

Having “maybe 6 extra months alive” is better than “no extra months alive at all”… Right?

4.2: Less Than FLI Is Even Better

However… an even better thing might be something like the “General Agreement On Pretending To Solve The Problem”? If the only thing the fancy high-prestige people can agree to do is something even less than what the FLI letter proposes, then at least people who care about being alive can give up, and go home to have dinner with their families, and maybe it is better to die with one’s eyes open as the end becomes unavoidable, than to die while pretending that there is hope, suffering more than one otherwise would?

(This is personally a place where I think I might currently disagree with the preference ordering that came out of the total results of the poll, but I’m trying to imagine a way that it could make sense as “better than FLI” and yet also “worse than the next set of proposals”. I think this is where the “Anarchist Perspective” is showing up: so many people think the government is so stupid-or-evil by default that they think FLI’s plan should or could be “watered down” to route around regulatory processes somehow, so as to avoid their intrinsic worsening of whatever they touch?)

4.3: Eliezer’s Proposal Is Second Best

An even better plan than “less than the FLI plan” is to just shut up and do Eliezer’s thing. Shut everything down. Shut it down everywhere. Make this the #1 priority of all US diplomacy.

Reboot all US diplomacy around doing what would ensure that the X% chance (maybe 15% or 23% or 99% depending on who you ask) of “literally everyone dying from this” goes down to 0.1%, on purpose, in a way that everyone including Eliezer can tell will work.

Maybe (Eliezer didn’t say this, and it is me trying to imagine how it could possibly work in practice)… maybe fire everyone at the US State Department who isn’t on board with this reboot, or willing to stay only to play devil’s advocate and suggest Pareto-improving harms reductions?

Maybe the implementation details would be gnarly?

Maybe the people who did it would eventually be yelled at by voters after the fact, kinda like how Jacinda Ardern delivered amazing results and was later thrown out of office instead of getting parades and awards?

((But the fact that real public service is not generally recognized and rewarded is no reason to just let everyone, including you, and me, and all of us, and all our pets and family, and all the dolphins in the ocean, die from being out-competed by a new mechanical form of life. Something doesn’t have to be reliably socially rewarded to be the right strategy… it just sometimes happens that carrying out the best strategy causes it to be rewarded in ways other than merely “getting to experience the same benefits as everyone else who didn’t even help”.))

It could be that Eliezer’s proposal is clear enough to the kinds of people who would have to act on it that they can correctly and creatively fill in the details. Yeah! That could be!

And Biden just going and trying that would probably be better than the previously listed options!

However, this second-best option only had 14% of the ballots ranking “just Eliezer’s proposal, right now, no talking, just doing” as their number 1 preference...

4.4: A Combined Plan Would Be Best

More of the ballots suggested that the best thing to do would involve getting the combined agreement of Eliezer and FLI on a single plan.

This should presumably pass all of their respective fit-checks in their areas of expertise, even if this means changing some of the details of the two different initial proposals, to end up with something that makes sense to “both sides”.

I can see the logic of this because it might be hard to “just do” Eliezer’s proposal, because like… who would be in charge of it? What steps would they take? How much budget would they have?

What cover story would they use at press conferences to deal with the people who want corrupt kickbacks (as part of some democratic side deal in exchange for them not trying to veto or obstruct a plan to “save literally everyone from a substantial chance of being killed”)?

People obstruct all kinds of reasonable stuff that they end up voting in favor of once they’ve extracted the maximum they can for the minimum amount of help. That’s kinda how politics works by default!

There are lots of half-stupid and half-venal people in politics, and to make really good things happen, in that domain, one probably has to have an actually adequate plan for causing goodness, and one probably also has to make a lot of side deals.

The fundamental desideratum of “ai-not-killing-everyone-ism” as a belief system is that plans are good if they lead to AI not killing everyone in a reliable way.

It is not a central tenet that these plans be surrounded by any particular decorative public relations frills, or bi-partisan (or mono-partisan) happy words, or whatever…

...but it is also not a central tenet to be against any particular decorative words or symbols or extra steps that make people more comfortable, if that more reliably and efficiently leads to AI not killing everyone! ❤️

Can The Powers Combine?

So probably there exists “out there in the space of possible policy proposals that haven’t been written down yet” a lot of variations on:

1) Ways to reliably and certainly prevent AI from literally killing everyone...

2) That also could, after being put on the internet, have “over 50,000 individuals… sign… the letter (even if signature verification efforts are lagging), including over 1800 CEOs and over 1500 professors”.

These two properties, together, would be better than just the first property, or just the second.

There’s probably a pretty big overlap in the fanbases for these two combined ideas!

It could hypothetically be that many of the 50,000 individuals who like the FLI proposal are actually suicidal and want to die, and would have been actively opposed to a more robust proposal (if a more robust proposal was necessary to avoid death (which they might not be sure about)).

And from the perspective of trying to not die it seems like having more people sign the statement in favor of a plan for reliably not dying would make that more likely...

...so if FLI can suggest ways to tweak the plan and thereby secure lots of signatures, that seems good from the perspective of the “wanting to not die” faction?

Also, as the ideas trickle out, and get closer to implementation, if a serious plan can actually work, then the global totality of voters are likely to approve of this sort of thing, because the global totality of voters have been watching movies about killer robots for basically their entire life, and “not getting killed by killer robots” is just a common sense policy that everyone (except maybe people with suicidal tendencies) can get behind!

...

The first step, I think, would be for Stuart Russell and/​or Yoshua Bengio and/​or Eliezer Yudkowsky to send an email or text message to the other guys, and start talking about how to save the world?

Probably they’d need to schedule a meeting, and want a whiteboard around, and so on. Them doing this seems like it is on the critical path, and so it could be that literally each hour counts? (I keep massaging this post, wondering if it is even necessary, wondering if an extra hour or day editing it will improve it.)

For all I know they are already on it, but if not, maybe they will see this essay and it will help jump-start things?