Slowing down AI progress is an underexplored alignment strategy

The emotional burden of watching the world end

My current beliefs about AI timelines have made my life significantly worse. I find thoughts about the ever-shrinking timeline to AGI invading nearly every aspect of my life. Every choice now seems to be affected by the trajectory of this horrible technology.

Based on the responses in this post and others, many if not most people on this forum have been affected in a similar way. AI is the black hole swallowing all other considerations about the future.

Frankly, I feel a good deal of anger about the current state of affairs. All these otherwise thoughtful and nice people are working to build potentially world-ending technology as fast as possible. I’m angry so few people are paying any attention to the danger. I’m angry that intrusive thoughts about AGI destroying the world are making it harder for me to focus on work that might have a non-zero chance of saving it. And I’m frustrated we’ve been so closed-minded in our approach to solving this problem.

The current situation is almost exactly analogous to the creation of the atomic bomb during World War 2. There’s a bunch of very smart people gathered together, supplied with a fuckton of money by powerful groups, working towards the same destructive goal. There is social proof all around them that they are doing the right thing. There are superficially plausible reasons to think that everything will turn out fine. There is token engagement with the concerns raised by people worried about the implications of the technology. But at the end of the day, the combination of personal incentives, social proof and outside impetus makes everyone turn their heads and ignore the danger. On the rare occasion that people are convinced to leave, it’s almost always the most conscientious, most cautious people, ensuring the remaining team is even less careful.

There is so much magical thinking going on among otherwise intelligent people. Everyone seems to be operating on the assumption that no technology can destroy us, that everything will magically turn out fine, despite the numerous historical examples of new inventions destroying or nearly destroying the world (see the Cuban Missile Crisis, or the Great Oxidation Event, when oxygen-producing bacteria drove most other life on Earth extinct and caused a 300-million-year ice age).

Frankly, I’ve found the response from the EA/rationalist community pretty lackluster so far. Every serious solution that has been proposed revolves around solving alignment before we make AGI, yet I know ZERO people who are working on slowing down capabilities progress. Hell, until just a month ago, EA orgs like 80,000 Hours were RECOMMENDING people join AI research labs and work on CREATING superintelligence.

The justifications I’ve read for this behavior always seem to be along the lines of “we don’t want to alienate the people working at top AI orgs because we feel that will be counter-productive to our goals of convincing them that AI alignment is important.” Where has this strategy gotten us? Does the current strategy of getting a couple of members of the EA/rationalist community onto the safety teams at major AI orgs actually have a chance at working? And is it worth forgoing all efforts to slow down progress towards AGI?

The goals of DeepMind, OpenAI, and all the other top research labs are fundamentally opposed to the goal of alignment. The founding goal of DeepMind is to “solve intelligence, then use that to solve everything else.” That mission statement has been operationalized as paying a bunch of extremely smart people ridiculous salaries to create and distribute the blueprints for (potentially) world-ending AGI as fast as humanly possible. That goal is fundamentally opposed to the goal of alignment because it burns the one common resource that all alignment efforts need to make a solution work: *time*.

We need to buy more time

According to the latest Metaculus forecasts, we have 13 years left until some lab somewhere creates AGI, and perhaps far less than that until the blueprints to create it are published, at which point nothing short of a full-scale nuclear war will stop someone somewhere from building it. The community strategy (insofar as there even is one) is to bet everything on getting a couple of technical alignment folks onto the team at top research labs in the hopes that they will miraculously solve alignment before the mad scientists in the office next door turn on the doomsday machine.

While I admit there is at least a chance this might work, and it IS worth doing technical alignment research, the indications we have so far from the most respected people in the field are that this is an extremely hard problem and there is a non-zero chance it is fundamentally unsolvable.

There are a dozen other strategies we could potentially deploy to achieve alignment, but they all depend on someone not turning on the doomsday machine. But thus far we have almost completely ignored the class of strategies that might buy more time. The cutting edge of thought on this front seems to come from [one grumpy former EA founder on Twitter](https://​​​​KerryLVaughan) who isn’t even trying that hard.

Slow down AI with stupid regulations

We have a dozen examples of burdensome regulation stifling innovation and significantly slowing or even reversing progress in entire fields. In the US, we’ve managed to drive birth rates to below replacement levels and homelessness to record highs with literally nothing more than a few thousand grumpy NIMBYs showing up at city council meetings and lobbying for restrictive zoning and stupid building codes. We’ve managed to significantly erode the capabilities of the US military and stunt progress in the field just by guaranteeing contractors a fixed percentage profit margin on top of their costs. These same contracts led to the great stagnation in the aerospace industry that ensured we haven’t returned to the moon for over 50 years and lost the ability to reach low Earth orbit for a decade.

Hell, Germany just shut down their remaining nuclear power plants in the middle of an energy crisis because a bunch of misguided idiots from the Green Party think nuclear power is unsafe. They managed to convince government officials to shut down operational nuclear plants and replace them with coal-fired power plants using coal [SUPPLIED BY RUSSIA.](https://​​​​2022/​​04/​​05/​​business/​​germany-russia-oil-gas-coal.html)

Modern societies THRIVE at producing burdensome regulation, even in cases where it’s counter to nearly everyone’s short-term interest. Yet this approach to buying more time has been basically ignored. Why? How many years of time are we losing by forgoing this approach?

I think a basic plan for slowing down AI capabilities work would look something like this:

  • Lobby government officials to create a new committee on “AI bias and public welfare”. (Creating committees that increase bureaucracy in response to public concern is a favorite pastime of Congress.) Task this committee with approving the deployment of new machine learning models. Require that all new model deployments (defined in a way so as to include basically all state-of-the-art models) be approved by the committee in the same way that electronic medical records systems have to be approved as “HIPAA compliant”.

  • Conduct a public relations campaign to spread awareness of the “bias and danger” created by AI systems. Bring up job loss created by AI. Do a 60 Minutes interview with the family of the guy who lost his job to AI, turned to fentanyl, and became the latest statistic in the rising number of deaths of despair. Talk about how racist and biased AI systems are, and how companies can’t be trusted to use them in the public interest. Use easy concrete examples of harm that has already been done, and spread fear about the increasing incidence of this type of harm. Find people who have been personally hurt by AI systems and have them testify in front of lawmakers.

  • Institute an onerous IRB approval process for academic publications on AI where researchers have to demonstrate that their system won’t cause harm. Add new requirements every time something goes wrong in a way that embarrasses the university. Publicly shame universities and funding orgs that don’t follow this process, and accuse them of funding research that allows racism/sexism/inequity to persist.

  • Hire think tanks to write white papers about the harm caused by poorly designed, quickly deployed AI systems. Share these with congressional staffers. Emphasize the harm done to the groups or causes you know a given representative already cares about.

  • Take advantage of the inevitable fuck-ups and disasters caused by narrow AI to press leaders for tighter regulations and more bureaucracy.

  • Recruit more EAs from China to join this project there (particularly those with high-level connections in the CCP).

If we can get this kind of legislation passed in the US, which is probably the world leader in terms of AI, I think it will be significantly easier for other countries to follow suit. One of my biggest takeaways from the COVID pandemic is that world leaders have a strong herd mentality. They tend to mimic the behavior and regulations of other countries, even in cases where doing so will in expectation lead to tens of thousands of their citizens dying.

I think we have a far easier case to make for regulating AI than, say, preventing challenge trials from taking place or forcing COVID vaccines to go through a year-long approval process while hundreds of thousands of people died.

I’d appreciate other people’s thoughts on this plan, particularly people who work in government or politics.