Is AI going to kill us all? FAQ reply to Yudkowsky, etc


If you haven’t read the letters from The Future of Life Institute and Eliezer Yudkowsky, do that first:

https://futureoflife.org/open-letter/pause-giant-ai-experiments/

https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/

Is AI going to kill us all?

Quite possibly. We shouldn’t panic, but even a 1% risk of ending the world deserves a massive intervention. I believe the risk is much higher than 1%, but experts disagree, and there’s no way to know who is right. More on this later.

Should we, therefore, pause large-model AI research?

Yes, if it were possible.

Is it possible?

Absolutely not. The cat is out of the bag; anyone who understands game theory should grasp why.

Should we try to pause research anyway?

A poorly executed pause may do more harm than good. If the “good guys” stop AI research, we risk letting “bad guys” have superpowers, or just very dangerous toys.

So are we doomed?

Probably not. There are huge unknowns, and the apocalypse is far from certain.

Why would a research pause not work?

1: Large models are overhyped, yet they have proven extremely valuable, and their value is likely to increase with improved versions. Imagine a machine that prints gold: if most people refrain from improving their machines, those who improve theirs in secret gain even more value. Any person or nation that refuses to go along with a ban obtains tremendous value. Even a global police state could not stop people from running code on every single computer. It is preferable for leading public entities to dominate research rather than millions of unknown actors.

2: While GPT-4 requires massive resources, existing open-source models can be run on a consumer GPU. Improvements in AI models and hardware will continue to narrow this gap. A ban on large-model AI research would be a massive stimulus for optimizing models to run with fewer resources.

Are there any reasons to be optimistic?

Yes. I believe our options are either quick, global death or a (literally) unimaginable explosion in human flourishing. There is no in-between. Skynet won’t be hunting us down, so at least the end, if it comes, will be quick.

While no one is suggesting that AI is likely to be malevolent, there are reasons to think that AI built on comprehensive large language models may be benevolent. Humans have no idea how to teach an AI ethics or values directly, but these models are trained by rewarding the AI for being “helpful.” The nature of these models means the AI is very good at understanding the broad context of a request, so it is unlikely to misinterpret “cure cancer” as “cure cancer by killing the hosts.” Maybe. Hopefully.

More generally, a model trained to act according to humanistic values is likely to internalize these values, as opposed to acting humanely while having ulterior motives. (I am speaking about cognitive architecture here, not suggesting that the AI will be sentient or a moral actor.) It is important that when superintelligent AIs are created (probably unexpectedly), they are trained on comprehensive, humanistic data sources.

So what should “we” do?

We cannot pause AI research, and we are powerless if AI decides to kill us. The only power that “we” (that is, individuals concerned about the existential risk of AI) have is to influence how mainstream AI research is conducted.

Here is what I suggest:

1: My primary suggestion is to invest as much in AI safety research as possible. By my estimate, AI safety research investment is currently about 1/100,000th of global GDP (a rough sense of the scale is sketched after this list). While the answer to any realistic number is “not enough”, the more attention the issue receives, the more support there will be for additional action. Since AI control won’t be solved anytime soon, any results from safety research will likely create more support for intervention.

2: AI ethics guidelines should focus on the alignment problem and existential risks, and all possible pressure should be put on research organizations to follow such guidelines. Unlike a research pause, I believe this is achievable.
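
To give a rough sense of that 1/100,000th figure: assuming global GDP is on the order of $100 trillion per year (my own ballpark assumption, not a figure from the letters above), the implied spending is roughly

$$\frac{\$100\ \text{trillion}}{100{,}000} \approx \$1\ \text{billion per year},$$

a tiny amount relative to the stakes described above.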

What should AI research guidelines be?

Here is a list of possible principles:

Most Dangerous:

1: Online learning: AI models that can self-modify are far more dangerous than current models, which can only remember limited context from the current session.

2: Large models writing their own code: while we are nowhere close to this, it’s an obvious red line.

3: No memory limits: an AI’s working memory should be limited to prevent too much deviation from its training.

4: Open-sourcing leading-edge models and their weights: it should be obvious why this is a bad idea.

Also Potentially Dangerous:

5: Feedback loops: like online learning, enabling unlimited loops of response/outcome optimization leads to unknown outcomes.

6: Autonomous actions: LLMs will inevitably be allowed to act autonomously to some degree (for example, as a personal shopping assistant, toy pet, or security firewall). However, autonomous models should be much smaller than the leading-edge models, and the largest models should not be able to “write” to arbitrary Internet endpoints.

AI Research Operating Guidelines:

7: Espionage and adversarial attacks: governments should protect AI research labs from nation-state espionage to limit the proliferation of dangerous capabilities.

8: Models should never pretend to be intelligent or sentient, and should be thoroughly tested for signs of general intelligence, to identify when an AI begins to exhibit it.

9: Models should always be trained using a humanistic corpus. It is futile to try to stop militaries from using AI, but if their systems are built on large models, those models should be trained on comprehensive sources. The risk is not rogue murder bots, but a superintelligent AI, built with government-scale funding, that is unfamiliar with humanistic values.

10: Red teaming: new AI models should continually be tested against safeguards.

11: Incremental deployment: new AI models should first be deployed in offline, air-gapped environments and stress-tested for dangerous behavior.
