Six AI Risk/Strategy Ideas

AI risk ideas are piling up in my head (and in my notebook) faster than I can write them down as full posts, so I’m going to condense multiple posts into one again. I may expand some or all of these into full posts in the future. References to prior art are also welcome as I haven’t done an extensive search myself yet.

The “search engine” model of AGI development

The current OpenAI/DeepMind model of AGI development (i.e., fund research using only investor / parent company money, without making significant profits) isn’t likely to be sustainable, assuming a soft takeoff, but the “search engine” model very well could be. In the “search engine” model, a company (and eventually the AGI itself) funds AGI research and development by selling AI services, while keeping its technology secret. At some point it achieves a decisive strategic advantage (DSA), either by accumulating a big enough lead in AGI technology and other resources to win an open war against the rest of the world, or by being able to simultaneously subvert a large fraction of all cognition done on Earth (i.e., all the AI services that it is offering), causing that cognition to suddenly optimize for its own interests. (This was inspired by, and is a reply to, Daniel Kokotajlo’s Soft takeoff can still lead to decisive strategic advantage.)

Coordination as an AGI service

As a refinement of the above, to build a more impregnable monopoly via network effects, the AGI company could offer “coordination as a service”, where it promises that any company that hires its AGI as CEO will efficiently coordinate in some fair way with all other companies that also hire its AGI as CEO. See also my AGI will drastically increase economies of scale.

Multiple simultaneous DSAs under CAIS

Suppose CAIS turns out to be a better model than AGI. Many AI services may be natural monopolies, each with a large market share in its niche. If many high-level AI services all use one particular low-level AI service, that low-level service (or rather the humans or higher-level AI services that have write access to it) could achieve a decisive strategic advantage by subverting the service in a way that causes a large fraction of all cognition on Earth (i.e., all the higher-level services that depend on it) to start optimizing for its own interests. Multiple different low-level services could simultaneously have this option. (This was inspired by a comment from ryan_b.)

Logical vs physical risk aversion

Some types of risks may be more concerning than others because they are “logical risks”, i.e., highly correlated between Everett branches. Suppose Omega appears and says he is appearing in all Everett branches where some version of you exists and is offering you the same choice: If you choose option A he will destroy the universe if the trillionth digit of pi equals the trillionth digit of e, and if you choose option B he will destroy the universe if a quantum RNG returns 0 when generating a random digit. It seems to me that option B is better because it ensures that there’s no risk of all Everett branches being wiped out. See The Moral Status of Independent Identical Copies for my intuitions behind this. (How much more risk should we be willing to accept under option B before we become indifferent between the two options?)
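
To make that question concrete, here is a toy calculation under one illustrative assumption (mine, not from the linked post): utility is a concave function of the fraction of Everett branches that survive. Option A destroys all branches with subjective probability p, option B destroys a fraction q of branches for certain, and the indifference point is the q at which the expected utilities match. The utility function U and the exponent gamma below are my own toy choices.

```python
# Toy model (illustrative assumptions only): U(m) is the utility of having a
# fraction m of Everett branches survive. Concavity (gamma < 1) encodes the
# intuition that additional near-identical branches have diminishing value.

def U(m: float, gamma: float = 0.5) -> float:
    """Hypothetical utility, concave in the surviving branch measure m."""
    return m ** gamma

def indifference_q(p: float, gamma: float = 0.5) -> float:
    """Fraction q of branches we'd accept losing for certain (option B)
    before becoming indifferent to a logical risk p of losing everything
    (option A). Solves U(1 - q) = (1 - p) * U(1) + p * U(0)."""
    target = (1 - p) * U(1.0, gamma) + p * U(0.0, gamma)
    return 1.0 - target ** (1.0 / gamma)

# With p = 0.1 and U(m) = sqrt(m): U(1 - q) = 0.9, so q = 1 - 0.81 = 0.19,
# i.e. we'd tolerate roughly twice the nominal risk under option B.
print(indifference_q(p=0.1))  # ~0.19
```

With a linear U (gamma = 1) the two options come out exactly equivalent, so the asymmetry here comes entirely from the assumed diminishing value of additional branches.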

More realistic examples of logical risks:

  1. AI safety requires solving metaphilosophy.

  2. AI safety requires very difficult global coordination.

  3. Dangerous synthetic biology is easy.

Examples of physical risk:

  1. Global nuclear war

  2. Natural pandemic

  3. Asteroid strike

  4. AI safety doesn’t require very difficult global coordination, but we fail to achieve sufficient coordination anyway for idiosyncratic reasons.

Combining oracles with human imitations

It seems very plausible that oracles/predictors and human imitations (which can be thought of as a specific kind of predictor) are safer (or more easily made safe) than utility maximizers or other kinds of artificial agents. Each has disadvantages, though: oracles need a human in the loop to perform actions, which is slow and costly, and human imitations can be faster and cheaper than humans but not smarter; either way, the result is a competitive disadvantage versus AGI agents. Combining the two ideas can result in a more competitive (and still relatively easy to make safe) agent. (See this comment for an example.) This is not a particularly novel idea, since arguably quantilizers and IDA already combine oracles/predictors and human imitations to achieve superintelligent agency, but it still seems worth writing down explicitly.
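
Here is a minimal sketch of one way the combination could work, assuming the human imitation simply plays the role the human overseer would otherwise play: it decides what to ask the oracle and, once it has enough answers, what to do. The class and method names are my own illustration, not necessarily the exact scheme in the linked comment.

```python
# Illustrative sketch only: the imitation supplies human-style judgment, the
# oracle supplies superhuman answers, and no human sits in the inner loop.

class Oracle:
    """Stand-in for a superintelligent predictor that only answers questions."""
    def answer(self, question: str) -> str:
        raise NotImplementedError

class HumanImitation:
    """Stand-in for a fast (but not smarter-than-human) imitation of a trusted
    overseer: given the transcript so far, it does what the human would do,
    i.e. either ask another question or commit to an action."""
    def next_step(self, transcript: list[str]) -> tuple[str, str]:
        # Returns ("ask", question) or ("act", action).
        raise NotImplementedError

def run_agent(oracle: Oracle, imitation: HumanImitation, observation: str) -> str:
    """Combine the two into a single fast agent."""
    transcript = [observation]
    while True:
        kind, content = imitation.next_step(transcript)
        if kind == "act":
            return content                         # the action the imitated human would take
        transcript.append(content)                 # the question the imitation asked
        transcript.append(oracle.answer(content))  # the oracle's answer
```

The point of the sketch is just that the imitation substitutes for the slow human in the loop, so the combined system can run at machine speed while (arguably) inheriting the safety properties of its two predictor components.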

“Generate evidence of difficulty” as a research purpose

How to handle the problem of AI risk is one of the most important and consequential strategic decisions facing humanity, if not the most important. If we err in the direction of too much caution, then in the short run resources are diverted into AI safety projects that could instead go to other x-risk efforts, and in the long run billions of people could unnecessarily die while we hold off on building “dangerous” AGI and wait for “safe” algorithms to come along. If we err in the opposite direction, well, presumably everyone here already knows the downside there.

A crucial input into this decision is the difficulty of AI safety, and the obvious place for decision makers to obtain evidence about that difficulty is from technical AI safety researchers (and AI researchers in general). But it seems that not many people have given much thought to how to optimize the production and communication of such evidence (leading to communication gaps like this one). (As another example, many people do not seem to consider that doing research on a seemingly intractably difficult problem can be valuable because it at least generates evidence about the difficulty of that particular line of research.)

The evidence can be in the form of:

  1. Official or semi-official consensus of the field

  2. Technical arguments about the difficulty of AI safety

  3. “AI Safety Experts” who can state or explain the difficulty of AI safety to a wider audience

  4. Amount of visible progress in AI safety per unit of resources expended

  5. How optimistic or pessimistic safety researchers seem when they talk to each other or to outside audiences

Bias about the difficulty of AI safety is costly/dangerous, so we should think about how to minimize this bias while producing evidence of difficulty. Some possible sources of bias:

  1. Personal bias (due to genetics, background, etc.)

  2. Selection effects (people who think AI safety is not worth working on, because it’s either too easy or too hard, tend to go into other fields)

  3. Incentives (e.g., your job or social status depends on AI safety not being too easy or too hard)