The bigger problem here is that, as noted in the post, it is always faster to do things in a less secure manner. If you assume:
(1) multiple competitors trying to build AI (and if this is not your assumption, I would like to hear a basis for it);
(2) that at least some of them believe the first AI created will be in a position of unassailable dominance (this appears to be the belief of at least some, including, but not necessarily limited to, those who believe in a high likelihood of a hard takeoff);
(3) some overlap between the groups described in (1) and (2) (again, if you don’t think this is going to be the case, I would like to hear a basis for it); and
(4) varying levels of concern about the potential damage caused by an unfriendly AI (even if you believe that as we get closer to developing AI the average and minimum levels of concern will rise, variance is likely),
then the first AI to be produced is likely to be highly insecure (i.e. with non-robust friendliness).
The more interesting question is where else do we see something similar occurring?
For example, historically, retirement income was usually discussed in terms of expected value. More recently, discussions about retirement have begun to focus on the probability of running out of money. Are there other areas where people focus on expected outcomes rather than on the probability of X occurring?
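To make the contrast concrete, here is a minimal Monte Carlo sketch (all numbers are hypothetical: the starting balance, withdrawal, return, and volatility are made up for illustration) showing how the same plan can look fine by expected final balance while still carrying a meaningful probability of running out of money:

```python
# Minimal sketch with hypothetical numbers: expected final balance vs. probability of ruin.
import numpy as np

rng = np.random.default_rng(0)

start_balance = 1_000_000      # hypothetical starting portfolio
annual_spend = 50_000          # hypothetical fixed annual withdrawal
years = 30
n_sims = 100_000
mean_return, vol = 0.05, 0.15  # assumed real return and volatility

final_balances = np.empty(n_sims)
ran_out = 0
for i in range(n_sims):
    balance = start_balance
    for _ in range(years):
        balance = balance * (1 + rng.normal(mean_return, vol)) - annual_spend
        if balance <= 0:
            balance = 0.0
            ran_out += 1
            break
    final_balances[i] = balance

print(f"Expected (mean) final balance: {final_balances.mean():,.0f}")
print(f"Probability of running out of money: {ran_out / n_sims:.1%}")
```

The expected value can be comfortably positive even when the probability of ruin is well above zero, which is exactly the distinction the newer retirement discussions pick up on.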
Interesting. We are in somewhat the same boat. Fully vaccinated adults with a two year old. I think where we come out is as follows.
(1) The short-term risks to kids from COVID are clearly lower than for adults. The long-term risks are presently unknown.
(2) It is highly likely (>90%) that we will be able to vaccinate young children by next year, so any risk reducing measures we take will be temporary. (Also, see (5).)
(3) The risks from outdoor activities and from vaccinated people are very low. Therefore, we are fine with outdoor activities, masked or not, and with socializing with fully vaccinated people.
(4) There are limited gains from indoor activities with unvaccinated people, so we will not bring our daughter indoors with unmasked unvaccinated people or unnecessarily indoors with people whose vaccine status is unknown.
(5) COVID prevalence here is dropping, whether for reasons of increased vaccination or otherwise. If, due to increased vaccination, those rates stay down, we can relax these restrictions.
“That’s one lesson you could take away. Another might be: governments will be very willing to restrict the use of novel technologies, even at colossal expense, in the face of even a small risk of large harms.”
Governments cooperate a lot more than Eliezer seemed to be suggesting they do. One example is the banning of CFCs in response to the ozone hole. There was also significant cooperation between central banks in mitigating certain consequences of the 2008 financial crisis.
However, I would tend to agree that there is virtually zero chance of governments banning dangerous AGI research because:
(i) The technology is incredibly militarily significant; and
(ii) Cheating is very easy.
(Parenthetically, this also has a number of other implications which make limiting AGI to friendly or aligned AGI highly unlikely, even if it is possible to do that in a timely fashion.)
In addition, as computing power increases, the ease of conducting such research increases, so the number of people and situations that the ban has to cover grows over time. This means that an effective ban would require a degree of surveillance and control that is incompatible with how at least some societies are organized, and beyond the capacity of others.
(The above assumes that governments are focused and competent enough to understand the risks of AGI and react to them in a timely manner. I do not think this is likely.)
GPT-4 is expected to have about 10^14 parameters and be ready in a few years. And, we already know that GPT-3 can write code. The following all seem (to me at least) like very reasonable conjectures:
(i) Writing code is one of the tasks at which GPT-4 will have (at least) human level capability.
(ii) Clones of GPT-4 will be produced fairly rapidly after GPT-4, say 1-3 years.
(iii) GPT-4 and its clones will have a significant impact on society. This will show up in the real economy.
(iv) GPT-4 will be enough to shock governments into paying attention. (But as we have seen with climate change, governments can pay attention to an issue for a long time without effectively doing anything about it.)
(v) Someone is going to ask for GPT-4 (clone) to produce code that generates AGI. (Implicitly, if not explicitly.)
I have absolutely no idea whether GPT-4 will succeed at this endeavor. But if not, GPT-5 should be available a few years later....
(And, of course, this is just one pathway.)
You appear to be correct. I will withdraw my comment.
In any reasonable scenario, those communicating with the AI in a box will not be the people empowered to let it out. Ideally, those with the capability to let the AI out would be entirely isolated from those communicating with the AI and would not be able to access the conversations with the AI.
I would also note that restricting the number of bits (a) just makes things go more slowly and (b) doesn’t work very well in the competitive real world where the other guys are less restrictive.
Ultimately, the dangers of the AI in a box aren’t that it can manipulate any human to let it out but that:
(i) it’s really unclear how good our boxing skills are; and
(ii) human beings have different risk-reward functions, and it is entirely possible that humans will convince themselves to let the AI out of the box even without any manipulation, whether as a result of perceived additional benefit, competitive pressure, or sympathy for the AI.
You kind of assumed away (i), but part of (i) is setting things up as outlined in my first paragraph. This points to the fact that even if our boxing skills were good enough, over time we will come to rely on less sophisticated and capable organizations to do the boxing, which doesn’t seem likely to end well.
Interesting. I thought the main idea was contained in Question 5.
If we do as well with preventing AGI as we have with nuclear non-proliferation, we fail. And, nuclear non-proliferation has been more effective than some other regimes (chemical weapons, drugs, trade in endangered animals, carbon emissions, etc.). In addition, because of the need for relatively scarce elements, control over nuclear weapons is easier than control over AI.
And, as others have noted, the incentives for developing AI are far stronger than for developing nuclear weapons.
“So we know a strategy that will work. We have actual evidence this is true. Human’s exist and are (generally) aligned with human values.”
The above is false. Humans aren’t really aligned with human values. Most humans are heavily constrained in their actions. When we see very unconstrained humans (Vladimir Putin, Adolf Hitler, Joseph Stalin, Xi Jinping, Mao Zedong, Deng Xiaoping), a large proportion are not aligned with human values.
(I’ve stayed with the moderns, but a review of ancient rulers will yield similar results.)
There are currently nine countries that have deployed nuclear weapons. At least four of those nine are countries that the non-proliferation regime would have preferred to prevent from having nuclear weapons.
An equivalent result in AGI would have four entities deploying AGI. (And in the AGI context, the problem is deployment itself, not using the AGI in any particular way.)
I am skeptical that boxing is a workable strategy long-term, but a competent organization committed to boxing as a strategy will not allow those with the power to unbox the AI to communicate with the AI. Thus, issues of this nature should not arise.
I have a reasonably low value for p(Doom). I also think these approaches (to the extent they are courses of action) are not really viable. However, as long as they don’t increase p(Doom), it’s fine to pursue them. Two important considerations here: an unviable approach may still slightly reduce p(Doom) or delay Doom, and the resources used for unviable approaches don’t necessarily detract from the resources used for viable approaches.
For example, “we’ll pressure corporations to take these problems seriously”, while unviable as a solution, will tend to marginally reduce the amount of money flowing to AI research, marginally increase the degree to which AI researchers have to consider AI risk, and marginally enhance the resources focused on AI risk. Resources used in pressuring corporations are unlikely to have any effect that increases AI risk. So, while this approach is unviable, in the absence of a viable strategy, pursuing it seems slightly positive.
Even holding difficulty constant, a gradual takeoff will not result in a shorter timeline if investment in the sudden-takeoff scenario is already saturated (i.e. no additional resources could be effectively deployed).
I think GoF research can also be quite threatening to states. COVID-19 has stressed the politics and economies of both the US and China (among others). Imagine the effects of a significantly deadlier disease.
There is no particular reason for the first AGI to believe that the more intelligent AGI (B) will judge the first AGI more favorably because of the first AGI’s treatment of less intelligent life forms. (And, even if it would, by that logic, since humans haven’t been very nice to the other, less intelligent life forms of Earth....)
Certainly, B might read this as a signal that the first AGI is less of a threat. Alternatively, B might read this as the first AGI being easy to destroy. B may have a moral code that looks at this as positive or negative. This just doesn’t seem to say anything conclusive or persuasive.
Look how much we suffered from stupid, replicating code with goals adverse to ours; now imagine how bad it would be if we had an intelligent, replicating enemy...
I think that Eliezer thinks p(doom) > 99%, and many others here are following in his wake. He is making a lot of speculative inferences. But even allowing for that, and rejecting instrumental convergence, p(doom) is uncomfortably high (though probably not greater than 99%).
You think that it is wrong to say: (i) in 10-20 years there will be (ii) a single AI (iii) that will kill all humans quickly (iv) before we can respond.
Eliezer is not saying point (ii). He certainly seems to think there could be multiple AIs. (It doesn’t make a difference, as far as I can tell, whether human extinction occurs at the hands of one AI or many. You can argue that the existence of multiple AIs will retard human extinction, but you could equally argue that it would speed it up. Both arguments are speculation without evidence and shouldn’t change estimates of p(doom).)
I don’t think we will have AGI in 10-20 years. But, if I’m putting guesstimated probabilities on this, I can’t say the chance is less than 10% after 10 years. To put it another way, 10 years from now there is a realistic chance that we will have AGI, even if it isn’t likely. And, if you believe that conditional on having AGI, p(doom) is very high, that is the time frame you care about, because that is when you need to have a way for humanity to prevent this catastrophe (if possible).
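As a toy calculation using the illustrative numbers above (the 90% conditional figure is my own hypothetical, not a claim), the 10-year risk is simply the product of the two probabilities:

```python
# Toy calculation with illustrative numbers (not claims).
p_agi_10yr = 0.10          # "can't say the chance is less than 10% after 10 years"
p_doom_given_agi = 0.90    # hypothetical "very high" conditional estimate

p_doom_10yr = p_agi_10yr * p_doom_given_agi
print(f"P(doom within 10 years) = {p_doom_10yr:.0%}")  # ~9%
```

Even with AGI itself unlikely in that window, a high conditional p(doom) keeps the near-term risk far from negligible, which is why that is the time frame to care about.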
It’s better if it takes 30 or 50 years. But, he doesn’t see that we are likely to have a realistic implementable plan to prevent human extinction then either (and neither do I, FWIW). And, unless you think that we will be able to deal with AGI then in a way we can’t now, it doesn’t make a difference to humanity whether the timeline is 10-20 years or 60-80.
In other words, you may be right about point (i), but it doesn’t matter.
What really matters are points (iii) and (iv). With regard to point (iv), an AGI will have an OODA loop that is faster than a human’s. It’s almost definitional: it will be an entity rather than an organization, it will be smarter, and it will think faster.
That leaves point (iii), which breaks down into whether the AGI will kill humanity and whether it can. If the AGI can kill most of humanity and intends to kill all of humanity, it will be able to do so: by killing most of humanity, the ability of humanity to fight back will be crippled. You think that the AGI can kill large numbers of people; I’m not sure whether you think it can kill most of humanity, but without appeals to technology substantially in advance of our own I can think of multiple ways for an AGI to achieve this. (Pandemics with viruses designed to be 100% lethal and highly transmissible. Nuclear holocaust. Pushing climate change into overdrive. Robots to hunt and kill surviving humans.)
Will the AGI decide to kill us all? I think the answer here is maybe, while Eliezer seems confident it is yes.
Civilization is a highly complex and fragile system, without which most of humanity will die and humanity will be rendered defenseless. If you want to destroy it, you don’t have to predict or plan what will happen; you just have to hit it hard and fast, preferably from a couple of different directions.
There is an implicit norm here against providing detailed plans to destroy civilization, so I won’t, but it is not hard to come up with one (or four), and you will likely have thought of some yourself. The key thing is that if you get to hit again (and the AGI will), you only need to achieve a portion of your objective with each try.
“If you want to outperform—if you want to do anything not usually done—then you’ll need to conceptually divide our civilization into areas of lower and greater competency.”
The idea quoted above seems wrong in practice. You don’t need to conceptually divide our civilization into areas of competency; you need to see what is actually being done in the area in which you want to outperform: in particular, (i) whether your proposed activity/solution has already been tried or assessed; and (ii) the degree to which existing evidence says it will or won’t work.
Also, if civilizational competence is intended to cover something beyond an efficient market, it would make sense to use a different example.