German writer of science-fiction novels and children’s books (pen name Karl Olsberg). I blog and create videos about AI risks in German at www.ki-risiken.de and youtube.com/karlolsbergautor.
Karl von Wendt
Yes, thanks for the clarification! I was indeed oversimplifying a bit.
Coordination by common knowledge to prevent uncontrollable AI
This is an interesting thought. I think even without AGI, we’ll have total transparency of human minds soon—already AI can read thoughts in a limited way. Still, as you write, there’s an instinctive aversion against this scenario, which sounds very much like an Orwellian dystopia. But if some people have machines that can read minds, which I don’t think we can prevent, it may indeed be better if everyone could do it—deception by autocrats and bad actors would be much harder that way. On the other hand, it is hard to imagine that the people in power would agree to that: I’m pretty sure that Xi or Putin would love to read the minds of their people, but won’t allow them to read theirs. Also it would probably be possible to fake thoughts and memories, so the people in power could still deceive others. I think it’s likely that we wouldn’t overcome this imbalance anytime soon. This only shows that the future with “narrow” AI won’t be easy to navigate either.
I’m obviously all for “slowing down capabilities”. I’m not for “stopping capabilities altogether”, but for selecting which capabilities we want to develop, and which to avoid (e.g. strategic awareness). I’m totally for “solving alignment before AGI” if that’s possible.
I’m very pessimistic about technical alignment in the near term, but not “optimistic” about governance. “Death with dignity” is not really a strategy, though. If anything, my favorite strategy in the table is “improve competence, institutions, norms, trust, and tools, to set the stage for right decisions”: If we can create a common understanding that developing a misaligned AGI would be really stupid, maybe the people who have access to the necessary technology won’t do it, at least for a while.
The point of my post here is not to solve the whole problem. I just want to point out that the common “either AGI or bad future” is wrong.
Well, yes, of course! Why didn’t I think of it myself? /s
Honestly, “aligned benevolent AI” is not a “better alternative” for the problem I’m writing about in this post, which is that we’ll be able to develop an AGI before we have solved alignment. I’m totally fine with someone building an aligned AGI (assuming that it is really aligned, not just seemingly aligned). The problem is, this is very hard to do, and timelines are likely very short.
You may be right about that. Still, I don’t see any better alternative. We’re apes with too much power already, and we’re getting more powerful by the minute. Even without AGI, there are plenty of ways to end humanity (e.g. bioweapons, nanobots, nuclear war, bio lab accidents …) Either we learn to overcome our ape-brain impulses and restrict ourselves, or we’ll kill ourselves. As long as we haven’t killed ourselves, I’ll push towards the first option.
We’re not as far apart as you probably think. I’d agree with most of your decisions. I’d even vote for you to become king! :) Like I wrote, I think we must be cautious with narrow AI as well, and I agree with your points about opaqueness and the potential of narrow AI turning into AGI. Again, the purpose of my post was not to argue how we could make AI safe, but to point out that we could have a great future without AGI. And I still see a lot of beneficial potential in narrow AI, IF we’re cautious enough.
I agree with that.
1000 years is still just a delay.
Fine. I’ll take it.
But I didn’t see you as presenting preventing fully general, self-improving AGI as a delaying tactic. I saw you as presenting it as a solution.
Actually, my point in this post is that we don’t NEED AGI for a great future, because often people equate Not AGI = Not amazing future (or even a terrible one) and I think this is wrong. The point of this post is not to argue that preventing AGI is easy.
However, it’s actually very simple: If we build a misaligned AGI, we’re dead. So there are only two options: A) solve alignment, B) not build AGI. If not A), then there’s only B), however “impossible” that may be.
Yet lots of people DID (and do) take hydroxychloroquine and ivermectin for COVID, a nontrivial number of people do in fact eat random mushrooms, and the others aren’t unheard-of.
Yes. My hope is not that 100% of mankind will be smart enough to not build an AGI, but that maybe 90+% will be good enough, because we can prevent the rest from getting there, at least for a while. Currently, you need a lot of compute to train a Sub-AGI LLM. Maybe we can put the lid on who’s getting how much compute, at least for a time. And maybe the top guys at the big labs are among the 90% non-insane people. Doesn’t look very hopeful, I admit.
Anyway, I haven’t seen you offer an alternative. Once again, I’m not saying not developing AGI is an easy task. But saying it’s impossible (while not having solved alignment) is saying “we’ll all die anyway”. If that’s the case, then we may as well try the “impossible” things and at least die with dignity.
One of the reasons I wrote this post is that I don’t believe in regulation to solve this kind of problem (I’m still pro regulation). I believe that we need to get a common understanding of what are stupid things no one in their right mind would ever do (see my reply to jbash). To use your space colonization example: we certainly can’t regulate what people do somewhere in outer space. But if we survive long enough to get there, then we have either solved alignment or we have finally realized that it’s not possible, which will hopefully be common knowledge by then.
Let’s say someone finds a way to create a black hole, but there’s no way to contain it. Maybe it’s even relatively easy for some reason, say it costs 10 million dollars or so. It’s probably not possible to prevent everyone forever from creating one, but the best (IMO the only) option to prevent earth from getting destroyed immediately is to make it absolutely clear to everyone that creating a black hole is suicidal. There is no guarantee that this will hold forever, but given the facts (doable, uncontainable) it’s the only alternative that doesn’t involve killing everyone else or locking them up forever.
We may need to restrict access to computing power somehow until we solve alignment, so not every suicidal terrorist can easily create an AGI at some point. I don’t think we’ll have to go back to the 1970s, though. Like I wrote, I think there’s a lot of potential with the AI we already have, and with narrow, but powerful future AIs.
The point of this comment is less to say “this definitely can’t be done” (although I do think such a future is fairly implausible/unsustainable), and more to say “why did you not address this objection?” You probably ought to have a dedicated section that very clearly addresses this objection in detail.
That’s a valid point, thank you for making it. I have given some explanation of my point of view in my reply to jbash, but I agree that this should have been in the post in the first place.
You cannot permanently stop self-improving AGI from being created or run. Not without literally destroying all humans.
You can’t stop it for a “civilizationally significant” amount of time. Not without destroying civilization.
I’m not sure what this is supposed to mean. Are you saying that I’d have to kill everyone so no one can build AGI? Maybe, but I don’t think so. Or are you saying that not building an AGI will destroy all humans? This I strongly disagree with. I don’t know what a “civilizationally significant” amount of time is. For me, the next 10 years are a “significant” amount of time.
What really concerns me is that the same idea has been coming up continuously since (at least) the 1990s, and people still talk about it as if it were possible. It’s dangerous; it distracts people into fantasies, and keeps them from thinking clearly about what can actually be done.
This is a very strong claim. Calling ideas “dangerous” is in itself dangerous IMO, especially if you’re not providing any concrete evidence. If you think talking about building narrow AI instead of AGI is “dangerous” or a “fantasy”, you have to provide evidence that a) this is distracting relevant people from doing things that are more productive (such as solving alignment?) AND b) that solving alignment before we can build AGI is not only possible, but highly likely. The “fantasy” here to me seems to be that b) could be true. I can see no evidence for that at all.
For all the people who continuously claim that it’s impossible to coordinate humankind into not doing obviously stupid things, here are some counterexamples: We have the Darwin awards for precisely the reason that almost all people on earth would never do the stupid things that get awarded. A very large majority of humans will not let their children play on the highway, will not eat the first unknown mushrooms they find in the woods, will not use chloroquine against covid, will not climb into the cage in the zoo to pet the tigers, etc. The challenge here is not the coordination, but the common acceptance that certain things are stupid. This is maybe hard in certain cases, but NOT impossible. Sure, this will maybe not hold for the next 1,000 years, but it will buy us time. And there are possible measures to reduce the ability of the most stupid 1% of humanity to build AGI and kill everyone.
That said, I agree that my proposal is very difficult to put into practice. The problem is, I don’t have a better idea. Do you?
The first rule is that ASI is inevitable, and within that there are good or bad paths.
I don’t agree with this. ASI is not inevitable, as we can always decide not to develop it. Nobody will even lose any money! As long as we haven’t solved alignment, there is no “good” path involving ASI, and no positive ROI. Thinking that it is better that player X (say, Google) develops ASI first, compared to player Y (say, the Chinese) is a fallacy IMO because if the ASI is not aligned with our values, both have the same consequence.
I’m not saying focusing on narrow AI is easy, and if someone comes up with a workable solution for alignment, I’m all for ASI. But saying “ASI is inevitable” is counterproductive in my opinion, because it basically says “any sane solution is impossible” given the current state of affairs.
We don’t need AGI for an amazing future
Paths to failure
And if that function is simple (such as “exist as long as possible”), it can pretty soon research virtually everything that matters, and then will just go through the motions, devouring the universe to prolong its own existence to near-infinity.
I think that even with such a very simple goal, the problem of a possible rival AI somewhere out there in the universe remains. Until the AI can rule that out with 100% certainty, it can still gain extra expected utility out of increasing its intelligence.
Also, the more computronium there is, the bigger is the chance some part will glitch out and revolt. So, beyond some point computronium may be dangerous for AI itself.
That’s an interesting point. I’m not sure that “less compute is better” follows from it, though. One remedy would be to double-check everything and build redundant capacities, which would result in even more computronium, but less probability of any part of it successfully revolting.
I agree that with temporal discounting, my argument may not be valid in all cases. However, depending on the discount rate, even then increasing computing power/intelligence may raise the expected value enough to justify this increase for a long time. In the case of the squiggle maximizer, turning the whole visible universe into squiggles beats turning earth into squiggles by such a huge factor that even a high discount rate would justify postponing actually making any squiggles to the future, at least for a while. So in cases with high discount rates, it largely depends on how big the AI predicts the intelligence gain will be.
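To make the discounting argument above a bit more concrete, here is a minimal sketch. All numbers are assumptions chosen purely for illustration: the payoff ratio of 1e20 between “universe” and “earth” and the 50% annual discount rate are hypothetical, and the function names are my own. It uses simple exponential discounting and asks how long postponing the payoff still beats taking the smaller payoff immediately.

```python
import math


def discounted_value(payoff: float, delay_years: float, annual_discount_rate: float) -> float:
    """Present value of a payoff that is only received after delay_years."""
    return payoff * (1.0 - annual_discount_rate) ** delay_years


def break_even_delay(payoff_ratio: float, annual_discount_rate: float) -> float:
    """Longest delay (in years) for which waiting for a payoff that is
    payoff_ratio times larger still matches taking the smaller payoff now."""
    return math.log(payoff_ratio) / -math.log(1.0 - annual_discount_rate)


# Hypothetical inputs: the later payoff is 1e20 times the immediate one,
# and value is discounted by 50% per year (a very high rate).
ratio = 1e20
rate = 0.5

print(break_even_delay(ratio, rate))      # roughly 66 years
print(discounted_value(ratio, 60, rate))  # still larger than the immediate payoff of 1
print(discounted_value(ratio, 70, rate))  # now smaller than the immediate payoff of 1
```

Under these assumed numbers, even halving the value every year leaves several decades in which postponing is the better choice. With a smaller payoff ratio or a shorter horizon the picture changes, which is why the outcome depends on how large the AI predicts the gain to be.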
A different question is whether a discount rate in a value function would be such a good idea from a human perspective. Just imagine the consequences of discounting the values of “happiness” or “freedom”. Climate change is in large part a result of (unconsciously/implicitly) discounting the future IMO.
I don’t think that your conclusion is correct. Of course, some tasks are impossible, so even infinite intelligence won’t solve them. But it doesn’t follow that the utility of intelligence is limited in the sense that above a certain level, there is no more improvement possible. There are some tasks that can never be solved completely, but can be solved better with more computing power with no upper limit, e.g. calculating the decimal places of pi or predicting the future.
Good point! Satirical reactions are not appropriate in comments, I apologize. However, I don’t think that arguing why alignment is difficult would fit into this post. I clearly stated this assumption in the introduction as a basis for my argument, assuming that LW readers were familiar with the problem. Here are some resources to explain why I don’t think that we can solve alignment in the next 5-10 years: https://intelligence.org/2016/12/28/ai-alignment-why-its-hard-and-where-to-start/, https://aisafety.info?state=6172_, https://www.lesswrong.com/s/TLSzP4xP42PPBctgw/p/3gAccKDW6nRKFumpP