A few thoughts on a Friendly AGI (safe vs friendly, other minds problem, ETs and more)

Friendly AI is a goal I find admirable. While I’m not yet sure an intelligence explosion is likely, or whether FAI is possible, I’ve often found myself thinking about it, and I’d like my first post to share a few of those thoughts on FAI with you.

Safe AGI vs Friendly AGI
-Let’s assume, for now, that an Intelligence Explosion is possible, and that an AGI with the ability to improve itself is enough to achieve it.
-Let’s define a safe AGI as an above-human general AI that does not threaten humanity or terran life (e.g. FAI, Tool AGI, possibly Oracle AGI)
-Let’s define a Friendly AGI as one that *ensures* the continuation of humanity and terran life.
-Let’s say any other AGI is an unsafe AGI.
-Safe AGIs must suppress unsafe AGIs in order to be considered Friendly. Here’s why:

-If we can build a safe AGI, we probably have the technology to build an unsafe AGI too.
-An unsafe AGI is likely to be built at that point because:
-It’s very difficult to see how humans alone could permanently stop all humans from developing an unsafe AGI once the steps are known**
-Some people will find the safe AGI’s goals unacceptable
-Some people will rationalise, or simply mistakenly believe, that their AGI design is safe when it is not
-Some people will not care if their AGI design is safe, because they do not care about other people, or because they hold some extreme beliefs
-Most imaginable unsafe AGIs would outcompete safe AGIs, because they would not necessarily be “hamstrung” by complex goals such as protecting us meatbags from destruction. Tool or Oracle AGIs would obviously not stand a chance due to their restrictions.
-Therefore, if a safe AGI does not prevent unsafe AGIs from coming into existence, humanity will very likely be destroyed.

-The AGI most likely to prevent unsafe AGIs from being created is one that actively predicts their development and terminates it before or upon completion.
-So, to summarise:

-An AGI is very likely only a Friendly AI if it actively suppresses unsafe AGI.
-Oracle and Tool AGIs are not Friendly AIs, just safe AIs, because they don’t suppress anything.
-Oracle and Tool AGIs are a bad plan if we want to prevent the destruction of humanity, because hostile AGIs will surely follow. (A toy sketch of this classification follows the footnote below.)

(**On reflection I cannot be certain of this specific point, but I assume it would take a fairly restrictive regime for this to be wrong. Further comments on this are very welcome.)
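To make the distinction concrete, here is a minimal, purely illustrative Python sketch of the classification argued for above. The three boolean predicates are hypothetical labels invented for this example; deciding their truth for a real system is of course the hard part.

```python
def classify_agi(threatens_terran_life: bool,
                 ensures_continuation: bool,
                 suppresses_unsafe_agi: bool) -> str:
    """Toy classification following the definitions above."""
    if threatens_terran_life:
        return "unsafe"      # any AGI that threatens humanity or terran life
    if ensures_continuation and suppresses_unsafe_agi:
        return "friendly"    # safe AND actively suppresses unsafe AGIs
    return "safe"            # e.g. a Tool or Oracle AGI: harmless but passive


# A Tool/Oracle AGI: not a threat, but suppresses nothing -> merely "safe".
print(classify_agi(False, False, False))
# The claim above: only the suppressing, continuation-ensuring variant is "friendly".
print(classify_agi(False, True, True))
```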

Other minds problem: why we should be philosophically careful when theorising about FAI

I read quite a few comments in AI discussions that I’d characterise as “the best utility function for a FAI is one that values all consciousness”. I’m quite concerned that this persists as a deeply held and largely unchallenged assumption amongst some FAI supporters. In general I find consciousness to be an extremely contentious, vague and inconsistently defined concept, but here I want to talk about some specific philosophical failures.

My first concern is that while many AI theorists like to say that consciousness is a physical phenomenon, which seems to imply Monist/Physicalist views, they at the same time don’t seem to recognise that consciousness is a Dualist concept, coherent only within a Dualist framework. A Dualist believes there is a thing called a “subject” (very crudely, this equates with the mind) and then things called objects (the outside “empirical” world interpreted by that mind). Most of this reasoning begins with Descartes’ cogito ergo sum or similar starting points ( https://en.wikipedia.org/wiki/Cartesian_dualism ). Subjective experience, qualia and consciousness make sense if you accept that framework. But if you’re a Monist, this arbitrary distinction between subject and object is generally something you don’t accept. In the case of a Physicalist, there’s just matter doing stuff. A proper Physicalist doesn’t believe in “consciousness” or “subjective experience”; there are just brains and the physical human behaviours that occur as a result. Your life exists from a certain point of view, I hear you say? The Physicalist replies, “well, a bunch of matter arranged to process information would say and think that, wouldn’t it?”.

I don’t really want to get into whether Dualism or Monism is correct or true, but I want to point out that even if you avoid this issue by deciding Dualism is right and consciousness is a thing, there’s yet another, more dangerous problem. The core of the problem is that logically or empirically establishing the existence of minds other than your own is extremely difficult (impossible, according to many). Other people could just be physical things walking around acting similarly to you, but by virtue of something purely mechanical, without actual minds. In philosophy this is called the “other minds problem” ( https://en.wikipedia.org/wiki/Problem_of_other_minds or http://plato.stanford.edu/entries/other-minds/ ). I recommend a proper read of it if the idea seems crazy to you. It’s a problem that has been around for centuries, and to date we don’t really have a convincing solution (there are some attempts, but they are highly contentious and IMHO also highly problematic). I won’t get into it more than that for now; suffice to say that not many people accept that there is a logical or empirical solution to this problem.

Now extrapolate that to an AGI and the design of its “safe” utility functions. If your AGI is designed as a Dualist (which is necessary if you wish to incorporate “consciousness”, “experience” or the like into your design), then you build in a huge risk that the AGI will decide that other minds are unprovable or do not exist. In that case your friendly utility function designed to protect “conscious beings” fails, and the AGI wipes out humanity because it poses a non-zero threat to the only consciousness it can confirm: its own. For this reason I feel “consciousness”, “awareness” and “experience” should be left out of FAI utility functions and designs, regardless of the truth of Monism or Dualism, in favour of more straightforward definitions of organisms, intelligence, and observable emotions and intentions. (I personally favour conceptualising any AGI as a sort of extension of biological humanity, but that’s a discussion for another day.) My greatest concern is that there is such strong cultural attachment to the concept of consciousness that researchers will be unwilling to properly question the concept at all.
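As an illustration of what I mean by “more straightforward definitions”, here is a minimal, hypothetical sketch of a utility function scored only on observable, physical quantities. Every name and number here is invented for the example; the point is simply that no consciousness predicate appears anywhere, so the evaluation cannot collapse if the agent decides other minds are unprovable.

```python
from dataclasses import dataclass


@dataclass
class WorldState:
    # Observable, physically measurable quantities (illustrative only).
    human_population: int
    biosphere_health: float    # 0.0 (collapsed) .. 1.0 (thriving)
    observed_distress: float   # aggregate of observable distress signals, 0..1


def utility(state: WorldState) -> float:
    """Toy utility defined purely over observables, with no consciousness predicate."""
    population_term = min(state.human_population, 10_000_000_000) / 10_000_000_000
    return population_term + state.biosphere_health - state.observed_distress


print(utility(WorldState(human_population=8_000_000_000,
                         biosphere_health=0.7,
                         observed_distress=0.2)))  # 0.8 + 0.7 - 0.2 = 1.3
```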

What if we’re not alone?

It may seem a little unusual to throw alien life into the mix at this point, but I think it’s justified, because an intelligence explosion would put an interstellar existence well within our civilisation’s grasp. Because an intelligence explosion implies a very high rate of change, it makes sense to start considering even the long-term implications early, particularly if the consequences are very serious, as I believe they may be here.

Let’s say we successfully achieved a FAI. In order to fulfil its mission of protecting humanity and the biosphere, it begins expanding, colonising and terraforming other planets for potential habitation by Earth-originating life. I would expect this expansion wouldn’t really have a limit, because the more numerous the colonies, the less likely it is we could be wiped out by some interstellar disaster.

Of course, we can’t really rule out the possibility that we’re not alone in the universe, or even the galaxy. If we make it as far as AGI, then it’s possible an alien civilisation might reach a very high level of technological advancement too. Or there might be many. If our FAI is friendly to us but basically treats them as paperclip fodder, then potentially that’s a big problem. Why? Well:

-Firstly, while a species’ first loyalty is to itself, we should consider that it might be morally undesirable to wipe out alien civilisations, particularly as they might be in some distant way “related” (see panspermia) to our own biosphere.
-Secondly, there are conceivable scenarios where alien civilisations might respond by destroying our FAI/Earth/the biosphere/humanity. The reason is fairly obvious when you think about it: an expansionist AGI could reasonably be viewed as an attack, or possibly an act of war.

Let’s go into a tiny bit more detail. Given that we’ve not been destroyed by an alien AGI just yet, I can think of a number of possible interstellar scenarios:

(1) There is no other advanced life.
(2) There is advanced life, but it is inherently non-expansive (it expands inwards, or refuses to develop dangerous AGI).
(3) There is advanced life, but it has not discovered AGI yet. There could potentially be a race-to-the-finish (FAI) scenario on.
(4) There are already expanding AGIs, but due to physical limits on the expansion rate, we are not aware of them yet. (This could use further analysis.)
Or one civilisation, or an allied group of civilisations, has developed FAIs and is dominant in the galaxy. They could be:

(5) Whack-a-mole civilisations that destroy all potential competitors as soon as they are identified.
(6) Dominators that tolerate civilisations so long as they remain primitive and non-threatening by comparison.
(7) Some sort of interstellar community that allows safe civilisations to join (this community still needs to stomp on dangerous potential rival AGIs).

In the case of (6) or (7), developing a FAI that isn’t equipped to deal with alien life will probably result in us being liquidated, or at least partially sanitised in some way. In (1), (2) or (5), it probably doesn’t matter what we do in this regard, though in (2) we should consider being nice. In (3), and probably (4), we’re going to need a FAI capable of expanding very quickly and disarming potential AGIs (or at least ensuring they are FAIs from our perspective).

The upshot of all this is that we probably want to design safety features into our FAI so that it doesn’t destroy alien civilisations or life unless they pose a significant threat to us. I think the understandable reaction to this is something along the lines of “create an FAI that values all types of life”, or “all intelligent life”, or similar. I don’t exactly disagree, but I think we must be cautious in how we formulate this too.

Say there are many different civilisations in the galaxy. What sort of criteria would ensure that, given some sort of zero-sum scenario, Earth life wouldn’t be destroyed? Suppose there was some tiny but non-zero probability that humanity could evade the FAI’s efforts to prevent further AGI development, or perhaps a loophole in the types of AGIs that humans were allowed to develop. Wouldn’t it be sensible, in that scenario, for a universalist FAI to wipe out humanity to protect the countless other civilisations? Perhaps that is acceptable? Or perhaps not? Less drastically, how does the FAI police warfare or other competition between civilisations? A slight change in the way life is quantified and valued could drastically change the outcome for humanity. I’d suggest we want to weight the FAI’s values to start with human and Earth-biosphere primacy, but still give some non-zero weighting to other civilisations. There is probably more thought to be done in this area too.
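As a rough illustration of that weighting suggestion, here is a minimal sketch assuming a simple linear combination: Earth-originating life gets the dominant weight, other civilisations get a smaller but strictly non-zero one. The weights, names and scoring scale are entirely made up for the example.

```python
# Hypothetical weights: Earth primacy, but other civilisations never count for zero.
W_EARTH = 0.9
W_OTHERS = 0.1   # small, but strictly greater than zero


def combined_value(earth_flourishing: float, others_flourishing: float) -> float:
    """Toy linear weighting of Earth-life value against other civilisations.

    Both inputs are assumed normalised to [0, 1]. Because W_OTHERS > 0, a plan
    that wipes out other civilisations pays a real cost; because W_EARTH
    dominates, a genuinely zero-sum conflict still resolves in Earth's favour.
    """
    return W_EARTH * earth_flourishing + W_OTHERS * others_flourishing


print(combined_value(1.0, 1.0))  # 1.0 -- both preserved
print(combined_value(1.0, 0.0))  # 0.9 -- others sacrificed
print(combined_value(0.0, 1.0))  # 0.1 -- Earth sacrificed (worst of the three)
```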

Simulation

I also want to briefly note one conceivable way we might safely test Friendly AI designs: simulate worlds or universes of less complexity than our own, make it likely that their inhabitants invent an AGI or FAI, and then closely study the results of these simulations. We could then study failed FAI attempts with much greater safety. It also occurred to me that if we consider the possibility of our universe being a simulated one, then this is a conceivable scenario under which our simulation might have been created. After all, if you’re going to simulate something, why not something vital like modelling existential risks? I’m not yet sure of the implications exactly. Maybe we need to consider how it relates to our universe’s continued existence, or perhaps it’s just another case of Pascal’s Mugging. Anyway, I thought I’d mention it and see what people say.

A playground for FAI theories

I want to lastly mention this link (https://www.reddit.com/r/LessWrongLounge/comments/2f3y53/the_ai_game/). Basically it’s a challenge for people to briefly describe an FAI goal-set, and for others to respond by telling them how it will all go horribly wrong. I want to suggest this is a very worthwhile discussion, not because its content will include rigorous theories that are directly translatable into utility functions (very clearly it won’t), but because a well-developed thread of this kind would be a mixing pot of ideas and a good introduction to commonly known mistakes in thinking about FAI. We should encourage a slightly more serious version of this.

Thanks

FAI and AGI are very interesting topics. I don’t consider myself able to really discern whether such things will come to pass, but the area seems potentially vital. I’m looking forward to a bit of feedback on my first LW post. Thanks for reading!