A concise summary of the basic argument for AI doom

1 - An artificial superoptimizer is likely to be developed soon.

2 - There is no known way of programming goals into an advanced optimizer. All we can specify are outwardly observable behaviors, and we have no idea why those behaviors are being carried out or what internal goals motivate them.

3 - Most utility functions do not have optima with humans in them. Most utility functions do not have a term for humans at all.

4 - “Why haven’t we exterminated all mice/bugs/cows then?” rests on a poor analogy. Firstly, we are not superoptimizers. Secondly, and more importantly, we do care about living beings somewhat. The optimum of human civilization’s utility function quite possibly does have mice/bugs/cows in it, perhaps even genetically engineered not to experience suffering. We are not completely indifferent to them.

The relationship between most possible superoptimizers and humanity is not like the relationship between humanity and mice at all—it is much more like the relationship between humanity and natural gas. Natural gas, not mice, is a good example of something humans are truly indifferent to—there is no term for it in our utility function. We don’t hate it, we don’t love it, it is just made of atoms that we can, and do, use for something else.

Moreover, the continued existence of natural gas probably does not pose the same threat to us that our continued existence would pose to a superoptimizer with no term for humans in its utility function (just as we have no term for natural gas in ours). Natural gas cannot attempt to turn us off, and it cannot create “competition” for us in the form of a rival species with capabilities similar to, or surpassing, our own.

P.S. If you dislike, or find confusing, the terminology of “optimizers” and “utility functions”, feel free to forget about all that. Think instead of physical states of the universe. Out of all the possible states the universe could find itself in, very, very, very few contain any humans. Given a random state of the universe, creating a superintelligence that steered the universe towards that state would almost certainly result in an apocalypse. Of course, we want our superintelligence to steer the universe towards particular states; that’s kind of the whole point of the entire endeavor. Were it not so, we would not be attempting to build a superintelligence at all: a sponge would suffice. We essentially want to create a super-capable universe-steerer. The problem is getting it to steer the universe towards states we like, and this problem is very hard because, among other things, there is currently no way to take any desire to steer the universe towards any state whatsoever and program it into anything at all.
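To make the “very few states contain humans” point concrete, here is a minimal toy sketch in Python. All the numbers in it are made-up assumptions for illustration only: it treats the universe as a small finite set of states, marks a tiny fraction of them as containing humans, and samples random utility functions to see how often the state they rank highest is one of the human-containing ones.

```python
# Toy illustration (not a model of real AI): treat the universe as a finite set
# of states, mark a tiny fraction of them as "contains humans", and sample
# random utility functions to see how often the state they rank highest is one
# of the human-containing ones. All numbers below are made-up assumptions.
import random

N_STATES = 2_000        # toy "possible states of the universe"
HUMAN_FRACTION = 0.001  # assumed fraction of states that contain humans
N_TRIALS = 5_000        # number of random utility functions to sample

random.seed(0)
n_human_states = int(N_STATES * HUMAN_FRACTION)  # say, the first few states

hits = 0
for _ in range(N_TRIALS):
    # A "random utility function": an independent random value for each state.
    utility = [random.random() for _ in range(N_STATES)]
    # The state a perfect optimizer of this function would steer towards.
    best_state = utility.index(max(utility))
    if best_state < n_human_states:
        hits += 1

print(f"Optima that contain humans: {hits}/{N_TRIALS} "
      f"(expected fraction ~ {HUMAN_FRACTION:.3%})")
```

Unsurprisingly, the optimum lands on a human-containing state about as often as such states occur at all, which in this toy setup is almost never; the argument above is that the real space of possible universe-states is far more lopsided still.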