Rob Bensinger comments on AGI Ruin: A List of Lethalities

Rob Bensinger 6 Jun 2022 20:22 UTC
LW: 5 AF: 1
If we go down that path then it becomes the sort of conversation where I have no idea what common assumptions we have, if any, that we could use to agree. As a general rule, I find it unconstructive, for the purpose of trying to agree on anything, to say things like “this (intuitively compelling) assumption is false” unless you also provide a concrete argument or an alternative of your own. Otherwise the discussion is just ejected into vacuum.
Fair enough! I don’t think I agree in general, but I think ‘OK, but what’s your alternative to agency?’ is an especially good case for this heuristic.
Which is to say, I find it self-evident that “agents” are exactly the sort of beings that can “want” things, because agency is about pursuing objectives and wanting is about the objectives that you pursue.
The first counter-example that popped into my head was “a mind that lacks any machinery for considering, evaluating, or selecting actions; but it does have machinery for experiencing more-pleasurable vs. less-pleasurable states”. This is a mind we should be able to build, even if it would never evolve naturally.
Possibly this still qualifies as an “agent” that “wants” and “pursues” things, as you conceive it, even though it doesn’t select actions?
- Vanessa Kosoy 7 Jun 2022 6:23 UTC
  LW: 9 AF: 1
  My 0th approximation answer is: you’re describing something logically incoherent, like a p-zombie.
  
  My 1st approximation answer is more nuanced. Words that, in the pre-Turing era, referred exclusively to humans (and sometimes animals, and fictional beings), such as “wants”, “experiences” et cetera, might have two different referents. One referent is a natural concept, something tied into deep truths about how the universe (or multiverse) works. In particular, deep truths about the “relatively simple core structure that explains why complicated cognitive machines work”. The other referent is something in our specifically-human “ontological model” of the world (technically, I imagine that to be an infra-POMDP that all our hypotheses are refinements of). Since the latter is a “shard” of the former produced by evolution, the two referents are related, but might not be the same. (For example, I suspect that cats lack natural!consciousness but have human!consciousness.)
  
  The creature you describe does not natural!want anything. You postulated that it is “experiencing more pleasurable and less pleasurable states”, but there is no natural method that would label its states as such, or that would interpret them as any sort of “experience”. On the other hand, maybe if this creature is designed as a derivative of the human brain, then it does human!want something, because our shard of the concept of “wanting” mislabels (relative to natural!want) weird states that wouldn’t occur in the ancestral environment.
  
  You can then ask, why should we design the AI to follow what we natural!want rather than what we human!want? To answer this, notice that, under ideal conditions, you converge to actions that maximize your natural!want (more or less) according to the definition of natural!want. In particular, under ideal conditions, you would build an AI that follows your natural!want. Hence, it makes sense to take a shortcut and “update now to the view you will predictably update to later”: namely, design the AI to follow your natural!want.