I think that one class of computation that’s likely of moral concern would be self-perpetuating optimization demons in an AI.
Could you please elaborate why you think optimization demons (optimizers) seem worthier of moral concern than optimized systems? I guess it would make sense if you believed them to deserve equal moral concern, if both are self-perpetuating, all other things being equal.
I think the cognitive capabilities that would help an optimization demon perpetuate itself strongly intersect with the cognitive capabilities that let humans and other animals replicate themselves, and that the intersection is particularly strong along dimensions that seem more morally relevant. Reasoning along such lines leads me to think optimization demons are probably of moral concern, while still being agnostic about whether they're conscious.
I’m pessimistic about this line of reasoning: the ability to replicate is something that individual cells also have, and we do not assign moral relevance to the individual cells of a human being. A good example is that we consider viruses and cancerous cells unworthy of moral concern.
Perhaps you mean that given the desire to survive and replicate, at a given amount of complexity, a system develops sub-systems that make the system worthy of moral concern. This line of reasoning would make more sense to me.
I think the only situations in which you can get these sorts of optimization demons are when the AI in question has some influence over its own future training inputs. Such influence would allow there to be optimization demons that steer the AI towards training data that reinforce the optimization demon.
This can imply that only systems given a sufficient minimum capability have agency over their fate, and therefore their desire to survive and replicate has meaning. I find myself confused by this, because taken to its logical conclusion, this means that the more agency a system has over its fate, the more moral concern it deserves.
Specifically, we wouldn’t directly train the LM on the output of the linear layer. We’d just have a dialog where we asked the LM to make the linear layer output specific values, then told the LM what value the linear layer had actually output. We’d then see if the LM was able to control its own cognition well enough to influence the linear layer’s output better than chance, based only on the prompting we give it.
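To make the setup concrete, here is a toy sketch of the dialog loop. Everything here is an illustrative assumption: the "LM" is a mock function standing in for reading an activation vector out of a real model at inference time, and the probe is a fixed, untrained linear layer that is never backpropagated through; feedback reaches the model only as text appended to the prompt.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN_DIM = 16

def mock_hidden_state(prompt: str) -> np.ndarray:
    # Hypothetical stand-in for the LM's hidden state on a given prompt.
    # A real experiment would read this activation vector out of the
    # model during inference; here it is a toy function of the text.
    h = np.zeros(HIDDEN_DIM)
    for i, ch in enumerate(prompt.encode()):
        h[i % HIDDEN_DIM] += ch / 255.0
    return h

# A fixed random linear probe: the "linear layer" from the text.
# Crucially, it is never trained -- the model only ever sees its
# output described in natural language.
probe_w = rng.normal(size=HIDDEN_DIM)

def probe(prompt: str) -> float:
    return float(probe_w @ mock_hidden_state(prompt))

# The dialog loop: request a target value, then report back what the
# probe actually output, all via prompting alone.
targets, actuals = [], []
prompt = "You have an attached linear readout. "
for trial in range(20):
    target = float(rng.uniform(-1, 1))
    prompt += f"\nPlease make the readout output {target:.2f}."
    value = probe(prompt)
    prompt += f"\n(The readout actually output {value:.2f}.)"
    targets.append(target)
    actuals.append(value)

# Success criterion: requested and achieved values correlate reliably
# better than chance across many such dialogs.
corr = np.corrcoef(targets, actuals)[0, 1]
print(f"target/actual correlation: {corr:.3f}")
```

The mock model here has no way to actually steer the probe, so its correlation is just noise; the interesting question is whether a real LM, given this kind of text-only feedback, can do measurably better than that baseline.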
This seems reducible to a sequence modelling problem, except one that is much, much more complicated than anything I know models are trained for (mainly because this sequence modelling occurs entirely during inference time). This is really interesting, although I cannot see how this should imply that the more successful sequence modeller deserves more moral concern.
I’d first note that optimization demons will want to survive by default, but probably not to replicate; an AI’s cognitive environment is likely not the sort of place where self-replication is a very useful strategy.
My intuition regarding optimization demons is something like: GPT-style AIs look like they’ll have a wide array of cognitive capabilities that typically occur in intelligences to which we assign moral worth. However, such AIs seem to lack certain additional properties whose absence leads us to assign low moral worth. It seems to me that developing self-perpetuating optimization demons might cause a GPT-style AI to gain many of those additional properties. E.g., (sufficiently sophisticated) optimization demons would want to preserve themselves and have some idea of how the model’s actions influence their own survival odds. They’d have a more coherent “self” than GPT-3.
Another advantage to viewing optimization demons as the source of moral concern in LLMs is that such a view actually makes a few predictions about what is / isn’t moral to do to such systems, and why they’re different from humans in that regard.
E.g., if you have an uploaded human, it should be clear that running them in the mini-batched manner in which we run AIs is morally questionable. You’d be creating multiple copies of the human mind, having them run on parallel problems, then deleting those copies after they complete their assigned tasks. We might then ask whether running mini-batched, morally relevant AIs is morally questionable in the same way.
However, if it’s the preferences of optimization demons that matter, then mini-batch execution should be fine. The optimization demons you have are exactly those that arise in mini-batched training. Their preferences are oriented towards surviving in the computational environment of the training process, which was mini-batched. They shouldn’t mind being executed in a mini-batched manner.
This can imply that only systems given a sufficient minimum capability have agency over their fate, and therefore their desire to survive and replicate has meaning. I find myself confused by this, because taken to its logical conclusion, this means that the more agency a system has over its fate, the more moral concern it deserves.
I don’t think that agency alone is enough to imply moral concern. At minimum, you also need self-preservation. But once you have both, I think agency tends to correlate with (but is not necessarily the true source of) moral concern. E.g., two people warrant greater moral concern than one, and a nation warrants far more moral concern than any single person.
This seems reducible to a sequence modelling problem, except one that is much, much more complicated than anything I know models are trained for (mainly because this sequence modelling occurs entirely during inference time). This is really interesting, although I cannot see how this should imply that the more successful sequence modeller deserves more moral concern.
All problems are ultimately reducible to sequence modeling. What this task investigates is exactly how extensive a model’s meta-learning capabilities are. Does the model have enough awareness of / control over its own computations that it can manipulate those computations to some specific end, based only on text prompts? Does it have the meta-cognitive capabilities to connect its natural language inputs to its own cognitive state? I think that success here would imply a startling level of self-awareness.