I want to pour cold water on this, because there are limits on how far these results can generalize, owing to certain impossibility results on the subject. Even if empathy can be learned, it does not on its own scale to arbitrarily high capability with arbitrarily high compute, due to the No Free Lunch theorem for value learning. Worse, simplicity priors (essentially the reason more compute reliably yields more capabilities) do not solve the problem, which is why value/empathy learning will not be solved by default as capabilities grow; it is not a universal learning machine.
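To make the No Free Lunch point concrete, here is a minimal toy sketch (my own hypothetical illustration, not code from the linked posts): behavioral data alone cannot distinguish a rational planner optimizing a reward R from an anti-rational planner optimizing -R, because both produce identical actions. The function names and reward values are invented for illustration.

```python
def rational_policy(rewards):
    """A fully rational planner: picks the action with maximal reward."""
    return max(rewards, key=rewards.get)

def antirational_policy(rewards):
    """A fully anti-rational planner: picks the action with minimal reward."""
    return min(rewards, key=rewards.get)

# A toy reward function over three actions, and its negation.
R = {"a": 1.0, "b": 0.5, "c": -2.0}
neg_R = {key: -value for key, value in R.items()}

# The pair (rational, R) and the pair (anti-rational, -R) are behaviorally
# identical: every observation is consistent with both decompositions, so no
# amount of behavioral data alone can tell you which values the agent holds.
assert rational_policy(R) == antirational_policy(neg_R)
```

This is why extra assumptions (a normative prior over how rational the agent is, or structural assumptions about empathy modules) are needed: simplicity priors do not break the tie, since both decompositions can be comparably simple.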
However, the more interesting question is how strong our assumptions need to be to get empathy/value learning working.
In the optimistic scenario, we need few and weak assumptions.
In the pessimistic scenario, either many or strong assumptions are necessary for value learning/empathy.
Here’s a link on the optimistic/pessimistic scenarios:
https://www.lesswrong.com/posts/6XLyM22PBd9qDtin8/learning-human-preferences-optimistic-and-pessimistic
Another on how our empathy modules are similar enough to model each other:
https://www.lesswrong.com/posts/LkytHQSKbQFf6toW5/anthropomorphisation-vs-value-learning-type-1-vs-type-2