Why Do I Think I Have Values?

Epistemic status: I just thought of this today, and it made a huge amount of my confusion on this topic disappear.

Values are a weird concept. Humans aren’t utility maximizers, but we think of ourselves as if we were. There’s a paradox in the values-beliefs framework: humans can readily identify the parts of themselves which correspond to values and beliefs, yet if our minds aren’t actually built out of values and beliefs, this should be impossible. Here I propose the following answer:

Humans model themselves with an algorithm which has evolved to model other humans as value-belief based systems.

So why would this be the case? I can think of a few reasons.

Minimaxing

The minimax algorithm as applied to chess (or other finite, perfect-information games) means assuming your opponent will play the best move for them on every turn. This is the optimal[1] way to play in the infinite-compute limit, and it’s also pretty close to optimal in most other situations (practical algorithms sometimes attempt to compute the probability of their opponent playing each given move instead).
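For concreteness, here’s a minimal sketch of minimax in Python (my own illustration, not something from a chess engine; the `game` object and its methods `is_terminal`, `score`, `legal_moves`, and `apply` are made-up placeholders for whatever game you plug in):

```python
# Minimal minimax for a finite, perfect-information, two-player zero-sum game.
# The `game` interface here is hypothetical: it just needs to report terminal
# states, score them for the maximizing player, list legal moves, and apply one.

def minimax(game, state, maximizing: bool) -> float:
    """Value of `state` assuming both players play perfectly from here on."""
    if game.is_terminal(state):
        return game.score(state)  # e.g. +1 win, 0 draw, -1 loss for the maximizer

    values = [
        minimax(game, game.apply(state, move), not maximizing)
        for move in game.legal_moves(state)  # assumed non-empty if not terminal
    ]
    # The core assumption: the opponent always picks the move that is best for
    # them, so we maximize on our turns and minimize on theirs.
    return max(values) if maximizing else min(values)
```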

So what do you do if you’re a hominid playing politics against other hominids? It seems likely that the most successful strategy would be to (by default) assume your opponent is as powerful as possible. That way you’ll take the fewest risks, and you’ll be most accurate when it comes to the most dangerous opponents.

Since coherent decisions imply consistent utility, the most powerful opponents are (approximately) expected utility maximizers, especially the ones who want things which put them in competition with you. If your political opponent isn’t well approximated by a utility maximizer, they probably aren’t much of a threat. More subtly, if your opponent takes a mixture of actions, the most “important” actions will look like those of an expected utility maximizer; all the other actions will be lost as noise.

Low Compute

Modelling another human as an expected utility maximizer is also probably quite efficient. Starting with a prior that they want the exact same things as a typical human, and then updating away from that, is pretty cheap. Starting with a prior that they will act on their desires using all the information they can see is also pretty cheap. Combined, these do a decent job of guessing another human’s actions, especially the actions which are most relevant to you.

This only requires modelling the other person’s senses, keeping track of their knowledge and of any deviations from the “default” desires, and then running your own version of a utility-maximizing algorithm on that to predict their actions.
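To make that concrete, here’s a toy sketch in Python (all the names, numbers, and the “default human” utility are made up for illustration; this is a cartoon of the bookkeeping, not a claim about how the brain implements it):

```python
# Toy model of "predict the other hominid": a default-human utility prior,
# a record of what they know, learned deviations from the default desires,
# and your own maximizer reused to guess their next action.

from dataclasses import dataclass, field

def default_human_utility(outcome: str) -> float:
    # Prior: they want roughly what a typical human wants.
    return {"food": 1.0, "status": 2.0, "safety": 3.0}.get(outcome, 0.0)

@dataclass
class OtherAgentModel:
    known_facts: set = field(default_factory=set)          # what they've seen
    utility_overrides: dict = field(default_factory=dict)  # deviations from default

    def utility(self, outcome: str) -> float:
        return self.utility_overrides.get(outcome, default_human_utility(outcome))

    def predict_action(self, options: dict) -> str:
        # options maps action -> outcome; restrict to outcomes they know about.
        visible = {a: o for a, o in options.items() if o in self.known_facts}
        # Run "your own" maximizing algorithm on their modelled desires/knowledge.
        return max(visible, key=lambda a: self.utility(visible[a]))

rival = OtherAgentModel(known_facts={"status", "cool_rocks"})
rival.utility_overrides["cool_rocks"] = 5.0  # learned deviation: they like rocks
print(rival.predict_action({"scheme": "status", "forage": "cool_rocks"}))  # forage
```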

Updating Beliefs and Keeping Risk Low

Since humans are able to update their beliefs about others, what matters is being wrong in the right direction: it’s much better to overestimate a threat than to underestimate one.

As an illustration: imagine someone is just really stupid about their utility maximization. You’ll treat them as a threat until you learn they’re an idiot, which is a pretty low-cost mistake to make. If they want something really weird, like to collect some cool rocks, that probably means they’re not competing for your political position. Again, a low-cost mistake, and once you figure out what’s really going on you can just ignore them.
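Here’s a toy version of that asymmetry, with made-up payoffs chosen only to show the shape of the argument:

```python
# With hypothetical costs, defaulting to "treat them as a threat" has lower
# expected cost than defaulting to "assume they're harmless", even when most
# people turn out to be idiots or rock collectors.

def expected_cost(assume_threat: bool, p_real_threat: float) -> float:
    wasted_caution = 1.0   # they were harmless after all: cheap mistake
    blindsided = 100.0     # they were a real rival and you ignored them
    if assume_threat:
        return (1 - p_real_threat) * wasted_caution
    return p_real_threat * blindsided

p = 0.05  # even if only 5% of strangers are real rivals...
print(expected_cost(True, p))   # 0.95 -- many cheap mistakes
print(expected_cost(False, p))  # 5.0  -- rare but expensive mistakes
```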

Self-modelling

A key feature of humans is our ability to self-model. It has been argued that this is the source of “consciousness” (but I’m not getting into that mess again), and it’s definitely important.

Until now I was aware of concepts like the blue-minimizing robot, but they seemed to beg the question: why would a blue-minimizing-strategy-executing robot have a self-modelling algorithm which is predisposed to be wrong in some important sense?

Now I feel I have an explanation (not necessarily correct, but gears-level at least!) for why the strategy-executing, reinforcement-learning, other-thing-doing human brain comes equipped with a tendency to model itself as this highly directed utility maximizing thing.

Also, I simply cannot shake the feeling that people (by default) model themselves as a “perfectly rational Bayesian homunculus” and then add patches to the model to account for “irrationality”. You just are cognitive biases.

  1. ^

    In the sense of maximizing the probability of winning the game.