# Stuart_Armstrong

Karma: 20,054


I think it depends on the individual. Certainly, before realising the points above, I would occasionally do the “assume human values solved” move in my mind, in an unrigorous and misleading way.

What do you mean by “you actually have Y values”? What are you defining values to be?

Because once we have these parameters, we can learn the values of any given human. In contrast, if we learn the values of a given human, we don’t get to learn the values of any other one.

I’d argue further: these parameters form part of a *definition* of human values. We can’t just “learn human values”, as these don’t exist in the world. Whereas “learn what humans model each other’s values (and rationality) to be” is something that makes sense in the world.

If we want to apply it to humans, we’d need something much more complicated than that: something which uses some measure of how complex humans perceive actions to be, and takes into account how and when we search for alternate solutions. There’s a reason most models don’t use bounded rationality; it ain’t simple.
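For contrast, the “simple” model that most of the value-learning literature does use is noisy (Boltzmann) rationality, where an agent picks actions with probability exponential in their value. A minimal sketch of that standard model, to show what “simple” looks like (the function name and numbers are my own, not from any post or library):

```python
import math

def boltzmann_policy(action_values, beta=1.0):
    """Noisy-rational choice: P(a) is proportional to exp(beta * Q(a)).

    beta -> infinity recovers a perfect maximiser; beta = 0 is uniform
    random choice.  This closed form is what makes the model popular;
    genuine bounded rationality (limited search, effort-dependent
    action sets) has no comparably clean expression.
    """
    actions = list(action_values)
    weights = [math.exp(beta * action_values[a]) for a in actions]
    total = sum(weights)
    return {a: w / total for a, w in zip(actions, weights)}

probs = boltzmann_policy({"recycle": 1.0, "litter": 0.0}, beta=2.0)
print(probs)  # "recycle" gets weight e^2 / (e^2 + 1), roughly 0.88
```

The whole model is one softmax; the complications discussed above (effort, search, perceived complexity of actions) are exactly what it leaves out.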

Corrected, thanks!

# A hundred Shakespeares

I agree it’s part of the story, but only a part. And real humans don’t act as if there were a set of actions of size n, all of which they could consider with equal ease. Sometimes humans have much smaller action sets, sometimes they can produce completely unexpected actions, and most of the time we have a pretty small set of obvious actions and a much larger set of potential actions we might be able to think up at the cost of some effort.
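That last picture, a small obvious action set plus a larger pool reachable only at effort cost, can be sketched in a few lines. This is a toy illustration under my own assumptions, not a model from the post:

```python
def choose_action(obvious, hidden, value, effort_cost, budget):
    """Pick the best action among those actually considered: the
    obvious set, plus hidden actions uncovered one at a time while
    the effort budget lasts.  A toy bounded search, nothing standard."""
    considered = list(obvious)
    spent = 0.0
    for a in hidden:
        if spent + effort_cost > budget:
            break  # out of thinking effort; stop expanding the set
        considered.append(a)
        spent += effort_cost
    return max(considered, key=value)

values = {"stay": 1.0, "move": 2.0, "novel_trick": 5.0, "genius_play": 9.0}
best = choose_action(
    obvious=["stay", "move"],
    hidden=["novel_trick", "genius_play"],
    value=values.get,
    effort_cost=1.0,
    budget=1.0,  # only enough effort to uncover one hidden action
)
print(best)  # "novel_trick": the genius play was never even considered
```

The agent’s effective action set depends on its effort budget, so two agents with identical values can behave very differently.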

# Bounded rationality abounds in models, not explicitly defined

# Figuring out what Alice wants: non-human Alice

# Assuming we’ve solved X, could we do Y...

# Why we need a *theory* of human values

Thanks for introducing me to the box topology; seeing it defined so explicitly, and seeing which properties it fails to have, cleared up a few of my intuitions.

A chess tree search algorithm would never hit upon killing other processes. An evolutionary chess-playing algorithm might learn to do that. It’s not clear whether goal-directedness is relevant to that distinction.

Hum, it should be compact by Tychonoff’s theorem (see also the Hilbert cube, which is homeomorphic to the space in question).

For your proof, I think that set is not open in the product topology. The product topology on ∏_n X_n is the coarsest topology in which all the projection maps π_n are continuous.

To make all the projection maps continuous, we need all sets of the form π_n^{-1}(U) to be open, where U is an open set in the factor X_n.

Let B be the set of finite intersections of these sets. For any S in B, there exists a finite index set I such that if x is in S and x_n = y_n for all n in I, then y is in S as well.

If we take arbitrary unions of elements of B, this condition is preserved. Thus a set that fails the condition is not among the arbitrary unions and finite intersections of the π_n^{-1}(U), so it seems it is not an open set.
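The finite-support condition can be made concrete: a basic open set in a product like {0,1}^ℕ constrains only finitely many coordinates, so changing a coordinate outside that finite set never moves a point in or out of the set. A small Python sketch (names and representation are illustrative, not from any library):

```python
# A basic open set in the product topology on {0,1}^N is a finite
# intersection of preimages pi_n^{-1}(U); for two-point factors this
# amounts to fixing the values of finitely many coordinates.
def make_basic_open(constraints):
    """constraints: dict {index: required_value} with finite support."""
    def contains(x):
        # x is a sequence, represented as a function from indices to {0,1}
        return all(x(n) == v for n, v in constraints.items())
    return contains

# The set S fixes coordinates 0 and 3 only.
S = make_basic_open({0: 1, 3: 0})

x = lambda n: 1 if n == 0 else 0       # a point of S
y = lambda n: x(n) if n != 100 else 1  # differs from x only at n = 100

# Membership depends only on the finite support {0, 3}:
print(S(x), S(y))  # True True: flipping coordinate 100 cannot leave S
```

A set whose membership genuinely depends on infinitely many coordinates can never be built from such pieces by unions and finite intersections, which is the point of the argument above.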

Also, the product is second-countable. From the Wikipedia article on second-countable spaces:

> Any countable product of a second-countable space is second-countable
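To see the countability concretely (a standard construction, written in my own notation): if each factor $X_n$ has a countable basis $\mathcal{B}_n$, then

```latex
\[
  \mathcal{B} \;=\; \bigl\{\, \pi_{n_1}^{-1}(B_1)\cap\cdots\cap\pi_{n_k}^{-1}(B_k)
  \;:\; k\in\mathbb{N},\ n_i\in\mathbb{N},\ B_i\in\mathcal{B}_{n_i} \,\bigr\}
\]
```

is a basis for the product topology, and it is countable: a countable union (over $k$ and the finite tuples $(n_1,\dots,n_k)$) of countable families.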

Note that this starts from the assumption of goal-directed behavior and derives that the AI will be an EU maximizer along with the other convergent instrumental subgoals.

The result is actually stronger than that, I think: if the AI is goal-directed *at least in part*, then that part will (tend to) purge the non-goal-directed behaviours and then follow the EU path. I wonder if we could get theorems as to what kinds of minimal goal-directed behaviour will result in the agent becoming a completely goal-directed agent.

But still? A hundred Shakespeares?

I’d wager there are thousands of Shakespeare-equivalents around today. The issue is that Shakespeare was not only talented, he was successful: wildly popular, and able to live off his writing. He was a superstar of theatre. And we can only have a limited number of superstars, no matter how large the population grows. So if we took only his first few plays (before the fame feedback loop and the money), and gave them to someone who had, somehow, never heard of Shakespeare, I’d wager they would find many other authors at least as good.

This is a mild point in favour of explanation 1, but it’s not that the number of devoted researchers is limited; it’s that the slots at the top of the research ladder are limited. On this view, any very talented individual who was also a superstar would produce a huge amount of research. The number of very talented individuals has gone up, but the number of superstar slots has not.

But I also sense a privileging of a particular worldview, namely a human one, that may artificially limit the sorts of useful categories we are willing to consider.

This is deliberate: a lot of what I’m trying to do is figure out human values, so human worldviews and interpretations will generally be the most relevant.

# Humans can be assigned any values whatsoever…

> politicians like being able to actually change public behavior

But the ways in which they want to (or can) change it are strongly influenced by moral preferences among voters, donors, and civil servants. Why did they push recycling, or bring in clean air/water acts, rather than any of a million other policy changes they could have made?

In the example in https://www.lesswrong.com/posts/rcXaY3FgoobMkH2jc/figuring-out-what-alice-wants-part-ii , I give examples of two algorithms with the same outputs but where we would attribute different preferences to them. This sidesteps the impossibility result, since it allows us to consider extra information, namely the internal structure of the algorithm, in a way relevant to value-computing.