ME, morals can be derived from game theory, but very probably they will not be exactly the same morals that most people agree with. There are many situations in which an act that clearly benefits the species is almost unanimously considered immoral. For example, killing a 60-year-old woman when you can use her organs to save the lives of ten other women who are still of reproductive age. The main universally accepted morals are evolved, and evolution doesn’t reproduce game theory perfectly.
prase
Roko,
“ME, morals can be derived from game theory… ”—I disagree. Game theory doesn’t tell you what you should do, it only tells you how to do it. - That was almost what I intended to say, but I somehow failed to formulate it well, so that you understood me as saying the contrary...
Of course, what you have said isn’t sufficiently precise to be either correct or incorrect—words like “murder” and “intelligent” are very much in need of defining precisely. - I’m not sure that precisely defining murder is important for this debate. Obviously you can make the chains of definitions as long as you wish, but somewhere you have to stop and treat some words as primary, with their intuitive meaning. If you think murder is too ambiguous, imagine something else that most people find wrong; the arguments remain the same.
Laws exist so that society functions correctly. - What does “correctly” mean in this statement?
An AI that randomly murdered people would not benefit from having those people around, so it would not be as intelligent/successful as a similar system which didn’t murder. - How can you know what would constitute a “benefit” for the AI? Most species on Earth would benefit (in the evolutionary sense) from human extinction; why not an AI?
The nice thing about believing in no objective morality is that you needn’t solve such poorly intelligible questions. I hope Eliezer is trying to demonstrate the absurdity of believing in objective morality; if so, then good luck!
“I mean… if an external objective morality tells you to kill babies, why should you even listen?”—this is perhaps a dangerous question, but still I like it. Why should you do what you should do? Or put differently, what is the meaning of “should”?
Concerning the four interpretations:
I am not sure what exactly the first two mean. If the first one means that the current will or want is not caused by itself, that seems true to me, but it is a sort of truism which doesn’t carry much information (nothing is caused by itself, as far as the word “cause” is used in ordinary language).
If the second means that the alternative choices of what we want are not reachable (in the sense that we do not go through a conscious decision process to choose our wants), it is true for primary desires (like the desire to survive) and false for more complicated, derived desires (e.g. the desire to get some particular job); but if one takes the distinction between primary and derived desires to be defined by whether we consciously decide about them, then the second interpretation is empty.
The third is probably false. I was too lazy to think about what Eliezer precisely means by “controlling our passions”, but if we interpret it the way a random person with no special interest in philosophy would, then it is false.
The fourth is true as long as there is some sharp border between “us” and “our desires”; otherwise it too is rather empty.
Altogether the four interpretations, although more specific than the original sentence, seem to me only slightly less ambiguous. As a result I know neither what Schopenhauer intended to say nor what Eliezer intended to say.
My interpretation (of Schopenhauer, not of EY’s interpretation thereof) is that the processes in our brains can be divided into the formation of desires and practical decisions. The practical decisions are caused by the desires (we can do what we want), but it makes no sense to say that the desires are caused by themselves—the only input to the formation of the set of all desires comes from the outside world (we cannot want what we want). This is probably closest to EY’s first interpretation.
Perhaps I’m being dim, but a prior is a probability distribution, isn’t it? Whereas Occam’s Razor and induction aren’t: they’re rules for how to estimate prior probability.
But we can think about the probability that Occam’s razor produces correct answers; this probability is a prior.
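To make the connection concrete (this is only one standard way to cash it out, not necessarily the intended one): Occam’s razor becomes an explicit prior if each hypothesis h is weighted by its description length L(h), Solomonoff-style:

P(h) \propto 2^{-L(h)}

Under such a weighting, shorter hypotheses receive more prior probability, and the rule for estimating priors and the prior distribution itself are two descriptions of the same object.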
Our ‘absolutely universal’ laws can be shown to have predictive power over an infinitesimal speck of the cosmos. Our ability to observe even natural experiments in the rest of the universe is extremely limited. … Experience teaches us that, at any given time, the majority of our beliefs will be wrong, and the only thing that makes even approximate correctness possible is precisely what we cannot apply to the universe as a whole.
Our ability to observe natural experiments even on Earth is extremely limited (e.g. we surely haven’t seen most of the elementary particles that could be produced here on Earth given sufficient energy). But what’s the problem with that? Experience teaches us that most of our beliefs have a limited domain of validity, rather than being wrong. Newtonian physics is not “wrong”; it is still predictive and useful, even now that we have better theories valid in a larger set of situations.
Perhaps it’s better to reformulate Eliezer’s statement about universal laws as something like: “all phenomena we have encountered could be described by a relatively simple set of laws; every newly discovered phenomenon makes the laws more precise, instead of totally discarding them”. I think this is not a completely trivial statement, as I can imagine a world where the laws were as complicated as the phenomena themselves, and thus a world where nothing was predictable.
A particularly interesting question is what people of, e.g., the Roman empire or mediaeval France would think about today’s society. We can compare the morality of the past with contemporary standards, but we can’t see the future. I wonder whether mediaeval people would find our morality less despicable than we find theirs. If such a comparison were possible, one could define some sort of objective (or subjectively objective?) criterion—simply put together two societies with different moral codes and watch how many people convert from the first to the second and vice versa. Anyway, it is probable that different moral codes are not all equally well suited to human nature. If so, the apex can be defined as the moral code which is perfectly stable, i.e. does not evolve (given we stop human biological evolution) and becomes dominant in contact with a different moral code.
Larry D’Anna: The reason that we say it is too big is because there are subsets of Mindspace that do admit universally compelling arguments, such as (we hope) neurologically intact humans.
What precisely is neurological intactness? It rather seems to me that the majority agrees on some set of “self-evident” terminal values, and those few people who do not are called psychopaths. If by “human” we mean what people usually understand by this term, then there are no compelling arguments even for humans. Although I gladly admit your statement is approximately valid, I am not sure how to formulate it so that it is exactly true and not simultaneously a tautology.
Caledonian, Those conditions weren’t created in the laboratory, because the individual strategy dominated over the group; ergo, the conditions necessary for that to happen were not met.
I am not sure how to interpret this. When we remove whole groups from the gene pool because of some group characteristic (i.e. one averaged over the population of that group), it sounds natural to me to call that group selection. Do you have some different meaningful definition of group selection? What does it mean, in general, when you say that the individual strategy dominates?
You cannot define fairness entirely in terms of “That which everyone agrees is ‘fair’.” This isn’t just nonterminating. It isn’t just ill-defined if Dennis doesn’t believe that ‘fair’ is “that which everyone agrees is ‘fair’”. It’s actually entirely empty, like the English sentence “This sentence is true.”
I don’t think the definition based on universal agreement is a particularly clever definition; nevertheless, I don’t see how it is empty. If there was something that everyone agreed was fair, then such a definition would be meaningful and non-empty. It doesn’t follow that the definition itself must be fair. It is your demand that the definition of fairness itself be fair that makes it self-referential.
If there was something that everyone agreed was xyblz, …
Fixed that for you.
That’s almost exactly the sort of answer I expected, except I don’t see how it fixes anything.
If the definition consists of nothing but the observation that people agree, then it provides information about people, not the ostensible subject.
That depends on what we know at the beginning. If we know the opinions of people, then it provides information about the meaning of the word. This is the way language is learned in childhood—by observing what meaning other people attach to words. Only much later do we learn to employ dictionaries and strict definitions.
If you define “red” as “whatever everyone agrees is red”, it is, for most people and for everyday purposes, more informative than the definition “emitting light of wavelength about 700 nm”, and the two definitions are practically equivalent. The difference is that we use a representative sample of the population instead of a double-slit experimental setting.
The strategy “proceed as far as possible in the moment, without any long-term planning or cooperation” is quite common, not only in exiting planes and buses:
The counterargument is, in part, that some classifiers are better than others, even when all of them satisfy the training data completely. The most obvious criterion to use is the complexity of the classifier.
The point is, probably, that humans tend to underestimate the complexity of the classifiers they use. Categories like “good” are not only difficult to define precisely, they are difficult to define at all, because they are too complicated to be formulated in words. To point out that in classification we use structures based on the architecture of the human brain (or whatever is uniquely human) is not, in my opinion, a relativist fallacy.
To use a somewhat stretched analogy: programming a 3D animation on a computer with an advanced graphics card and an obsolete processor may be simpler for the programmer than programming quicksort. Simplicity is not a mind-independent criterion.
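A minimal sketch of the complexity criterion (the hypotheses and the hand-assigned complexity scores below are invented for the example, which is itself the point: the simplicity measure is chosen by the modeller):

```python
# Toy Occam-style model selection: among classifiers that all satisfy
# the training data completely, prefer the one with the lowest
# (hand-assigned) complexity score.

train = [(0, 0), (1, 1), (2, 4), (3, 9)]

hypotheses = [
    # (name, predictor, assumed complexity score)
    ("x^2",               lambda x: x**2,                        4),
    ("interpolating fit", lambda x: x**2 + x*(x-1)*(x-2)*(x-3), 20),
]

# Both hypotheses fit the training data perfectly...
fitting = [(name, f, c) for name, f, c in hypotheses
           if all(f(x) == y for x, y in train)]

# ...but they disagree outside it; the complexity score breaks the tie.
name, predict, _ = min(fitting, key=lambda h: h[2])
print(name, "predicts f(4) =", predict(4))   # x^2 gives 16; the other gives 40
```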
I would make a random decision (using a random number generator), since the only alternative I see is an infinite recursion of thoughts about the Maximizer. There is not enough information in this case even to assign non-trivial probabilities to the Maximizer’s possible C and D decisions, because I don’t know how the Maximizer thinks. But even assuming that the Maximizer uses the same reasoning as I do, only with a different value system, I still see no escape from the recursion. So a := random(0..1); if a > 0.5 then C else D (is it rational to expect such probabilities?) and press enter… If the decision process doesn’t converge, we should take either possibility, I think.
A.Crossman: Prase, Chris, I don’t understand. Eliezer’s example is set up in such a way that, regardless of what the paperclip maximizer does, defecting gains one billion lives and loses two paperclips. - This is the standard defense of defecting in a prisoner’s dilemma, but if it were valid, the dilemma wouldn’t really be a dilemma.
If we can assume that the maximizer uses the same decision algorithm as we do, we can also assume that it will come to the same conclusion. Given this, it is better to cooperate, since cooperating gains a billion lives (and a paperclip). But we don’t know whether the paperclipper uses the same algorithm.
Zubon,
When do you think Clippy is planning to start defecting?
If Clippy decides the same way as I do, then I expect he starts defecting in the same turn as I do. The result is 100x C,C. There is no way that identical deterministic algorithms with the same input can produce different outputs, so in each turn, C,C and D,D are the only possibilities. It’s rational to C.
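A minimal sketch of that claim (the strategy below is a placeholder I invented; any deterministic one gives the same result): two copies of the same deterministic algorithm, run on mirrored histories, can never diverge.

```python
def strategy(own_history, other_history, turn, total_turns=100):
    # Placeholder deterministic strategy: cooperate until the final turn.
    return 'C' if turn < total_turns - 1 else 'D'

mine, clippys = [], []
for turn in range(100):
    my_move = strategy(mine, clippys, turn)
    clippy_move = strategy(clippys, mine, turn)   # same code, mirrored input
    assert my_move == clippy_move   # histories are equal by induction, so outputs match
    mine.append(my_move)
    clippys.append(clippy_move)

print(mine == clippys)   # True: every round is C,C or D,D
```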
However, a “realistic” Clippy uses a different algorithm, which is unknown to me. Here I genuinely don’t know what to do. To have some preference for C over D or conversely, I would need at least some rough prior probability distribution on the space of all possible decision algorithms suitable for Clippy. But I can hardly imagine such a space.
This reminds me a bit of the two-envelope problem, where you know that one envelope holds ten times more money than the other, but otherwise the amounts are random. (V.Nesov, do you know the canonical name of this paradox?) You open the first, find some amount, and then have to choose between accepting it and taking the second envelope. You cannot resolve this without having some idea of what “random” means here, i.e. how the amounts of money were distributed into the envelopes. If you don’t know anything about the process, you face questions like “what is the most natural probability distribution on the interval (0,∞)?”, which I don’t know how to answer.
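A minimal sketch of why the prior matters (the discrete prior below is invented for the illustration): suppose the smaller amount is either 1 or 10, so the possible pairs are (1, 10) and (10, 100), and you open an envelope containing 10.

```python
def switch_gain_given_10(p_small):
    """Expected gain from switching, conditional on seeing 10 in the opened
    envelope, under a prior p_small over the smaller amount. Pairs are
    (s, 10*s); either envelope of a pair is opened with probability 1/2."""
    w_saw_larger = p_small.get(1, 0.0) * 0.5    # pair (1, 10): switching yields 1
    w_saw_smaller = p_small.get(10, 0.0) * 0.5  # pair (10, 100): switching yields 100
    total = w_saw_larger + w_saw_smaller
    expected_after_switch = (w_saw_larger * 1 + w_saw_smaller * 100) / total
    return expected_after_switch - 10

print(switch_gain_given_10({1: 0.5, 10: 0.5}))    # +40.5: switch
print(switch_gain_given_10({1: 0.99, 10: 0.01}))  # -8.01: keep
```

The same observation recommends opposite actions under different priors, which is exactly the sense in which the problem is underdetermined without one.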
Anyway, I think these dilemmas are typical illustrations of insufficient information for any rational decision. Without information, any decision is ruled by bias.
V.Nesov: There is nothing symmetrical about choices of two players. One is playing for paperclips, another for different number of lives. One selects P2.Decision, another selects P1.Decision. How to recognize the “symmetry” of decisions, if they are not called by the same name?
The decision processes can be isomorphic. We can think of the paperclipper as being absolutely the same as we are, except valuing paperclips instead of our values. This of course assumes we can separate the thinking into a “values part” and an “algorithmic part” (and that the utility function of the paperclipper is such that the payoff matrix is symmetric), which seems unrealistic, and that’s why I wrote that I don’t know what strategy is best.
“After observing empirically that the LHC had failed 100 times in a row, would you endorse a policy of keeping the LHC powered up, but trying to fire it again only in the event of, say, nuclear terrorism or a global economic crash?”
After observing 100 failures in a row I would expect a failure to occur after the next attempt to switch it on, too. So it doesn’t seem a reliable means of preventing terrorism or economic crashes even if the anthropic many-worlds “ideology” were true.
On the other hand, if somebody were able to show that the amplitude of the LHC’s unexpected failure for technical reasons was significantly lower than the amplitude of a terrorist-free future...
BH will grow slowly, but exponentially. By some assumptions it could take 27 years to eat the Earth. So we will have time to understand our mistake and to suffer from it.
I am curious about these assumptions. A BH with the mass of the whole Earth has a Schwarzschild radius of about 1 cm. At the start the BH would be much lighter, so it’s not clear to me how this BH, sitting in the centre of the Earth, could eat anything.
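For reference, that figure follows from the Schwarzschild radius formula (constants rounded):

r_s = \frac{2GM}{c^2} \approx \frac{2 \cdot (6.67 \times 10^{-11}\,\mathrm{m^3\,kg^{-1}\,s^{-2}}) \cdot (5.97 \times 10^{24}\,\mathrm{kg})}{(3.0 \times 10^{8}\,\mathrm{m/s})^2} \approx 8.9 \times 10^{-3}\,\mathrm{m},

i.e. just under a centimetre. Since r_s scales linearly with mass, a freshly created micro black hole far below Earth mass would have a correspondingly minuscule radius, which is the crux of the question of how it could eat anything.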
roko, Given that at least some physicists have come up with vaguely plausible mechanisms for stable micro black hole creation, you should think about outrageous or outspoken claims made in the past by a small minority of scientists. How often has the majority view been overturned? I suspect that something like 1/1000 is a good rough guess for the probability of the LHC destroying us. - This reasoning gives a probability of 1/1000 for any conceivable minority hypothesis, which is inconsistent: if there are more than a thousand mutually exclusive minority hypotheses, their probabilities sum to more than 1. In general, I think this debate only illustrates the fact that people are not good at all at guessing extremely low or extremely high probabilities and usually end up in some sort of inconsistency.
Unknown, maybe we don’t need to give the AI some special ethical programming, but we will surely need to give it basic ethical assumptions (or axioms, data, whatever you call them) if we want it to make ethical conclusions. The AI will process information given these assumptions and return answers according to them—or maybe collapse when the assumptions are self-contradictory—but I can’t imagine how an AI given “murder is wrong” as an axiom could reach the conclusion “murder is OK”, or vice versa.
Regarding Roko’s suggestion that the AI should contain information about what people think and conclude whose opinion is correct—the easiest way to do this is to count each opinion and pronounce the majority’s view correct. This is of course not very intelligent, so you can compare the different opinions, make some consistency checks, perhaps modify the analysing procedure itself during the run (I believe there will be no strict boundary between the “data” and the “code” in the AI), but still the result is determined by the input. If people can create an AI which says “murder is wrong”, they can surely also create an AI which says the contrary, and the latter would be no less intelligent than the former.