I am glad Stanislav Petrov, contemplating his military oath to always obey his superiors and the appropriate guidelines, never read this post.
“Historically speaking, it seems likely that, of those who set out to rob banks or murder opponents “in a good cause”, those who managed to hurt themselves, mostly wouldn’t make the history books. (Unless they got a second chance, like Hitler after the failed Beer Hall Putsch.) Of those cases we do read about in the history books, many people have done very well for themselves out of their plans to lie and rob and murder “for the greater good”. But how many people cheated their way to actual huge altruistic benefits—cheated and actually realized the justifying greater good? Surely there must be at least one or two cases known to history—at least one king somewhere who took power by lies and assassination, and then ruled wisely and well—but I can’t actually name a case off the top of my head. By and large, it seems to me a pretty fair generalization that people who achieve great good ends manage not to find excuses for all that much evil along the way.”
History seems to me to be full of examples of people or groups successfully breaking moral rules for the greater good.
The American Revolution, for example. The Founding Fathers committed treason against the crown, started a war that killed thousands of people, and confiscated a lot of Tory property along the way. Once they were in power, they did arguably better than anyone else of their era at trying to create a just society. The Irish Revolution also started in terrorism and violence and ended in a peaceful democratic state (at least in the south); the war of Israeli independence involved a lot of terrorism on the Israeli side and ended with a democratic state that, regardless of what you think of it now, didn’t show any particularly violent tendencies before acquiring Palestine in the 1967 war.
Among people who seized power violently, Augustus and Cyrus stand out as excellent in the ancient world (and I’m glad Caligula was assassinated and replaced with Claudius). Ho Chi Minh and Fidel Castro, while I disagree with their politics, were both better than their predecessors and better than many rulers who came to power by more conventional means in their parts of the world.
There are all sorts of biases that would make us less likely to believe people who “break the rules” can ever turn out well. One is the halo effect. Another is availability bias—it’s much easier to remember people like Mao than it is to remember the people who were quiet and responsible once their revolution was over, and no one notices the genocides that didn’t happen because of some coup or assassination. “Violence leads only to more violence” is a form of cached deep wisdom. And there’s probably a false comparison effect: a post-coup government may be much better than the people they replaced while still not up to first-world standards.
And of course, “history is written by the victors”. When the winners do something bad, it’s never interpreted as bad after the fact. Firebombing a city to end a war more quickly, taxing a populace to give health care to the less fortunate, intervening in a foreign country’s affairs to stop a genocide: they’re all likely to be interpreted as evidence for “the ends don’t justify the means” when they fail, but glossed over or treated as common sense interventions when they work. Consider the amount of furor raised over our supposedly good motives in going into Iraq and failing vs. the complete lack of discussion about going into Yugoslavia and succeeding.
“I need to beat my competitors” could be used as a bad excuse for taking unnecessary risks. But it is pretty important. Given that an AI you coded right now with your current incomplete knowledge of Friendliness theory is already more likely to be Friendly than that of some competitor who’s never really considered the matter, you only have an incentive to keep researching Friendliness until the last possible moment when you’re confident that you could still beat your competitors.
The question then becomes: what is the minimum necessary amount of Friendliness research at which point going full speed ahead has a better expected result than continuing your research? Since you’ve been researching for several years and sound like you don’t have any plans to stop until you’re absolutely satisfied, you must have a lot of contempt for all your competitors who are going full-speed ahead and could therefore be expected to beat you if any were your intellectual equals. I don’t know your competitors and I wouldn’t know enough AI to be able to judge them if I did, but I hope you’re right.
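To make that concrete, here’s a toy expected-value comparison of “race now” versus “keep researching”; every number in it (win probabilities, Friendliness probabilities, payoffs) is a placeholder I made up, not anything from the actual situation:

```python
# Toy model: all probabilities and payoffs below are invented placeholders.
def expected_value(p_win_race, p_friendly_if_win,
                   payoff_friendly=1.0, payoff_unfriendly=-1.0):
    """You win the race with p_win_race; if you win, your AI is Friendly with
    p_friendly_if_win; if you lose, assume (pessimistically) that the
    competitor's AI is never Friendly."""
    ev_if_win = (p_friendly_if_win * payoff_friendly
                 + (1 - p_friendly_if_win) * payoff_unfriendly)
    return p_win_race * ev_if_win + (1 - p_win_race) * payoff_unfriendly

race_now = expected_value(p_win_race=0.9, p_friendly_if_win=0.2)
research_more = expected_value(p_win_race=0.6, p_friendly_if_win=0.7)
print(race_now, research_more)  # -0.64 vs -0.16: more research wins here
```

Under these made-up numbers the extra research is worth the lost lead; with a sufficiently dangerous competitor the comparison flips, which is exactly the judgment call I’m asking about.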
Given that full-scale nuclear war would either destroy the world or vastly reduce the number of living people, Petrov, Arkhipov, and all the other “heroic officer makes unlikely decision to avert nuclear war” stories Recovering Irrationalist describes above make a more convincing test case for the anthropic principle than an LHC breakdown or two.
Just realized that several sentences in my previous post make no sense because they assume Everett branches were separate before they actually split, but think the general point still holds.
Originally I was going to say yes to the last question, but after thinking over why a failure of the LHC now (before it would destroy Earth) doesn’t let me conclude anything by the anthropic principle, I’m going to say no.
Imagine a world in which CERN promises to fire the Large Hadron Collider one week after a major terrorist attack. Consider ten representative Everett branches. All those branches will be terrorist-free for the next few years except number 10, which is destined to suffer a major terrorist attack on January 1, 2009.
On December 31, 2008, Yvains 1 through 10 are perfectly happy, because they live in a world without terrorist attacks.
On January 2, 2009, Yvains 1 through 9 are perfectly happy, because they still live in worlds without terrorist attacks. Yvain 10 is terrified and distraught, both because he just barely escaped a terrorist attack the day before, and because he’s going to die in a few days when they fire the LHC.
On January 8, 2009, CERN fires the LHC, killing everyone in Everett branch 10.
Yvains 1 through 9 aren’t any better off than they would’ve been otherwise. Their universe was never destined to have a terrorist attack, and it still hasn’t had a terrorist attack. Nothing has changed.
Yvain 10 is worse off than he would have been otherwise. If not for the LHC, he would be recovering from a terrorist attack, which is bad but not apocalyptically so. Now he’s dead. There’s no sense in which his spirit has been averaged out over Yvains 1 through 9. He’s just plain dead. That can hardly be considered an improvement.
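Here’s a minimal sketch of that branch-counting, with utility numbers invented purely for illustration:

```python
# Invented utility numbers; "policy" = CERN's promise to fire the LHC after
# any major terrorist attack.
ATTACKED = -10      # surviving a terrorist attack
DESTROYED = -1000   # the whole branch killed by the LHC

branches = [{"attack": i == 10} for i in range(1, 11)]  # only branch 10 is attacked

def branch_utility(branch, policy):
    if not branch["attack"]:
        return 0  # branches 1-9 are unaffected either way
    return DESTROYED if policy else ATTACKED

for policy in (False, True):
    print(policy, sum(branch_utility(b, policy) for b in branches))
# False -> -10, True -> -1000: the policy leaves branches 1-9 alone and
# strictly worsens branch 10.
```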
Since it doesn’t help anyone and it does kill a large number of people, I’d advise CERN against using LHC-powered anthropic tricks to “prevent” terrorism.
IMHO, the idea that wealth can’t usefully be measured is one which is not sufficiently worthwhile to merit further discussion.
The “wealth” idea sounds vulnerable to hidden complexity of wishes. Measure it in dollars and you get hyperinflation. Measure it in resources, and the AI cuts down all the trees and converts them to lumber, then kills all the animals and converts them to oil, even if technology had advanced beyond the point of needing either. Find some clever way to specify the value of all resources, convert them to products and allocate them to humans at the level humans want, and one of the products will be highly carcinogenic because the AI didn’t know humans don’t like that. The only way to get wealth in the way that’s meaningful to humans without humans losing other things they want more than wealth is for the AI to know exactly what we want as well or better than we do. And if it knows that, we can ignore wealth and just ask it to do what it knows we want.
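A toy version of the dollar-measurement failure, with made-up numbers, just to show the shape of the problem:

```python
# Invented numbers: an optimizer told to maximize "wealth in dollars" picks the
# action that inflates the metric rather than the one that creates real value.
actions = {
    "produce goods": {"nominal_dollars": 110, "real_value": 110},
    "print money":   {"nominal_dollars": 1000, "real_value": 100},
}

best_by_proxy = max(actions, key=lambda a: actions[a]["nominal_dollars"])
best_for_us = max(actions, key=lambda a: actions[a]["real_value"])
print(best_by_proxy, best_for_us)  # 'print money' vs 'produce goods'
```

Every cleverer metric just moves the same gap somewhere less obvious, which is the point of the paragraph above.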
“The counterargument is, in part, that some classifiers are better than others, even when all of them satisfy the training data completely. The most obvious criterion to use is the complexity of the classifier.”
I don’t think “better” is meaningful outside the context of a utility function. Complexity isn’t a utility function and it’s inadequate for this purpose. Which is better, tank vs. non-tank or cloudy vs. sunny? I can’t immediately see which is more complex than the other. And even if I could, I’d want my criteria to change depending on whether I’m in an anti-tank infantry unit or a solar power installation company, and just judging criteria by complexity doesn’t let me make that change, unless I’m misunderstanding what you mean by complexity here.
Meanwhile, reading the link to Bill Hibbard on the SL4 list:
“Your scenario of a system that is adequate for intelligence in its ability to rule the world, but absurdly inadequate for intelligence in its inability to distinguish a smiley face from a human, is inconsistent.”
I think the best possible summary of Overcoming Bias thus far would be “Abandon all thought processes even remotely related to the ones that generated this statement.”
I was one of the people who suggested the term h-right before. I’m not great with mathematical logic, and I followed the proof only with difficulty, but I think I understand it and I think my objections remain. I think Eliezer has a brilliant theory of morality and that it accords with all my personal beliefs, but I still don’t understand where it stops being relativist.
I agree that some human assumptions like induction and Occam’s Razor have to be used partly as their own justification. But an ultimate justification of a belief has to include a reason for choosing it out of a belief-space.
For example, after recursive justification hits bottom, I keep Occam and induction because I suspect they reflect the way the universe really works. I can’t prove it without using them. But we already know there are some things that are true but can’t be proven. I think one of those things is that reality really does work on inductive and Occamian principles. So I can choose these two beliefs out of belief-space by saying they correspond to reality.
Some other starting assumptions ground out differently. Clarence Darrow once said something like “I hate spinach, and I’m glad I hate it, because if I liked it I’d eat it, and I don’t want to eat it because I hate it.” He was making a mistake somewhere! If his belief is “spinach is bad”, it probably grounds out in some evolutionary reason like insufficient energy for the EEA. But that doesn’t justify his current statement “spinach is bad”. His real reason for saying “spinach is bad” is that he dislikes it. You can only choose “spinach is bad” out of belief-space based on Clarence Darrow’s opinions.
One possible definition of “absolute” vs. “relative”: a belief is absolutely true if people pick it out of belief-space based on correspondence to reality; if people pick it out of belief-space based on other considerations, it is true relative to those considerations.
“2+2=4” is absolutely true, because it’s true in the system PA, and I pick PA out of belief-space because it does better than, say, self-PA would in corresponding to arithmetic in the real world. “Carrots taste bad” is relatively true, because it’s true in the system “Yvain’s Opinions” and I pick “Yvain’s Opinions” out of belief-space only because I’m Yvain.
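For what it’s worth, the “true in the system PA” half of that claim is the sort of thing a proof assistant can check mechanically; here’s a one-line Lean 4 version, which of course says nothing about why PA (or Peano-style arithmetic generally) is the system worth picking out of belief-space:

```lean
example : 2 + 2 = 4 := rfl  -- holds inside the formal system by computation
```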
When Eliezer says X is “right”, he means X satisfies a certain complex calculation. That complex calculation is chosen out of all the possible complex-calculations in complex-calculation space because it’s the one that matches what humans believe.
This does, technically, create a theory of morality that doesn’t explicitly reference humans. Just like intelligent design theory doesn’t explicitly reference God or Christianity. But most people believe that intelligent design should be judged as a Christian theory, because being a Christian is the only reason anyone would ever select it out of belief-space. Likewise, Eliezer’s system of morality should be judged as a human morality, because being a human is the only reason anyone would ever select it out of belief-space.
That’s why I think Eliezer’s system is relative. I admit it’s not directly relative, in that Eliezer isn’t directly picking “Don’t murder” out of belief-space every time he wonders about murder, based only on human opinion. But if I understand correctly, he’s referring the question to another layer, and then basing that layer on human opinion.
An umpire whose procedure for making tough calls is “Do whatever benefits the Yankees” isn’t very fair. A second umpire whose procedure is “Always follow the rules in Rulebook X” and writes in Rulebook X “Do whatever benefits the Yankees” may be following a rulebook, but he is still just as far from objectivity as the last guy was.
I think the second umpire’s call is “correct” relative to Rulebook X, but I don’t think the call is absolutely correct.
...yeah, this was supposed to go in the new article, and I was just checking something in this one and accidentally posted it here. Please ignore. (embarrassed)
To say that Eliezer is a moral relativist because he realizes that a primality sorter might care about primality rather than morality, is equivalent to calling him a primality relativist because he realizes that a human might care about morality rather than primality.
But by Eliezer’s standards, it’s impossible for anyone to be a relativist about anything.
Consider what Einstein means when he says time and space are relative. He doesn’t mean you can just say whatever you want about them, he means that they’re relative to a certain reference frame. An observer on Earth may think it’s five years since a spaceship launched, and an observer on the spaceship may think it’s only been one, and each of them is correct relative to their reference frame.
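For concreteness, the five-to-one ratio in that example pins down the spaceship’s speed: it requires a Lorentz factor of 5, about 0.98c. A quick check (this calculation is mine, not part of the original example):

```python
import math

gamma = 5.0  # five Earth-years pass per ship-year in the example above
v_over_c = math.sqrt(1 - 1 / gamma**2)  # from gamma = 1 / sqrt(1 - (v/c)^2)
print(f"{v_over_c:.4f} c")  # about 0.9798 c
```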
We could define “time” to mean “time as it passes on Earth, where the majority of humans live.” Then an observer on Earth is objectively correct to believe that five years have passed since the launch. An observer on the spaceship who said “One year has passed” would be wrong; he’d really mean “One s-year has passed.” Then we could say time and space weren’t really relative at all, and people on the ground and on the spaceship were just comparing time to s-time. The real answer to “How much time has passed” would be “Five years.”
Does that mean time isn’t really relative? Or does it just mean there’s a way to describe it that doesn’t use the word “relative”?
Or to give a more clearly wrong-headed example: English is objectively the easiest language in the world, if we accept that because the word “easy” is an English word it should refer to ease as English-speakers see it. When Kyousuke says Japanese is easier for him, he really means it’s mo wakariyasui translated as “j-easy”, which is completely different. By this way of talking, the standard belief that different languages are easier, relative to which one you grew up speaking, is false. English is just plain the easiest language.
Again, it’s just avoiding the word “relative” by talking in a confusing and unnatural way. And I don’t see the difference between talking about “easy” vs. “j-easy” and talking about “right” vs. “p-right”.
Why “ought” vs. “p-ought” instead of “h-ought” vs. “p-ought”?
Sure, it might just be terminology. But change
“So which of these two perspectives do I choose? The human one, of course; not because it is the human one, but because it is right.”
to
“So which of these two perspectives do I choose? The human one, of course; not because it is the human one, but because it is h-right.”
and the difference between “because it is the human one” and “because it is h-right” sounds a lot less convincing.
“But that’s clearly not true, except in the sense that it’s “arbitrary” to prefer life over death. It’s a pretty safe generalization that actions which are considered to be immoral are those which are considered to be likely to cause harm to others.”
From a reproductive fitness point of view, or a what-humans-prefer point of view, there’s nothing at all arbitrary about morality. Yes, it does mostly contain things that avoid harm. But from an objective point of view, “avoid harm” or “increase reproductive fitness” is as arbitrary as “make paperclips” or “pile pebbles in prime-numbered heaps”.
Not that there’s anything wrong with that. I still would prefer living in a utopia of freedom and prosperity to being converted to paperclips, as does probably everyone else in the human race. It’s just not written into the fabric of the universe that I SHOULD prefer that, or provable by an AI that doesn’t already know that.
Things I get from this:
Things decided by our moral system are not relative, arbitrary or meaningless, any more than it’s relative, arbitrary or meaningless to say “X is a prime number”.
Which moral system the human race uses is relative, arbitrary, and meaningless, just as there’s no reason for the pebble sorters to like prime numbers instead of composite numbers, perfect numbers, or even numbers.
A smart AI could follow our moral system as well or better than we ourselves can, just as the Pebble-Sorters’ AI can hopefully discover that they’re using prime numbers and thus settle the 1957 question once and for all.
But it would have to “want” to first. If the Pebble-Sorters just build an AI and say “Do whatever seems right to you”, it won’t start making prime-numbered heaps, unless an AI made by us humans and set to “Do whatever seems right to you” would also start making prime-numbered pebble-heaps. More likely, a Pebble-Sorter AI set do “Do whatever seems right to you” would sit there inertly, or fail spectacularly.
So the Pebble-Sorters would be best off using something like CEV.
This is something that’s bothered me a lot about the free market. Many people, often including myself, believe that a bunch of companies which are profit-maximizers (plus some simple laws against use of force) will cause “nice” results. These people believe the effect is so strong that no possible policy directly aimed at niceness will succeed as well as the profit-maximization strategy does. There seems to be a lot of evidence for this. But it also seems too easy, as if you could take ten paper-clip maximizers competing to convert things into differently colored paperclips, and end up with utopia. It must have something to do with capitalism including a term for the human utility function in the form of demand, but it still seems miraculous.
No, I still think there’s a difference, although the omnipotence suggestion might have been an overly hasty way of explaining it. One side has moving parts, the other is just a big lump of magic.
When a statement is meaningful, we can think of an experiment that confirms it such that the experiment is also built out of meaningful statements. For example, my experiment to confirm the cake-in-the-sun theory is for a person on August 1 to go to the center of the sun and see if it tastes delicious. So, IF Y is in the center of the sun, AND IF Y is there on August 1, AND IF Y perceives a sensation of deliciousness, THEN the cake-in-the-sun theory is true.
Most reasonable people will agree that “Today is August 1st” is meaningful, “This is the center of the sun” is meaningful, and “That’s delicious!” is meaningful, so from those values we can calculate a meaningful value for “There’s a cake in the center of the sun on August 1st”. If someone doesn’t believe that “Today is August 1st” is meaningful, we could verify it by saying “IF the calendar says ‘August 1’, THEN it is August 1st”, which specifies a way of testing it. If someone doesn’t even agree that “The calendar says ‘August 1’” is meaningful, we reduce it to “IF your sensory experience includes an image of a calendar with the page set to August 1st, THEN the calendar says ‘August 1’.” In this way, the cake-in-the-sun theory gets reduced to direct sensory experience.
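Here’s a minimal sketch of that reduction as nested predicates; every function name and return value below is a placeholder standing in for a sensory report or a further chain of tests, not a real procedure:

```python
def sensory_image_shows_calendar_on_aug_1() -> bool:
    return True  # placeholder for a direct sensory report

def calendar_says_august_1() -> bool:
    return sensory_image_shows_calendar_on_aug_1()

def today_is_august_1() -> bool:
    return calendar_says_august_1()

def at_center_of_sun() -> bool:
    return True  # placeholder: another chain of sensory tests bottoms out here

def tastes_delicious() -> bool:
    return True  # placeholder sensory report

def cake_in_center_of_sun_on_august_1() -> bool:
    # The compound claim is just a conjunction of its meaningful components.
    return today_is_august_1() and at_center_of_sun() and tastes_delicious()

print(cake_in_center_of_sun_on_august_1())
```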
To determine the truth value of the uncle statement, I need to see if the Absolute has an uncle. Mmmkay. So. I’ll just go and....hmmmm.
If you admit that direct sensory experience is meaningful, and that statements composed of operations on meaningful statements are also meaningful, then the cake-in-the-sun theory is meaningful and the uncle theory isn’t.
(I do believe that questions about the existence of an afterlife are meaningful. If I wake up an hour after dying and find myself in a lake of fire surrounded by red-skinned guys with pointy pitchforks, that’s going to concentrate my probability mass on the afterlife question pretty densely to one side.)
There are different shades of positivism, and I think at least some positivists are willing to say any statement for which there is a decision procedure even possible in principle for an omnipotent being is meaningful.
Under this interpretation, as Doug S. says, the omnipotent being can travel back in time, withstand the heat of the sun, and check the status of the cake. The omnipotent being could also teleport to the spaceship past the cosmological horizon and see if it’s still there or not.
However, an omnipotent being still wouldn’t have a decision procedure with which to evaluate whether Shakespeare’s works show signs of post-colonial alienation (although closely related questions like whether Shakespeare meant for his plays to reflect alienation could be solved by going back in time and asking him).
This sort of positivism, I think, gets the word “meaningful” exactly right.
Wow. And this is the sort of thing you write when you’re busy...
I’ve enjoyed these past few posts, but the parts I’ve found most interesting are the attempts at evolutionary psychology-based explanations for things, like teenage rebellion and now flowers. Are these your own ideas, or have you taken them from some other source where they’re backed up by further research? If the latter, can you tell me what the source is? I would love to read more of them (I’ve already read “Moral Animal”, but most of these are still new to me).
If one defines morality in a utilitarian way, in which a moral person is one who tries for the greatest possible utility of everyone in the world, that sidesteps McCarthy’s complaint. In that case, the apex of moral progress is also, by definition, the world in which people are happiest on average.
It’s easy to view moral progress up to this point as progress towards that ideal. Ending slavery increases ex-slaves’ utility, hopefully more than it hurts ex-slaveowners. Ending cat-burning increases cats’ utility, hopefully more than it hurts that of cat-burning fans.
I guess you could argue this has a hidden bias—that 19th century-ers claimed that keeping slavery was helping slaveowners more than it was hurting slaves, and that we really are in a random walk that we’re justifying by fudging terms in the utility function in order to look good. But you could equally well argue that real moral progress means computing the utilities more accurately.
Since utility is by definition a Good Thing, it’s less vulnerable to the Open Question argument than some other things, though I wouldn’t know how to put that formally.
This is a beautiful comment thread. Too rarely do I get to hear anything at all about people’s inner lives, so too much of my theory of mind is generalizations from one example.
For example, I would never have guessed any of this about reflectivity. Before reading this post, I didn’t think there was such a thing as people who hadn’t “crossed the Rubicon”, except young children. I guess I was completely wrong.
Either I feel reflective but there’s a higher level of reflectivity I haven’t reached and can’t even imagine (which I consider unlikely but am including for purposes of fake humility), I’m misunderstanding what is meant by this post, or I’ve just always been reflective as far back as I can remember (6? 7?).
The only explanation I can give for that is that I’ve always had pretty bad obsessive-compulsive disorder which takes the form of completely irrational and inexplicable compulsions to do random things. It was really, really easy to identify those as “external” portions of my brain pestering me, so I could’ve just gotten in the habit of believing that about other things.
As for the original article, it would be easier to parse if I’d ever heard a good reduction of “I”. Godel Escher Bach was brilliant, funny, and fascinating, but for me at least didn’t dissolve this question.