Vladimir, I understand the PD and similar cases. I’m just saying that the Newcomb paradox is not actually a member of that class. Any agent faced with either version—being told ahead of time that they will face the Predictor, or being told only once the boxes are on the ground—has a simple choice to make; there’s no paradox and no PD-like situation. It’s a puzzle only if you believe that there really is backwards causality.
“You speculate about why Eurisko slowed to a halt and then complain that Lenat has wasted his life with CYC, but you ignore that Lenat has his own theory which he gives as the reason he’s been pursuing CYC. You should at least explain why you think his theory wrong; I find his theory quite plausible.”
Around 1990, Lenat predicted that Cyc would go FOOM by 2000. In 1999, he told me he expected it to go FOOM within a couple of years. Where’s the FOOM?
Cyc has no cognitive architecture. It’s a database. You can ask it questions. It has templates for answering specific types of questions. It has (last I checked, about 10 years ago) no notion of goals, actions, plans, learning, or its own agenthood.
If I want to predict that the next growth curve will be an exponential and put bounds around its doubling time, I need a much finer fit to the data than if I only want to ask obvious questions like... “Do the optimization curves fall into the narrow range that would permit a smooth soft takeoff?”
This implies that you have done some quantitative analysis giving a probability distribution over possible optimization curves, and finding that only a low-probability subset of that distribution allows for a soft takeoff. Presenting that analysis would be an excellent place to start.
Note for readers: I’m not responding to Phil Goetz and Jef Allbright. And you shouldn’t infer my positions from what they seem to be arguing with me about—just pretend they’re addressing someone else.
Is that on this specific question, or a blanket “I never respond to Phil or Jef” policy? Huh. That doesn’t feel very nice.
Nor very rational, if one’s goal is to communicate.
All the discussion so far indicates that Eliezer’s AI will definitely kill me, and some others posting here, as soon as he turns it on.
It seems likely, if it follows Eliezer’s reasoning, that it will kill anyone who is overly intelligent. Say, the top 50,000,000 or so.
(Perhaps a special exception will be made for Eliezer.)
Hey, Eliezer, I’m working in bioinformatics now, okay? Spare me!
Eliezer: If you create a friendly AI, do you think it will shortly thereafter kill you? If not, why not?
He may have some model of an AI as a perfect Bayesian reasoner that he uses to justify neglecting this. I am immediately suspicious of any argument invoking perfection.
It may also be that what Eliezer has in mind is that any heuristic that can be represented to the AI could be assigned priors and incorporated into Bayesian reasoning. Eliezer has read Judea Pearl, so he knows how the computational time for Bayesian networks scales with the size of the domain, particularly if you don’t ever assume independence when it is not justified, so I won’t lecture him on that. But he may want to lecture himself.
(Constructing the right Bayesian network from sense-data is even more computationally demanding. Of course, if you never assume independence, then the only right network is the fully-connected one. I’m pretty certain that suggesting that a non-narrow AI will be reasoning over all of its knowledge with a fully-connected Bayesian network is computationally implausible. So all arguments that require AIs to be perfect Bayesian reasoners are invalid.)
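(For a rough sense of the scaling, here is a back-of-the-envelope sketch; the chain structure and parameter counts are my own illustrative assumptions, not anything from Pearl or from Eliezer.)

```python
# Back-of-the-envelope sketch: number of free parameters needed to specify a
# distribution over n binary variables with no independence assumptions
# (a fully-connected network, i.e. the full joint table), versus a sparse
# structure (here a simple chain A1 -> A2 -> ... -> An).

def fully_connected_params(n):
    # One probability per joint outcome, minus one for normalization.
    return 2**n - 1

def chain_params(n):
    # P(A1) has 1 free parameter; each P(Ai | Ai-1) has 2 (one per parent value).
    return 1 + 2 * (n - 1)

for n in (10, 20, 40, 80):
    print(f"n={n}: full joint {fully_connected_params(n)}, chain {chain_params(n)}")
```

At n = 80 the full joint already needs about 10^24 parameters, which is the sense in which “never assume independence” is computationally implausible for an AI reasoning over all of its knowledge.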
I’d like to know how much of what Eliezer says depends on the AI using Bayesian logic as its only reasoning mechanism, and whether he believes that is the best reasoning mechanism in all cases, or only one that must be used in order to keep the AI friendly.
Kaj: I will restate my earlier question this way: “Would AIs also find themselves in circumstances such that game theory dictates that they act corruptly?” It doesn’t matter whether we say that the behavior evolved from accumulated mutations, or whether an AI reasoned it out in a millisecond. The problem is still there, if circumstances give corrupt behavior an advantage.
Good point, Jef—Eliezer is attributing the validity of “the ends don’t justify the means” entirely to human fallibility, and neglecting that part accounted for by the unpredictability of the outcome.
I don’t know what “a model of evolving values increasingly coherent over increasing context, with effect over increasing scope of consequences” means.
The tendency to be corrupted by power is a specific biological adaptation, supported by specific cognitive circuits, built into us by our genes for a clear evolutionary reason. It wouldn’t spontaneously appear in the code of a Friendly AI any more than its transistors would start to bleed.
This is critical to your point. But you haven’t established this at all. You made one post with a just-so story about males in tribes perceiving those above them as corrupt, and then assumed, with no logical justification that I can recall, that this meant that those above them actually are corrupt. You haven’t defined what corrupt means, either.

I think you need to sit down and spell out what ‘corrupt’ means, and then Think Really Hard about whether those in power actually are more corrupt than those not in power; and if so, whether the mechanisms that lead to that result are a result of the peculiar evolutionary history of humans, or of general game-theoretic / evolutionary mechanisms that would apply equally to competing AIs.
You might argue that if you have one Sysop AI, it isn’t subject to evolutionary forces. This may be true. But if that’s what you’re counting on, it’s very important for you to make that explicit. I think that, as your post stands, you may be attributing qualities to Friendly AIs, that apply only to Solitary Friendly AIs that are in complete control of the world.
Eliezer: I don’t get your altruism. Why not grab the crown? All things being equal, a future where you get to control things is preferable to a future where you don’t, regardless of your inclinations. Even if altruistic goals are important to you, it would seem like you’d have better chances of achieving them if you had more power. … If all people, including yourself, become corrupt when given power, then why shouldn’t you seize power for yourself? On average, you’d be no worse than anyone else, and probably at least somewhat better; there should be some correlation between knowing that power corrupts and not being corrupted. … Benevolence itself is a trap. The wise treat men as straw dogs; to lead men, you must turn your back on them.
These are all Very Bad Things to say to someone who wants to construct the first AI.

Do we know that not-yet-powerful Stalin would have disagreed (internally) with a statement like “preserving Communism is worth the sacrifice of sending a lot of political opponents to gulags”?
Let’s think about the Russian revolution. You have 3 people, arrayed in order of increasing corruption before coming to power: Trotsky, Lenin, Stalin. Lenin was nasty enough to oust Trotsky. Stalin was nasty enough to dispose of everybody who was a threat to him. Steven’s point is good—that these people were all pre-corrupted—but we also see the corrupt rise to the top.
In the Cuban revolution, Fidel was probably more corrupt than Che from the start. I imagine Fidel would likely have had Che killed, if he in fact didn’t.
So we now have 4 hypotheses:
1. Males are inclined to perceive those presently in power as corrupt. (Eliezer)
2. People are corrupted by power.
3. People are corrupt. (Steven)
4. Power selects for people who are corrupt.
How can we select from among these?
I’m unclear whether you’re saying that we perceive those in power to be corrupt, or that they actually are corrupt. The beginning focuses on the former; the second half, on the latter.
The idea that we have evolved to perceive those in power over us as being corrupt faces the objection that the statement, “Power corrupts”, is usually made upon observing all known history, not just the present.
Has Eliezer explained somewhere (hopefully on a web page) why he doesn’t want to post a transcript of a successful AI-box experiment?
Have the successes relied on a meta-approach, such as saying, “If you let me out of the box in this experiment, it will make people take the dangers of AI more seriously and possibly save all of humanity; whereas if you don’t, you may doom us all”?
David—Yes, a human-level AI could be very useful. Politics and economics alone would benefit greatly from the simulations you could run.
(Of course, all of us but manual laborers would soon be out of a job.)
Could you elaborate on the psychology of mythical creatures? That some creatures are “spiritual” sounds to me like a plausible distinction. I count vampires, but not unicorns. To me, a unicorn is just another chimera. Why do you think they’re more special than mermaids? Magic powers? How much of a consensus do you think exists?
Sorry I missed this! I think it may have to do with how heavy a load of symbolism the creature carries. Unicorns were used a lot to symbolize purity, and acquired magical and non-magical properties appropriate to that symbolism. Dragons, vampires, and werewolves are also used symbolically. Mermaids and basilisks, not so much. Centaurs have lost their symbolism (a Greek Apollo/Dionysus dual-nature-of-man thing, I think), and C.S. Lewis did much to destroy the symbolism associated with fauns by making them nice chaps who like tea and dancing.
Now that I think about it, Lewis and Tolkien both wrote fantasy that was very literal-minded, and replaced symbolism with allegory.
Thousands of years ago, philosophers began working on “impossible” problems. Science began when some of them gave up working on the “impossible” problems and decided to work on problems that they had some chance of solving. And it turned out that this approach eventually led to the solution of most of the “impossible” problems.
Eliezer,
If you tried to approximate The Rules because they were too computationally expensive to use directly, then, no matter how necessary that compromise might be, you would still end up doing less than optimal.
You say that like it’s a bad thing. Your statement implies that something that is “necessary” is not necessary. Just this morning I gave a presentation on the use of Bayesian methods for automatically predicting the functions of newly sequenced genes. The authors of the method I presented used the approximation P(A, B, C) ~ P(A) x P(B|A) x P(C|A) because it would have been difficult to compute P(C | B, A), and they didn’t think B and C were correlated. Your statement condemns them as “less than optimal”. But a sub-optimal answer you can compute is better than an optimal answer that you can’t.
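To make the trade-off concrete, here is a minimal sketch of that kind of approximation (toy numbers of my own, not the gene-function method from the presentation): the exact chain rule gives P(A,B,C) = P(A) x P(B|A) x P(C|A,B), and the approximation drops B from the last factor, which is exact only when B and C are conditionally independent given A.

```python
# Minimal sketch (toy numbers, not the authors' method): compare the exact joint
# P(A,B,C) against the approximation P(A) * P(B|A) * P(C|A), which ignores the
# dependence of C on B given A.
from itertools import product
from collections import defaultdict

# A made-up joint distribution over three binary variables.
joint = {
    (0, 0, 0): 0.20, (0, 0, 1): 0.10,
    (0, 1, 0): 0.15, (0, 1, 1): 0.05,
    (1, 0, 0): 0.10, (1, 0, 1): 0.10,
    (1, 1, 0): 0.10, (1, 1, 1): 0.20,
}

def marginal(indices):
    """Marginal distribution over the variables at the given positions."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[tuple(outcome[i] for i in indices)] += p
    return m

p_a = marginal((0,))      # P(A)
p_ab = marginal((0, 1))   # P(A, B)
p_ac = marginal((0, 2))   # P(A, C)

for a, b, c in product((0, 1), repeat=3):
    exact = joint[(a, b, c)]
    approx = p_a[(a,)] * (p_ab[(a, b)] / p_a[(a,)]) * (p_ac[(a, c)] / p_a[(a,)])
    print(f"A={a} B={b} C={c}  exact={exact:.3f}  approx={approx:.3f}")
```

The two columns differ here because this toy joint does not satisfy the independence assumption; whether the gap matters is exactly the judgment call the authors made.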
“Do only that which you must do, and which you cannot do in any other way.”

I am willing to entertain the notion that this is not utter foolishness, if you can provide us with some examples—say, ten or twenty—of scientists who had success using this approach. I would be surprised if the ratio of important non-mathematical discoveries made by following this maxim, to those made by violating it, was greater than .05. Even mathematicians often have many possible ways of approaching their problems.
David,
Building an AGI and setting it at “human level” would be of limited value. Setting it at “human level” plus epsilon could be dangerous. Humans on their own are intelligent enough to develop dangerous technologies with existential risk. (Which prompts the question: Are we safer with AI, or without AI?)
If the probability of existential risk from AI (or grey goo, or some other exotic risk) were low enough (neglecting the creation of hell-worlds with negative utility), then you could neglect it in favor of those other risks.
Asteroids don’t lead to a scenario in which a paper-clipping AI takes over the entire light-cone and turns it into paper clips, preventing any interesting life from ever arising anywhere, so they aren’t quite comparable.

Still, your point only makes me wonder how we can justify not devoting 10% of GDP to deflecting asteroids. You say that we don’t need to put all resources into preventing unfriendly AI, because we have other things to prevent. But why do anything productive? How do you compare the utility of preventing possible annihilation to the utility of improvements in life? Why put any effort into any of the mundane things that we put almost all of our efforts into? (Particularly if happiness is based on the derivative of, rather than absolute, quality of life. You can’t really get happier, on average; but action can lead to destruction. Happiness is problematic as a value for transhumans.)
This sounds like a straw man, but it might not be. We might just not have reached (or acclimatized ourselves to) the complexity level at which the odds of self-annihilation should begin to dominate our actions. I suspect that the probability of self-annihilation increases with complexity. Rather like how the probability of an individual going mad may increase with their intelligence. (I don’t think that frogs go insane as easily as humans do, though it would be hard to be sure.) Depending how this scales, it could mean that life is inherently doomed. But that would result in a universe where we were unlikely to encounter other intelligent life… uh...
It doesn’t even need to scale that badly; if extinction events follow a power law (they do), there are parameters for which a system can survive indefinitely, and very similar parameters for which it has a finite expected lifespan. It would be nice to know where we stand. The creation of AI is just one more point on this road of increasing complexity, which may lead inevitably to instability and destruction.
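Here is one toy way to see that knife-edge (the parameterization is entirely mine, not a model anyone has proposed here): suppose the chance of an extinction-level event in period t falls off as a power law, p_t = c * t^(-beta), say because event sizes are power-law distributed while the system slowly hardens. Surviving forever has positive probability exactly when the series sum of p_t converges, i.e. when beta > 1; just below 1, extinction is eventually certain.

```python
# Toy illustration (my own parameterization): per-period extinction probability
# p_t = c * t**(-beta). For beta > 1 the survival probability converges to a
# positive limit; for beta < 1 it decays to zero, however slowly.
import math

def survival_probability(c, beta, horizon):
    """P(no extinction-level event in periods 1..horizon)."""
    log_surv = 0.0
    for t in range(1, horizon + 1):
        log_surv += math.log1p(-c * t**(-beta))
    return math.exp(log_surv)

for beta in (0.8, 1.2):
    probs = [survival_probability(c=0.05, beta=beta, horizon=h)
             for h in (10**3, 10**5, 10**6)]
    print(f"beta={beta}: " + ", ".join(f"{p:.3f}" for p in probs))
# beta=1.2 flattens out at a positive value; beta=0.8 keeps sliding toward zero.
```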
I suppose the only answer is to say that destruction is acceptable (and possibly inevitable); total area under the utility curve is what counts. Wanting an interesting world may be like deciding to smoke and drink and die young—and it may be the right decision. The AIs of the future may decide that dooming all life in the long run is worth it.
In short, the answer to “Eliezer’s wager” may be that we have an irrational bias against destroying the universe.
But then, deciding what are acceptable risk levels in the next century depends on knowing more about cosmology, the end of the universe, and the total amount of computation that the universe is capable of.
I think that solving aging would change people’s utility calculations in a way that would discount the future less, bringing them more in line with the “correct” utility computations.
Re. AI hell-worlds: SIAI should put “I Have No Mouth, and I Must Scream” by Harlan Ellison on its list of required reading.
We are entering into a Pascal’s Wager situation.
“Pascal’s wager” is the argument that you should be Christian, because if you compute the expected value of being a Christian vs. of being an atheist, then for any finite positive probability that Christianity is correct, that finite probability multiplied by (infinite +utility minus infinite -utility) outweighs the other side of the equation.
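Spelled out (the notation is mine, not Pascal’s): write p for the probability that Christianity is correct, and c_1, c_2 for the finite this-world payoffs of belief and unbelief. The wager compares

$$\mathbb{E}[\text{believe}] = p\cdot(+\infty) + (1-p)\,c_1 \qquad \text{vs.} \qquad \mathbb{E}[\text{disbelieve}] = p\cdot(-\infty) + (1-p)\,c_2,$$

and for any p > 0, however small, the first expectation dominates regardless of c_1 and c_2.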
The similar Yudkowsky wager is the argument that you should be an FAIer, because the negative utility of destroying the universe outweighs the other side of the equation, whatever the probabilities are. It is not exactly analogous, because the negative utility isn’t actually infinite, unless you believe that the universe (if it isn’t destroyed) can support infinite computation.
I feel that Pascal’s wager is not a valid argument, but have a hard time articulating a response.
I’ve seen too many cases of overfitting data to trust the second theory. Trust the validated one more.
The question would be more interesting if we said that the original theory accounted for only some of the new data.
If you know a lot about the space of possible theories and “possible” experimental outcomes, you could try to compute which theory to trust, using (surprise) Bayes’ law. If it were the case that the first theory applied to only 9 of the 10 new cases, you might find parameters such that you should trust the new theory more.
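A toy version of that computation (the numbers are entirely mine, just to show the shape of it): give the old, validated theory a higher prior, discount the new theory’s perfect fit a little for its extra flexibility, and see which way the posterior odds come out.

```python
# Toy Bayesian comparison (made-up numbers): old theory T1 fits 9 of 10 new
# results, new theory T2 fits all 10, but T1 starts with a higher prior and
# T2's fits are discounted for flexibility (the overfitting worry above).
prior_t1, prior_t2 = 0.9, 0.1
p_fit_t1 = 0.9   # assumed chance T1 fits any one new result
p_fit_t2 = 0.8   # discounted chance that a T2 "fit" reflects real predictive power

like_t1 = p_fit_t1**9 * (1 - p_fit_t1)**1   # 9 fits, 1 miss
like_t2 = p_fit_t2**10                      # 10 fits, 0 misses

posterior_odds = (prior_t1 * like_t1) / (prior_t2 * like_t2)
print(f"posterior odds, T1 : T2 = {posterior_odds:.2f} : 1")
```

With these particular numbers the old theory still wins; raise the new theory’s prior or its per-result reliability and the odds flip, which is all I mean by parameters such that you should trust the new theory more.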
In the given case, I don’t think there is any way to deduce that you should trust the 2nd theory more, unless you have some a priori measure of a theory’s likelihood, such as its complexity.
It’s true that we don’t like to think people better-off than us might be better than us. But two caveats:
1. Just because the cream is concentrated at the top doesn’t mean that most of the cream (or the best cream) is at the top.
2. Causation probably runs both ways on this one. There is a lot of evidence that richer and more-respected people are happier and healthier. Various explanations have been offered, including that health causes career success. That explanation turned out to have serious problems, although I can’t now remember what they are, other than that I heard them summarized in a talk at a SAGE (anti-aging) conference circa 2004, which I can no longer find via Google because a different organization called SAGE, which holds conferences on LGBT aging, now dominates the search results.
I think that, if we could measure the degree to which a culture is able to promote based on merit, it would turn out to be a powerful economic indicator—particularly for knowledge-based economies.
“You didn’t know, but the predictor knew what you’ll do, and if you one-box, that is your property that predictor knew, and you’ll have your reward as a result.”
No. That makes sense only if you believe that causality can work backwards. It can’t.
“If predictor can verify that you’ll one-box (after you understand the rules of the game, yadda yadda), your property of one-boxing is communicated, and it’s all it takes.”
Your property of one-boxing can’t be communicated backwards in time.
We could get bogged down in discussions of free will; I am assuming free will exists, since arguing about the choice to make doesn’t make sense unless free will exists. Maybe the Predictor is always right. Maybe, in this imaginary universe, rationalists are screwed. I don’t care; I don’t claim that rationality is always the best policy in alternate universes where causality doesn’t hold and 2+2=5.
What if I’ve decided I’m going to choose based on a coin flip? Is the Predictor still going to be right? (If you say “yes”, then I’m not going to argue with you anymore on this topic; because that would be arguing about how to apply rules that work in this universe in a different universe.)