On a more positive note, Holden and Dario’s research on this issue gave me a much better understanding about how the regression would work and how it could suggest that GWWC should be putting more emphasis on evaluating individual charities, relative to relying on (appropriately adjusted) cost-effectiveness estimates.
Nick_Beckstead
Besides, intuitively if you were really offered with perfect confidence a bet with a 99.99...90% chance of winning you a dollar but if you lose all the souls in the world will burn in hell forever do you really think it would be a worthwhile bet to take?
While I genuinely appreciate how implausible it is to consider that bet worthwhile, there is a flipside. See The Lifespan Dilemma. The point is that people have inconsistent preferences here, so it is easy to find intuitive counterexamples to any position one could take. I find the bounded approach to be the least of all evils.
What is bad about Pascal’s mugging isn’t the existence of arbitrarily bad outcomes, it’s the existence of arbitrarily bad expectations. What we really want is that the integral of utility relative to the probability measure is never infinite.
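A quick numerical illustration of the distinction (toy numbers, a sketch only): a St. Petersburg-style lottery has an infinite expected utility under an unbounded utility function, while a bounded utility function keeps the expectation finite.

```python
from math import exp

# St. Petersburg-style lottery: outcome n (n = 1, 2, ...) has
# probability 2**-n and payoff 2**n.  With utility linear in payoff,
# the partial sums of E[u] grow without bound; with the bounded
# utility u(x) = 1 - exp(-x / 10) (a made-up example of a bounded
# function), the expectation converges below the bound of 1.

def partial_expectations(utility, terms):
    total = 0.0
    sums = []
    for n in range(1, terms + 1):
        total += (2.0 ** -n) * utility(2.0 ** n)
        sums.append(total)
    return sums

linear = partial_expectations(lambda x: x, 50)
bounded = partial_expectations(lambda x: 1 - exp(-x / 10), 50)

print(linear[-1])   # each term contributes 1, so this is 50.0 and growing
print(bounded[-1])  # converges to a finite value below 1
```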
It is actually pretty hard to achieve this. Consider a sequence of lives L1, L2, etc. such that (i) the temporary quality of each life increases over time within each life, and increases at the same rate across lives, (ii) the durations of the lives in the sequence are increasing and approaching infinity, (iii) the utilities of the lives in the sequence are approaching infinity. And now consider an infinitely long life L whose temporary quality increases at the same rate as that of L1, L2, etc. Plausibly, L is better than each of L1, L2, etc. This means L has a value which exceeds any finite value, and it means you would take any chance of L, no matter how small, over any of the Li, no matter how certain that Li would be. When choosing among available actions which may lead to some of these lives, you would, in practice, consider only the chance of getting an infinitely long life, turning to other considerations only to break ties. Upshot: weak background assumptions + unbounded utility function --> obsessing over infinities, neglecting finite considerations except to break ties.
To be clear, the background assumptions are: that L, L1, L2, etc. are alternatives to which you could reasonably assign non-zero probability; that L is better than L1, L2, etc.; and the normal axioms of decision theory.
(You may also consider a variation of this argument involving histories of human civilization, rather than lives.)
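A sketch of why these assumptions force the conclusion (assuming expected-utility maximization, with u for the utility function):

```latex
% Assumptions: u(L_i) -> infinity as i -> infinity, and u(L) > u(L_i)
% for every i, so u(L) exceeds every real number.  Fix any life L_j and
% any probability p > 0.  For every i,
\[
  p\,u(L) + (1-p)\,u(L_1) \;\geq\; p\,u(L_i) + (1-p)\,u(L_1),
\]
% and the right-hand side tends to infinity with i.  So the gamble
% "L with probability p, L_1 otherwise" has expected utility exceeding
% u(L_j) for every j: any chance of L, however small, is preferred to
% any of the L_i with certainty.
```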
Dramatic deconversion here.
What is your p that I can’t find a charity that (1) gives money to people in Africa and (2) the majority of the donated funds doesn’t go to warlords? How about this one: http://givedirectly.org/?
Most philosophers who are non-consequentialists are pluralists that think consequences are very important, so they can still use standard arguments to support the idea that reducing x-risk is important. A lot of non-consequentialism is about moral prohibitions rather than prescriptions, so I suspect most of it would have little to say about altruistic considerations. And of course a lot of it is loose, vague, and indeterminate, so it would be hard to draw out any comparative claims anyway.
Biases from Wikipedia’s list of cognitive biases. Cue: example of the bias; Response: name of the bias, pattern of reasoning of the bias, normative model violated by the bias.
Edit: put this on the wrong page accidentally.
What’s the evidence that this is the “leading theory of choice in the human brain”? (I am not saying I have evidence that it isn’t, but it’s important for this post that some large relevant section of the scientific community thinks this theory is awesome.)
This is not related to the main topic of the post, but here is a nitpick:
As Michael Vassar says, “Evidence that people are crazy is evidence that things are easier than you think.”
Evidence that people are crazy is also evidence that you are crazy. So for this to work, we need to have ways of avoiding craziness that others lack. (Without such confidence, I fear the persuasiveness of this thought can be chalked up to the tendency to think that others are more affected by biases and such than oneself.)
While I have sympathy with the complaint that SI’s critics are inarticulate and often say wrong things, Eliezer’s comment does seem to be indicative of the mistake Holden and Wei Dai are describing. Most extant presentations of SIAI’s views leave much to be desired in terms of clarity, completeness, concision, accessibility, and credibility signals. This makes it harder to make high quality objections. I think it would be more appropriate to react to poor critical engagement more along the lines of “We haven’t gotten great critics. That probably means that we need to work on our arguments and their presentation,” and less along the lines of “We haven’t gotten great critics. That probably means that there’s something wrong with the rest of the world.”
In fairness I should add that I think Luke M agrees with this assessment and is working on improving these arguments/communications.
I don’t think anyone will be able to. Here is my attempt at a more precise definition than what we have on the table:
An agent models the world and selects actions in a way that depends on what its modeling says will happen if it selects a given action.
A tool may model the world, and may select actions depending on its modeling, but may not select actions in a way that depends on what its modeling says will happen if it selects a given action.
A consequence of this definition is that some very simple AIs that can be thought of as “doing something,” such as some very simple checkers programs or a program that waters your plants if and only if its model says it didn’t rain, would count as tools rather than agents. I think that is a helpful way of carving things up.
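A toy illustration of the proposed carving (hypothetical plant-watering examples; the moisture model and target values are made up):

```python
# Two toy plant-waterers.  Both model the world; only the second selects
# its action by evaluating what its model says will happen *if it takes
# each candidate action*, which is what makes it an agent on the
# proposed definition.

def tool_waterer(model_says_it_rained: bool) -> str:
    # The action depends on the model's description of the world,
    # but not on a prediction of what each action would bring about.
    return "do_nothing" if model_says_it_rained else "water"

def agent_waterer(predicted_soil_moisture) -> str:
    # predicted_soil_moisture(action) is the model's forecast of the
    # outcome if that action is taken; the agent picks the action whose
    # predicted outcome scores best against a target moisture of 0.5.
    def score(action):
        return -abs(predicted_soil_moisture(action) - 0.5)
    return max(["water", "do_nothing"], key=score)

print(tool_waterer(model_says_it_rained=True))                # do_nothing
print(agent_waterer(lambda a: 0.6 if a == "water" else 0.2))  # water
```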
I would be interested to see if you could link to posts where you made versions of these objections.
This link
Values vs. parameters: Eliezer has suggested using…
is broken.
There are two ways to read Holden’s claim about what happens if 100 experts check the proposed FAI safety proof. On one reading, Holden is saying that if 100 experts check it and say, “Yes, I am highly confident that this is in fact safe,” then activating the AI kills us all with 90% probability. On the other reading, Holden is saying that even if 100 experts do their best to find errors and say, “No, I couldn’t identify any way in which this will kill us, though that doesn’t mean it won’t kill us,” then activating the AI kills us all with 90% probability. I think the first reading is very implausible. I don’t believe the second reading, but I don’t think it’s obviously wrong. I think the second reading is the more charitable and relevant one.
How much of this is counting toward the 50,000 words of authorized responses?
Eliezer argued that looking at modern software does not support Holden’s claim that powerful tool AI is likely to come before dangerous agent AI. I’m not sure the examples he gave support his claim, especially if we broaden the “tool” concept in a way that seems consistent with Holden’s arguments. I’m not too sure about this, but I would like to hear reactions.
Eliezer:
At one point in his conversation with Tallinn, Holden argues that AI will inevitably be developed along planning-Oracle lines, because making suggestions to humans is the natural course that most software takes. Searching for counterexamples instead of positive examples makes it clear that most lines of code don’t do this. Your computer, when it reallocates RAM, doesn’t pop up a button asking you if it’s okay to reallocate RAM in such-and-such a fashion. Your car doesn’t pop up a suggestion when it wants to change the fuel mix or apply dynamic stability control. Factory robots don’t operate as human-worn bracelets whose blinking lights suggest motion. High-frequency trading programs execute stock orders on a microsecond timescale.
Whether this kind of software counts as agent-like software or tool software depends on what we mean by “tool.” Holden glosses the distinction as follows:
In short, Google Maps is not an agent, taking actions in order to maximize a utility parameter. It is a tool, generating information and then displaying it in a user-friendly manner for me to consider, use and export or discard as I wish.
Defined in this way, it seems that most of this software is neither agent-like software nor tool software. I suggested an alternative definition in another comment:
An agent models the world and selects actions in a way that depends on what its modeling says will happen if it selects a given action. A tool may model the world, and may select actions depending on its modeling, but may not select actions in a way that depends on what its modeling says will happen if it selects a given action.
In this sense, I think all of Eliezer’s examples of software are tool-like rather than agent-like (qualification: I don’t know enough about the high-frequency trading stuff to say whether this is true there as well). I don’t see these examples as strong support for the view that agent-like AGI is the default outcome.
More Eliezer:
Software that does happen to interface with humans is selectively visible and salient to humans, especially the tiny part of the software that does the interfacing; but this is a special case of a general cost/benefit tradeoff which, more often than not, turns out to swing the other way, because human advice is either too costly or doesn’t provide enough benefit. Modern AI programmers are generally more interested in e.g. pushing the technological envelope to allow self-driving cars than to “just” do Google Maps.
It’s clearly right that software does a lot of things without getting explicit human approval, and there are control/efficiency tradeoffs that explain why this is so. However, I suspect that the self-driving cars are also not agents on Holden’s definition, or the one I proposed, and don’t give a lot of support to the view that AGI will be agent-like. All this should be taken with a grain of salt since I don’t know too much about these cars. But I’m imagining that these cars work by having a human select a place to go, displaying a route, having the human accept the route, and then following a narrow set of rules to get the human there (e.g., stop if there’s a red light such and such distance in front of you, brake if there’s an object meeting such and such characteristics in your trajectory, etc.). I think the crucial thing here is the step where the human gets a helpful summary and then approves. That seems to fit my expansion of the “tool” concept, and seems to fit Holden’s picture in the most important way: this car isn’t going to do anything too crazy without our permission.
However, I can see an argument that advanced versions of this software would be changed to be more agent-like, in order to handle cases where the software has to decide what to do in split second situations that couldn’t have easily been described in advance, such as whether to make some emergency maneuver to avoid an infrequent sort of collision. Perhaps examples of this kind would become more abundant if we thought about it; high frequency trading sounds like a good potential case for this.
Branches of AI that invoke human aid, like hybrid chess-playing algorithms designed to incorporate human advice, are a field of study; but they’re the exception rather than the rule, and occur primarily where AIs can’t yet do something humans do, e.g. humans acting as oracles for theorem-provers, where the humans suggest a route to a proof and the AI actually follows that route.
Quick thought: If it’s hard to get AGIs to generate plans that people like, then it would seem that AGIs fall into this exception class, since in that case humans can do a better job of telling whether they like a given plan.
For context, I pointed this out because it looks like Eliezer is going for the first reading and criticizing that.
Please note that AIXI with outputs connected only to a monitor seems like an instance of the Tool AI.
As I read Holden, and on my proposed way of making “agent” precise, this would be an agent rather than a tool. The crucial thing is that this version of AIXI selects actions on the basis of how well they serve certain goals without user approval. If you had a variation on AIXI that identified the action that would maximize a utility function and displayed the action to a user (where the method of display was not done in an open-ended goal-directed way), that would count as a tool.
Here’s what I think is true and important about this post: some people will try to explicitly estimate expected values in ways that don’t track the real expected values, and when they do this, they’ll make bad decisions. We should avoid these mistakes, which may be easy to fall into, and we can avoid some of them by using regressions of the kind described above in the case of charity cost-effectiveness estimates. As Toby points out, this is common ground between GiveWell and GWWC. Let me list what I take to be a few points of disagreement.
I think that after making an appropriate attempt to gather evidence, the result of doing the best expected value calculation that you can is by far the most important input into a large scale philanthropic decision. We should think about whether the result of the calculation makes sense, we should worry if it is wildly counterintuitive, and we should try hard to avoid mistakes. But the result of this calculation will matter more than most kinds of informal reasoning, especially if the differences in expected value are great. I think this will be true for people who are competent with thinking in terms of subjective probabilities and expected values, which will rule out a lot of people, but will include a lot of the people who would consider making important philanthropic decisions on the basis of expected value calculations.
I think this argument unfairly tangles up making decisions explicitly on the basis of expected value calculations with Pascal’s Mugging. It’s not too hard to choose a bounded utility function that doesn’t tell you to pay the mugger, and there are independent (though not clearly decisive) reasons to use a bounded utility function for decision-making, even when the probabilities are stable. Since the unbounded utility function assumption can shoulder the blame, the invocation of Pascal’s Mugging doesn’t seem all that telling. (Also, for reasons Wei Dai gestures at I don’t accept Holden’s conjecture that making regression adjustments will get us out of the Pascal’s Mugging problem, even if we have unbounded utility functions.)
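To illustrate the first point (a sketch with made-up numbers; the bounded utility function and the probabilities here are hypothetical, not anyone's actual credences): under a bounded utility function, the mugger's promised payoff can contribute at most p to expected utility, however large the promise.

```python
from math import exp

SCALE = 100.0  # hypothetical scale parameter; utility is bounded above by 1

def bounded_utility(payoff: float) -> float:
    # u(x) = 1 - exp(-x / SCALE): increasing, but never exceeds 1.
    return 1.0 - exp(-payoff / SCALE)

# The mugger promises an astronomical payoff with tiny probability p;
# refusing keeps a modest sure payoff of 5 (toy numbers throughout).
p = 1e-10
ev_pay_mugger = p * bounded_utility(1e100) + (1 - p) * bounded_utility(0.0)
ev_refuse = bounded_utility(5.0)

# The mugger's side is capped by p * 1 = 1e-10, so refusing wins --
# whereas with utility linear in payoff, p * 1e100 would dominate.
print(ev_pay_mugger < ev_refuse)  # True
```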
Though I agree that intuition can be a valuable tool when trying to sanity check an expected value calculation, I am hesitant to rely too heavily on it. Things like scope insensitivity and ambiguity aversion could easily make me unreasonably queasy about relying on a perfectly reasonable expected value calculation.
Finally, I classify several of the arguments in this post as “perfect world” arguments because they involve thinking a lot about what would happen if everyone behaved in a certain kind of way. I don’t want to rest too much weight on these arguments because my behavior doesn’t causally or acausally affect the behavior of enough people for these arguments to be directly relevant to my decisions. Even if I accepted perfect world arguments, some of these arguments appear not to work. For example, if all donors were rational altruists, and that was common knowledge, then charities that were effective would have a strong incentive to provide evidence of their effectiveness. If some charity refused to share information, that would be very strong evidence that the charity was not effective. So it doesn’t seem to be true, as Holden claims, that if everyone were totally reliant on explicit expected value calculations, we’d all give to charities about which we have very little information. (Deciding not to be totally transparent is not such good evidence now, since donors are far from being rational altruists.)
Though I have expressed mostly disagreement, I think Holden’s post is very good and I’m glad that he made it.