Wei Dai
Last I saw, you were only advocating cooperation in one-shot PD for two superintelligences that happen to know each other’s source code (http://lists.extropy.org/pipermail/extropy-chat/2008-May/043379.html). Are you now saying that human beings should also play cooperate in one-shot PD?
Does evolutionary psychology provide an explanation for hyperbolic discounting? I found one explanation at http://www.daviddfriedman.com/Academic/econ_and_evol_psych/economics_and_evol_psych.html#fnB27 but it doesn’t seem to apply to the example of preference reversal between sleeping early and staying up.
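To make the kind of reversal I mean concrete, here is a rough sketch with made-up numbers, assuming a hyperbolic discount function of the form value/(1 + k·delay):

```python
# Hyperbolic discounting: present value of a reward of size `amount`
# arriving `delay` hours from now.  All numbers here are made up.
def hyperbolic_value(amount, delay, k=1.0):
    return amount / (1 + k * delay)

small, large = 5.0, 10.0   # staying up now vs. feeling rested tomorrow

# Decided in the afternoon (both rewards still hours away), sleeping early wins:
print(hyperbolic_value(small, 8), hyperbolic_value(large, 16))   # ~0.56 vs ~0.59

# Decided at bedtime (the small reward is immediate), the preference reverses:
print(hyperbolic_value(small, 0), hyperbolic_value(large, 8))    # 5.0 vs ~1.1
```

With exponential discounting the ratio between the two options wouldn't change as the decision point moves, so no such reversal would occur.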
Stuart, I think this paradox is not related to Counterfactual Mugging, but is purely a Sleeping Beauty problem. Consider the following modification to your setup, which produces a decision problem with the same structure:
If the coin comes up heads, Omega does not simulate you, but simply asks you to give him £100. If you agree, he gives you back £360. If you don’t, no consequences ensue.
Things are actually a bit worse than this, because there is also no theorem that says there is only one valley, so there’s no guarantee that even after you climb out of this valley, your next step won’t cause you to go off a precipice.
BTW, there’s a very similar issue in economics, which goes under the name of the Theory of the Second Best. Markets will allocate resources efficiently if they are perfectly competitive and complete, but there is no guarantee that any incremental progress towards that state, such as creating some markets that were previously missing, or making some markets more competitive, will improve social welfare.
It’s not clear that reflective consistency is feasible for human beings.
Consider the following thought experiment. You’re about to be copied either once (with probability .99) or twice (with probability .01). After that, one of your two or three instances will be randomly selected to be the decision-maker. He will get to choose from the following options, without knowing how many copies were made:
A: The decision-maker will have a pleasant experience. The other(s) will have unpleasant experience(s).
B: The decision-maker will have an unpleasant experience. The other(s) will have pleasant experience(s).
Presumably, you’d like to commit your future self to pick option B. But without some sort of external commitment device, it’s hard to see how you can prevent your future self from picking option A.
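One way to run the numbers (just a back-of-the-envelope sketch, assuming the pre-copy self sums experiences over all future instances, scoring pleasant as +1 and unpleasant as −1):

```python
# Back-of-the-envelope: why the pre-copy self prefers B, assuming he sums
# experiences over all future instances (pleasant = +1, unpleasant = -1).
p_one_copy, p_two_copies = 0.99, 0.01   # 2 instances vs. 3 instances in total

# Option A: decision-maker pleasant, the other(s) unpleasant.
ev_A = p_one_copy * (+1 - 1) + p_two_copies * (+1 - 2)   # = -0.01
# Option B: decision-maker unpleasant, the other(s) pleasant.
ev_B = p_one_copy * (-1 + 1) + p_two_copies * (-1 + 2)   # = +0.01

print(ev_A, ev_B)   # ex ante B wins, but ex post the decision-maker
                    # only cares about his own experience and picks A
```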
I think your summary is understandable enough, but I don’t agree that observations should never change the optimal global solution or preference order on the global solutions, because observations can tell you which observer you are in the world, and different observers can have different utility functions. See my counter-example in a separate comment at http://lesswrong.com/lw/90/newcombs_problem_standard_positions/5u4#comments.
I structured my thought experiment that way specifically to avoid superrationality-type justifications for playing Cooperate in PD.
Among the four axioms used to derive the von Neumann-Morgenstern theorem, one stands out as not being axiomatic when applied to the aggregation of individual utilities into a social utility:
Axiom (Independence): Let A, B, and C be lotteries with A > B, and let t ∈ (0, 1]; then tA + (1 − t)C > tB + (1 − t)C.
In terms of preferences over social outcomes, this axiom means that if you prefer A to B, then you must prefer A+C to B+C for all C, with A+C meaning adding another group of people with outcome C to outcome A.
It’s the social version of this axiom that implies “equity of utility, even among equals, has no utility”. To see that considerations of equity violate the social Axiom of Independence, suppose my u(outcome) = difference between the highest and lowest individual utilities in outcome. In other words, I prefer A to B as long as A has a smaller range of individual utilities than B, regardless of their averages. It should be easy to see that adding a person C to both A and B can cause A’s range to increase more than B’s, thereby reversing my preference between them.
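To make this concrete, here is a small numerical illustration (the individual utility numbers are made up):

```python
# Made-up individual utilities, purely to exhibit the reversal described above.
def spread(utilities):
    """Range of individual utilities; under the equity preference, smaller is better."""
    return max(utilities) - min(utilities)

A = [0, 1]    # spread 1
B = [5, 10]   # spread 5  -> A is preferred to B

C = 100       # a newly added person with a very high individual utility

print(spread(A), spread(B))              # 1 5    -> A preferred
print(spread(A + [C]), spread(B + [C]))  # 100 95 -> now B+C preferred: Independence violated
```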
(a) Random inspections probably won’t work. It’s easy to have code/hardware that look innocent as individual parts, but together have the effect of being a backdoor. You won’t detect the backdoor unless you can see the entire system as a whole.
Tim Freeman’s “proof by construction” method is the only viable solution to the “prove your source code” problem that I’ve seen so far.
(b) is interesting, and seems to be a new idea. Have you written it up in more detail somewhere? If AIs stop verifying each other’s source code, won’t they want to modify their source code to play Defect again?
Scenario 2 seems to share some similarities with Rolf Nelson’s AI Deterrence Problem. You might want to check it out if you haven’t already.
This is a bit tangential, but perhaps a bounded rationalist should represent his beliefs by a family of probability functions, rather than by an approximate probability function. When he needs to make a decision, he can compute upper and lower bounds on the expected utilities of each choice, and then either make the decision based on the beliefs he has, or decide to seek out or recall further information if the upper and lower expected utilities point to different choices, and the bounds are too far apart compared to the cost of getting more information.
I found one decision theory that uses families of probability functions like this (page 35 of http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.1906), although the motivation is different. I wonder if such decision systems have been considered for the purpose of handling bounded rationality.
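To illustrate the kind of decision rule I have in mind, here is a toy sketch (the probability family, utilities, and information cost are all invented for illustration):

```python
# Toy sketch: beliefs are a family of probability functions over states, and an
# action is chosen only when its expected-utility interval clearly dominates.
def eu_bounds(action_utilities, belief_family):
    """Lower and upper expected utility of an action over a family of distributions."""
    eus = [sum(p * u for p, u in zip(dist, action_utilities)) for dist in belief_family]
    return min(eus), max(eus)

def decide(actions, belief_family, info_cost):
    bounds = {name: eu_bounds(utils, belief_family) for name, utils in actions.items()}
    best = max(bounds, key=lambda name: bounds[name][0])   # best lower bound
    rival_upper = max(hi for name, (lo, hi) in bounds.items() if name != best)
    # If another action might still be better by more than the cost of finding out,
    # it's worth seeking out or recalling further information first.
    if rival_upper - bounds[best][0] > info_cost:
        return "gather more information", bounds
    return best, bounds

# Two states of the world; beliefs are any of three candidate distributions.
belief_family = [(0.2, 0.8), (0.5, 0.5), (0.7, 0.3)]
actions = {"act": (10.0, -5.0), "wait": (1.0, 1.0)}
print(decide(actions, belief_family, info_cost=0.5))
# -> ('gather more information', {'act': (-2.0, 5.5), 'wait': (1.0, 1.0)})
```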
Many governments, including the US, are concerned right now that their computers have hardware backdoors, so the current lack of research results on this topic is not just due to lack of interest, but probably intrinsic difficulty. Even if provable hardware is physically possible and technically feasible in the future, there is likely a cost attached, for example running slower than non-provable hardware or using more resources.
Instead of confidently predicting that AIs will Cooperate in one-shot PD, wouldn’t it be more reasonable to say that this is a possibility, which may or may not occur, depending on the feasibility and economics of various future technologies?
This is a good point. I’ve clarified the example a bit to address this.
Suppose someone offered you a bet on P = NP. How should you go about deciding whether or not to take it?
Does it make sense to assign probabilities to unproved mathematical conjectures? If so, what is the probability of P = NP? How should we compute this probability, given what we know so far about the problem? If the answer is no, what is a rational way to deal with mathematical uncertainty?
I suspect the answer is no, at least not the kind of formal languages that have been suggested so far. The problem is this: as soon as you define a formal language, I can say “the lexicographically first object which can’t be described in less than a million bits in your language”. Given the uniqueness of this object, why should it be a priori as unlikely as a random million-bit string?
Let’s say you’re the first person to work on P = NP. What then? (Assume that you have enough mathematical ability to produce most of the results on it that we have so far.)
My answer is yes, I’d probably do it, assuming the profit is large enough and the lie isn’t on a topic that I greatly care about the truth of. But if everyone did this, everyone would be worse off, since their bargaining advantages would cancel each other out, while their false beliefs would continue to carry a cost. It’s like a prisoner’s dilemma game, except that it won’t be obvious who is cooperating and who is defecting, so cooperation (i.e. not self-modifying to believe strategic falsehoods) probably can’t be enforced by simple tit-for-tat.
We as a society will need to find a solution to this problem, if it ever becomes a serious one.
Suppose you test Fermat’s Last Theorem for n up to 10^10, and don’t find a counterexample. How much evidence does that give you for FLT being true? In other words, how do you compute P(a counterexample exists with n<=10^10 | FLT is false), since that’s what’s needed to do a Bayesian update with this inductive evidence? (Assume this is before the proof was found.)
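For reference, writing D for the observation “no counterexample with n ≤ 10^10”, the update would take the form:

```latex
P(\mathrm{FLT} \mid D)
  = \frac{P(D \mid \mathrm{FLT})\, P(\mathrm{FLT})}
         {P(D \mid \mathrm{FLT})\, P(\mathrm{FLT}) + P(D \mid \neg\mathrm{FLT})\, P(\neg\mathrm{FLT})},
  \qquad P(D \mid \mathrm{FLT}) = 1,
```

so the whole difficulty is in the likelihood P(D | FLT is false), which is exactly the quantity asked about above.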
I don’t dispute that mathematicians do seem to reason in ways that are similar to using probabilities, but I’d like to know where these “probabilities” are coming from and whether the reasoning process really is isomorphic to probability theory. What you call “heuristic” and “intuition” are the results of computations being done by the brains of mathematicians, and it would be nice to know what the algorithms are (or should be), but we don’t have them even in an idealized form.
Yes, and that’s sort of intentional. I was trying to come up with a mathematical model of an agent that can deal with uncomputable physics. The physics of our universe seems likely to be computable, but there is no a priori reason to assume that it must be. We may eventually discover a law of physics that’s not computable, or find out that we are in a simulation running inside a larger universe that has uncomputable physics. Agents using UTM-based priors can’t deal with these scenarios.
So I tried to find a “better”, i.e., more expressive, language for describing objects, but then realized that any fixed formal language has a similar problem. Here’s my current idea for solving this: make the language extensible instead of fixed. That is, define a base language, and a procedure for extending the language. Then, when the agent encounters some object that can’t be described concisely using his current language, he recursively extends it until a short description is possible. What the extension procedure should be is still unclear.
Unless there is some reason for the perception of competence to be systematically biased (can anyone think of a reason?), the only way to credibly feign incompetence is to be in situations where acting competently would benefit you, yet you act as if you’re not competent. And you have to do this in every such situation where your actions are observed.
Having to feign incompetence substantially reduces the benefits of being competent (depending on how often you’re observed), while the costs of becoming competent still have to be borne. As a positive theory, this explains why competence might not be as common as we’d otherwise expect.
As a normative theory, it suggests that if you expect to be in a truel-like situation, you should consider not becoming competent in the first place; or, if the costs of becoming competent are already sunk but you’re not yet known to be competent, you should feign incompetence by behaving incompetently whenever such behavior can be observed.