“I once received a letter from an eminent logician, Mrs. Christine Ladd-Franklin, saying that she was a solipsist, and was surprised that there were no others. Coming from a logician and a solipsist, her surprise surprised me.”
Bertrand Russell
The rationalist checked his gun—a repeated, almost compulsive gesture before the coming battle, even though he already knew the weapon better than himself. He was running out of Burdensome Detail bullets, but still had enough Causal Chain Specifics to pepper the Big Thingy with. He gingerly fingered the Non-Rehearsed Evidence hanging from his belt—he hoped he wouldn’t have to use them; they were dangerous and exploded all over the place.
Maybe this wouldn’t be so tough after all… There was no reason to suspect this Big Thingy would be a strong one, was there? Happiness mounting from this unprovable claim, he quickly swallowed a rational combat pill to keep it at bay. Reason returned, and he chanted the mantra against unreason: “I will not fear, I will not doubt, but I will not refuse to admire. When the refusal to admire is gone there will be nothing; only I will remain.” He readied his weapons...
And he hoped, above all else, that this Big Thingy had already been cut into manageable pieces. Because if it hadn’t, if it was huge and whole, then there was nothing for it: he’d have to deploy illegal BNPS (biased negative points searches), or even call down the big safe box...
I finished the research agenda on constructing a preference utility function for any given human, and presented the ideas to CHAI and MIRI. Woot!
Yep. Deontologies have useful… consequences.
I have to say, I find these criticisms a bit weak. Going through them:
III. FDT sometimes makes bizarre recommendations
I’d note that successfully navigating Parfit’s hitchhiker also involves violating “Guaranteed Payoffs”: you pay the driver at a time when there is no uncertainty, and when you would get better utility from not doing so. So I don’t think Guaranteed Payoffs is that sound a principle.
Your bomb example is a bit underdefined, since the predictor is predicting your actions AND giving you the prediction. If the predictor is simulating you and asking “would you go left after reading a prediction that you are going right”, then you should go left; because, by the probabilities in the setup, you are almost certainly a simulation (this is kind of a “counterfactual Parfit hitchhiker” situation).
If the predictor doesn’t simulate you, and you KNOW they said to go right, you are in a slightly different situation, and you should go right. This is akin to waking up in the middle of the Parfit hitchhiker experiment, when the driver has already decided to save you, and deciding whether to pay them.
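To make the “almost certainly a simulation” reasoning in the first case concrete, here’s a minimal sketch; the error rate and the one-simulation-per-prediction assumption are mine, not part of the original setup:

```python
# Minimal sketch (my numbers) of "conditional on reading that prediction, you're
# almost certainly the simulation". Assumptions: the predictor runs exactly one
# simulation of you reading the prediction "you will go right", and only
# mispredicts with probability EPS.

EPS = 1e-24  # assumed predictor error rate (illustrative)

# Suppose your policy is: go left after reading a prediction that you'll go right.
# Then the simulated you goes left, so the predictor only tells the *real* you
# "you will go right" in the rare case where she errs.

weight_sim = 1.0    # the simulation always exists and reads that prediction
weight_real = EPS   # the real you only reads it if the predictor erred

p_simulation = weight_sim / (weight_sim + weight_real)
print("P(I am the simulation | I'm reading this prediction) =", p_simulation)
# ~ 1 - 1e-24: the agent reading the prediction is overwhelmingly likely to be
# the simulation, whose output is what determines the real agent's payoff.
```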
IV. FDT fails to get the answer Y&S want in most instances of the core example that’s supposed to motivate it
This section is incorrect, I think. In this variant, the contents of the boxes are determined not by your decision algorithm, but by your nationality. And of course two-boxing is the right decision in that situation!
the case for one-boxing in Newcomb’s problem didn’t seem to stem from whether the Predictor was running a simulation of me, or just using some other way to predict what I’d do.
But it does depend on things like this. There’s no point in one-boxing unless your one-boxing is connected with the predictor believing that you’d one-box. In a simulation, that’s the case; in some other situations where the predictor looks at your algorithm, that’s also the case. But if the predictor is predicting based on nationality, then you can freely two-box without changing the predictor’s prediction.
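A toy payoff comparison may help; the standard Newcomb amounts ($1M opaque, $1k transparent) and the nationality-based filling probability q are assumptions for illustration:

```python
# Toy comparison: opaque box filled based on your decision algorithm vs on your
# nationality. Standard Newcomb payoffs assumed: $1M or $0 in the opaque box,
# $1k in the transparent box.

M, K = 1_000_000, 1_000

def eu_policy_based(action):
    # The predictor models your policy: the opaque box is full iff you one-box.
    filled = (action == "one-box")
    return (M if filled else 0) + (K if action == "two-box" else 0)

def eu_nationality_based(action, q=0.3):
    # The box contents depend only on nationality: full with some fixed
    # probability q (made up), whatever you decide.
    return q * M + (K if action == "two-box" else 0)

for label, eu in [("policy-based", eu_policy_based),
                  ("nationality-based", eu_nationality_based)]:
    print(label, {a: eu(a) for a in ("one-box", "two-box")})
# Policy-based: one-boxing wins ($1,000,000 vs $1,000).
# Nationality-based: two-boxing is better by exactly $1,000, whatever q is.
```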
V. Implausible discontinuities
There’s nothing implausible about discontinuity in the optimal policy, even if the underlying data is continuous. If p is the probability that we’re in a smoking lesion problem rather than a Newcomb problem, then as p changes from 0 to 1, the expected utility of one-boxing falls and the expected utility of two-boxing rises. At some point, the optimal action will jump discontinuously from one to the other.
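Here’s a rough numerical illustration of that jump; all the payoffs, and the probability q that the box is full in the lesion-style case, are invented for the example:

```python
# With probability p you face a smoking-lesion-style problem (box contents fixed
# independently of your policy, full with probability q) and with probability
# 1 - p a genuine Newcomb problem (contents track your policy). Numbers invented.

M, K, q = 1_000_000, 1_000, 0.5

def eu(action, p):
    lesion = q * M + (K if action == "two-box" else 0)   # contents independent of you
    newcomb = M if action == "one-box" else K            # contents track your policy
    return p * lesion + (1 - p) * newcomb

for p in (0.0, 0.5, 0.9, 0.99, 0.9999, 1.0):
    best = max(("one-box", "two-box"), key=lambda a: eu(a, p))
    print(f"p = {p}: one-box EU = {eu('one-box', p):.0f}, "
          f"two-box EU = {eu('two-box', p):.0f}, optimal = {best}")
# Both expected utilities vary continuously in p, but the optimal action jumps
# from one-box to two-box at the crossover p* = 1 - K/M = 0.999.
```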
VI. FDT is deeply indeterminate
I agree FDT is indeterminate, but I don’t agree with your example. Your two calculators are clearly isomorphic, just as if we used a different numbering system for one versus the other. Talking about isomorphic algorithms avoids worrying about whether they’re the “same” algorithm.
And in general, it seems to me, there’s no fact of the matter about which algorithm a physical process is implementing in the absence of a particular interpretation of the inputs and outputs of that physical process.
Indeed. But since you and your simulation are isomorphic, you can look at what the consequences are of you outputting “two-box” while your simulation outputs “deux boites” (or “one-box” and “une boite”). And {one-box, une boite} is better than {two-box, deux boites}.
But why did I use those particular interpretations of my own and my simulation’s physical processes? Because those interpretations are the ones relevant to the problem at hand. My simulation and I will have different weights, consume different amounts of power, be run at different times, and probably at different speeds. If those differences were relevant to the Newcomb problem, then the fact that we differ would become relevant. But since they aren’t, we can focus on the core of the matter. (You can also consider the example of playing the prisoner’s dilemma against an almost-but-not-quite-identical copy of yourself.)
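As a minimal illustration of what “isomorphic up to relabelling” means here (the functions and the French labels are just placeholders I made up):

```python
# Two "decision procedures" that differ only in their output labels, plus the
# dictionary that witnesses the isomorphism. Inputs and labels are illustrative.

def my_decision(predictor_is_accurate: bool) -> str:
    return "one-box" if predictor_is_accurate else "two-box"

def ma_decision(le_predicteur_est_fiable: bool) -> str:
    return "une boite" if le_predicteur_est_fiable else "deux boites"

relabel = {"une boite": "one-box", "deux boites": "two-box"}

# The two procedures agree on every input once outputs are translated:
assert all(my_decision(x) == relabel[ma_decision(x)] for x in (True, False))

# So for the Newcomb problem -- where only this input/output behaviour matters,
# not weight, power consumption or running speed -- they count as the same
# algorithm, and we can compare the joint outputs {one-box, une boite} against
# {two-box, deux boites}.
```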
Hey, thanks for posting this!
And I apologise—I seem to have again failed to communicate what we’re doing here :-(
“Get the AI to ask for labels on ambiguous data”
Having the AI ask is a minor aspect of our current methods, one that I’ve repeatedly tried to de-emphasise (though it does turn out to have an unexpected connection with interpretability). What we’re trying to do is:
Get the AI to generate candidate extrapolations of its reward data, that include human-survivable candidates.
Select among these candidates to get a human-survivable ultimate reward function.
Possible selection processes include being conservative (see here for how that might work: https://www.lesswrong.com/posts/PADPJ3xac5ogjEGwA/defeating-goodhart-and-the-closest-unblocked-strategy), asking humans and then extrapolating what the human-answering process should idealise to (some initial thoughts on this here: https://www.lesswrong.com/posts/BeeirdrMXCPYZwgfj/the-blue-minimising-robot-and-model-splintering), and removing some of the candidates on syntactic grounds (e.g. wireheading, which I’ve written quite a bit about how to define syntactically). There are some other approaches we’ve been considering, but they’re currently under-developed.
But all those methods will fail if the AI can’t generate human-survivable extrapolations of its reward training data. That is what we are currently most focused on. And, given our current results on toy models and a recent literature review, my impression is that there has been almost no decent applicable research done in this area to date. Our current results on HappyFaces are a bit simplistic, but, depressingly, they seem to be the best in the world in reward-function-extrapolation (and not just for image classification) :-(
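If it helps, here is a very schematic sketch of the two-stage picture above; every function name, and the pessimistic aggregation at the end, are placeholders of mine rather than a description of our actual code:

```python
# Schematic of the two-stage picture: generate candidate extrapolations of the
# reward data, then select among them. All names here are hypothetical placeholders.

from typing import Callable, List

RewardFn = Callable[[object], float]   # a candidate extrapolated reward function

def generate_candidate_extrapolations(reward_data) -> List[RewardFn]:
    """Step 1: produce diverse reward functions consistent with the training data.
    The crucial requirement: at least one candidate must be human-survivable."""
    raise NotImplementedError  # placeholder: this is the part currently in focus

def is_survivable(candidate: RewardFn) -> bool:
    """Step 2 filters: conservatism, asking humans (and extrapolating the
    answering process), syntactic exclusion of e.g. wireheading."""
    raise NotImplementedError  # placeholder

def conservative_choice(candidates: List[RewardFn]) -> RewardFn:
    # One possible conservative selection: optimise the pointwise minimum over
    # the surviving candidates (a pessimistic aggregate).
    return lambda outcome: min(c(outcome) for c in candidates)

def select_reward(reward_data) -> RewardFn:
    candidates = generate_candidate_extrapolations(reward_data)
    survivable = [c for c in candidates if is_survivable(c)]
    if not survivable:
        # If step 1 never generates a survivable extrapolation, no selection
        # method in step 2 can rescue the situation.
        raise RuntimeError("no human-survivable candidate generated")
    return conservative_choice(survivable)
```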
Based on my appreciation of the scientific method and my research into the weaknesses of models and experts, I take the median IPCC estimate as correct, but assume the uncertainties are greater than they claim. This is somewhat scary: uncertainties cut in both directions, and while moderate climate change is something we can cope with, extreme climate change is very much worse.
I see by the karma bombing we can’t even ask.
It’s more that the post isn’t well written. It mentions omnipotence (for God) and some thoughts that past philosophers had on it, rambles about things being difficult to conceive (without any definitions or decomposition of the problem), and then brings in Omega, with an example equivalent to “1) Assume Omega never lies, 2) Omega lies”.
Then when we get to the actual point, it’s simply “maybe the Newcomb problem is impossible”. With no real argument to back that up (and do bear in mind that if copying of intelligence is possible, then the Newcomb problem is certainly possible; and I’ve personally got a (slightly) better-than-chance record at predicting whether people 1-box or 2-box on Newcomb-like problems, so a limited Omega is certainly possible).
You might be in one of those trial and errors...
Another form of argumentus interruptus is when the other suddenly weakens their claim, without acknowledging the weakening as a concession.
I fear this is something we’ll have to live with. I’ve won many, many arguments by whittling down the opponent’s position until there is nothing substantive left of it. At this point, the only thing I can do that will mess everything up is… to press them on this, and force them to acknowledge their ‘defeat’. Because defeat is how they will perceive it, and will fight back ferociously. You might ‘break’ some of them into a completely new way of thinking, but most likely you will simply undo all your hard work up till then.
Much better to just let them leave with their dignity intact, and with, hopefully, a better understanding that will percolate through their worldviews and come out a few weeks later in their own words. Think of it as… leaving them a line of retreat.
This is worth an entire post by itself. Cheers.
Your approach to AI seems to involve solving every issue perfectly (or very close to perfection). Do you see any future for more approximate, rough and ready approaches, or are these dangerous?
Are you searching for positive examples of positive bias right now, or sparing a fraction of your search on what positive bias should lead you to not see? Did you look toward light or darkness?
Your hypothesis is that positive bias is generally bad. It is thus my duty to try and disprove your idea, and see what emerges from the result.
Let’s take your example, but now the sequences are ten numbers long and the initial sequence is 2-4-6-10-12-14-16-18-20-22 (the rule is still the same). Picking a sequence at random from a given set of numbers, we have only one chance in 10! = 3,628,800 of coming up with one that obeys the rule. Someone following the approach you recommended would probably first try one instance of “x, x+2, x+4, …” or “x, 2x, 3x, …”, then start checking a few random sequences (getting “No” on each one, with near certainty). In this instance, disregarding positive bias doesn’t help (unless you do a really brutal amount of testing). This is not just an artifact of “long” sequences—had we stuck with the sequence of three numbers, but the rule was “all in ascending order, or one number above ten trillion”, then finding the right rule would be just as hard. What gives?
Even worse, suppose you started with two assumptions: 1) the sequence is x, 2x, 3x, 4x, 5x, …, 10x; 2) the sequence is x, x+2, x+4, …, x+18.
You do one or two (positive) tests of 1). They come up “yes”. You then remember to try and disprove the hypothesis, try a hundred random sequences, and get “no” every time. You then accept 1).
However, had you just tried to do some positive testing of 1) and 2), you would very quickly have found out that something was wrong.
Analysis: Testing is indeed about trying to disprove a hypothesis, and gaining confidence when you fail. But your hypothesis covers uncountably many different cases, and you can test (positively or negatively) only a very few. Unless you have some grounds to assume that this is enough (such as the uniform time and space assumptions of modern science, or some sort of nice ordering or measure on the space of hypotheses or of observations), then neither positive nor negative testing gives you much information.
However, if you have two competing hypotheses about the world, then a little testing is enough to tell which one is correct. This is the easiest way of making progress, and should always be considered.
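A quick simulation of the long-sequence version makes both points; the hidden rule (“strictly ascending”), the number range, and the test counts are all my own choices for illustration:

```python
# Toy simulation of the ten-number version. Assumed hidden rule: "strictly
# ascending"; numbers drawn from 1..100; all parameters are illustrative.
import random

def hidden_rule(seq):
    return all(a < b for a, b in zip(seq, seq[1:]))

hyp1 = lambda x: [x * k for k in range(1, 11)]     # x, 2x, 3x, ..., 10x
hyp2 = lambda x: [x + 2 * k for k in range(10)]    # x, x+2, x+4, ..., x+18

# "Negative testing": random sequences essentially never satisfy the rule
# (1/10! each), so every "No" answer carries almost no information.
random.seed(0)
hits = sum(hidden_rule(random.sample(range(1, 101), 10)) for _ in range(100_000))
print("random sequences passing the rule:", hits, "/ 100000")   # expect ~0

# Positive testing of both hypotheses: hyp2's sequences also get "Yes", even
# though they are not of the form x, 2x, ..., 10x -- so hypothesis 1) cannot be
# the whole story. That's the "something is wrong" signal.
print(hidden_rule(hyp1(3)), hidden_rule(hyp2(3)))    # True True
print(hidden_rule([10, 9, 8, 7, 6, 5, 4, 3, 2, 1]))  # False: order is what matters
```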
Verdict: Awareness of positive bias causes us to think “I may be wrong, I should check”. The correct attitude in front of these sorts of problems is the subtly different “there may be other explanations for what I see, I should find them”. The two sentiments feel similar, but lead to very different ways of tackling the problem.
In your open thread inbox, Less Wrong comments have the options “context” and “report” (in that order), whereas private messages have “report” and “reply” (in that order). Many times I’ve accidentally pressed “report” on a private message, and fortunately caught myself before continuing.
I’d suggest reversing the order of “report” and “reply”, so that they fit with the comments options.
Right, that’s my tiny suggestion for this month :-)
Indeed :-) but just like modern nature lovers will tell you all about it on their cell-phone, there are some artifices he just won’t count as being artificial...
Plus, the Machines just dropped a love interest straight on him...
CEV, until designed and defined properly, is just a black box that everyone universally agrees is ‘good’, but has little else in terms of defining features.
Saying X=”I’m dropping out of school to join a doomsday cult” in a blatantly ironic way gives you the benefits of implying ‘to unschooled eyes, it would appear that X—don’t be unschooled’ along with ‘I’m sophisticated enough to be aware that certain aspects of my decision look as if X’ and ‘I’m confident enough about my decision to make light of this’, before finally concluding ‘but of course, it’s not actually true that X’.
Irony signals a lot.
On a more serious note: cut up your Great Thingy into smaller independent ideas, and treat them as independent.
For instance, a Marxist would cut up Marx’s Great Thingy into a labour theory of value, a theory of the political relations between classes, a theory of wages, and a theory of the ultimate political state of mankind. Then each of these should be assessed independently, and the truth or falsity of one should not halo onto the others. If we can do that, we should be safe from the spiral, as each theory is too narrow to start a spiral on its own.
Same thing for every other Great Thingy out there.
But you can’t have it both ways—as a matter of probability theory, not mere fairness.
You’ve proved your case—but there’s still enough wriggle room that it won’t make much practical difference. One example comes from global warming, which predicts higher average temperatures in Europe—unless it diverts the gulf stream, in which case it predicts lower average temperatures. Consider the two statements:
1) If average temperatures go up in Europe, or down, this is evidence for global warming.
2) If average temperatures go up in Europe and the gulf stream isn’t diverted, or average temperatures go down while the gulf stream is diverted, this is evidence for global warming.
1) is nonsense, 2) is true. Lots of people say statements that sound like 1), when they mean something like 2). Add an extra detail, and the symmetry is broken.
This weakens the practical power of your point: if an accused witch is afraid, that shows she’s guilty; if she’s not afraid, in a way which makes the inquisitor suspicious, she’s also guilty. That argument is flawed, but the flaw isn’t a logical one (since the analogous statement 2) is true).
Then we’re back to arguing the legitimacy of these “extra details”.
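For what it’s worth, here is a quick Bayes check of the 1)/2) contrast above; every probability is invented for illustration:

```python
# Both coarse outcomes ("temperatures up" / "temperatures down") cannot each be
# evidence for warming (conservation of expected evidence), but the refined
# outcomes with the gulf-stream detail added can be. All numbers are invented.

P_W = 0.5   # prior probability of (this much) global warming

def posterior(p_e_given_h, p_e_given_noth, prior):
    return p_e_given_h * prior / (p_e_given_h * prior + p_e_given_noth * (1 - prior))

# Coarse observation "European temperatures go up":
P_up_W, P_up_notW = 0.6, 0.5
print(posterior(P_up_W, P_up_notW, P_W))           # 0.545... above the prior
print(posterior(1 - P_up_W, 1 - P_up_notW, P_W))   # 0.444... necessarily below it

# Refined observations with the extra detail: "up AND stream intact" and
# "down AND stream diverted" can each be likelier under warming, so each can
# raise the posterior -- the symmetry of statement 1) is broken.
print(posterior(0.55, 0.45, P_W) > P_W)   # True: up & intact supports warming
print(posterior(0.35, 0.25, P_W) > P_W)   # True: down & diverted also supports it
```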
Completed the survey.
Two issues: first, I put down a very low donation to charity, even though I consider working for the FHI to be a donation in kind.
Second, I messed up the probabilities, sorry, because I could not give any answer to P(simulation) and P(MWI) other than “NAN” (not a number). I can explain that stance in detail if you want.