Seeing red is more than a role or disposition. That is what you have left out.
Do you have any evidence for this claim, besides a subjective feeling of certainty?
FDT says you should not pay because, if you were the kind of person who doesn’t pay, you likely wouldn’t have been blackmailed. How is that even relevant? You are being blackmailed.
I’m quoting this because, even though it’s wrong, it’s actually an incredibly powerful naive intuition. I think many people who have internalized TDT/UDT/FDT-style reasoning have forgotten just how intuitive the quoted block is. The unstated underlying assumption here (which is unstated because Schwarz most likely doesn’t even realize it is an assumption) is extremely persuasive, extremely obvious, and extremely wrong:
If you find yourself in a particular situation, the circumstances that led you to that situation are irrelevant, because they don’t change the undeniable fact that you are already here.
This is the intuition driving causal decision theory, and it is so powerful that mainstream academic philosophers are nearly incapable of recognizing it as an assumption (and a false assumption at that). Schwarz himself demonstrates just how hard it is to question this assumption: even when the opposing argument was laid right in front of him, he managed to misunderstand the point so thoroughly that he actually used the very mistaken assumption the paper was criticizing as ammunition against the paper. (Note: this is not intended to be dismissive toward Schwarz. Rather, it’s simply meant as an illustrative example, emphasizing exactly how hard it is for anyone, Schwarz included, to question an assumption that’s baked into their model of the world.) And even if you already understand why FDT is correct, it still shouldn’t be hard to see why the assumption in question is so compelling:
How could what happened in the past be relevant for making a decision in the present? The only thing your present decision can affect is the future, so how could there be any need to consider the past when making your decision? Surely the only relevant factors are the various possible futures each of your choices leads to? Or, to put things in Pearlian terms: it’s known that the influence of distant causal nodes is screened off by closer nodes through which they’re connected, and all future nodes are only connected to past nodes through the present—there’s no such thing as a future node that’s directly connected to a past node while not being connected to the present, after all. So doesn’t that mean the effects of the past are screened off when making a decision? Only what’s happening in the present matters, surely?
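The screening-off claim itself is perfectly standard, which is part of what makes the intuition so seductive. A minimal sketch (with made-up probabilities, purely for illustration) of a three-node chain Past → Present → Future, where each node copies its parent with some fixed fidelity, shows that once you condition on the Present, the Past carries no further information about the Future:

```python
from itertools import product

# Toy causal chain: Past -> Present -> Future. Each node is a bit that
# copies its parent except with probability FLIP. (FLIP = 0.1 is an
# arbitrary assumed value, not anything from the original argument.)
FLIP = 0.1

def joint():
    """Enumerate the full joint distribution over (past, present, future)."""
    for past, present, future in product([0, 1], repeat=3):
        p = 0.5  # uniform prior on Past
        p *= (1 - FLIP) if present == past else FLIP
        p *= (1 - FLIP) if future == present else FLIP
        yield past, present, future, p

def p_future_given(present, past=None):
    """P(Future=1 | Present=present [, Past=past]), by direct enumeration."""
    num = den = 0.0
    for pa, pr, fu, p in joint():
        if pr != present:
            continue
        if past is not None and pa != past:
            continue
        den += p
        num += p * fu
    return num / den

# Conditioning on Present screens off Past: adding Past changes nothing.
assert abs(p_future_given(1, past=0) - p_future_given(1, past=1)) < 1e-12
assert abs(p_future_given(1) - (1 - FLIP)) < 1e-12
```

So the formal point the intuition leans on is correct; the mistake, as the rest of this comment argues, is elsewhere.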
Phrased that way, it’s not immediately obvious what’s wrong with this assumption (which is the point, of course, since otherwise people wouldn’t find it so difficult to discard). What’s actually wrong is something that’s a bit hard to explain, and evidently the explanation Eliezer and Nate used in their paper didn’t manage to convey it. My favorite way of putting it, however, is this:
In certain decision problems, your counterfactual behavior matters as much as—if not more than—your actual behavior. That is to say, there exists a class of decision problems where the outcome depends on something that never actually happens. Here’s a very simple toy example of such a problem:
Omega, the alien superintelligence, predicts the outcome of a chess game between you and Kasparov. If he predicted that you’d win, he gives you $500 in reality; otherwise you get nothing.
Strictly speaking, this actually isn’t a decision problem, since the real you is never faced with a choice to make, but it illustrates the concept clearly enough: the question of whether or not you receive the $500 is entirely dependent on a chess game that never actually happened. Does that mean the chess game wasn’t real? Well, maybe; it depends on your view of Platonic computations. But one thing it definitely does not mean is that Omega’s decision was arbitrary. Regardless of whether you feel Omega based his decision on a “real” chess game, you would in fact either win or not win against Kasparov, and whether you get the $500 really does depend on the outcome of that hypothetical game. (To make this point clearer, imagine that Omega actually predicted that you’d win. Surprised by his own prediction, Omega is now faced with the prospect of giving you $500 that he never expected he’d actually have to give up. Can he back out of the deal by claiming that since the chess game never actually happened, the outcome was up in the air all along and therefore he doesn’t have to give you anything? If your answer to that question is no, then you understand what I’m trying to get at.)
So outcomes can depend on things that never actually happen in the real world. Cool, so what does that have to do with the past influencing the future? Well, the answer is that it doesn’t—at least, not directly. But that’s where the twist comes in:
Earlier, when I gave my toy example of a hypothetical chess game between you and Kasparov, I made sure to phrase the question so that the situation was presented from the perspective of your actual self, not your hypothetical self. (This makes sense; after all, if Omega’s prediction method was based on something other than a direct simulation, your hypothetical self might not even exist.) But there was another way of describing the situation:
You’re going about your day normally when suddenly, with no warning whatsoever, you’re teleported into a white void of nothingness. In front of you is a chessboard; on the other side of the chessboard sits Kasparov, who challenges you to a game of chess.
Here, we have the same situation, but presented from the viewpoint of the hypothetical you on whom Omega’s prediction is based. Crucially, the hypothetical you doesn’t know that they’re hypothetical, or that the real you even exists. So from their perspective, something random just happened for no reason at all. (Yes, yes, if Omega used some method other than a simulation to make his prediction, the hypothetical you wouldn’t have existed and wouldn’t have had a perspective—but hey, that doesn’t stop me from writing from their perspective, right? After all, real people write from the perspectives of unreal people all the time; that’s just called writing fiction. And besides, we’ve already established that real or unreal, the outcome of the game really does determine whether you get the $500, so the thoughts and feelings of the hypothetical you are nonetheless important in that they partially determine the outcome of the game.)
And now we come to the final, crucial point that makes sense of the blackmail scenario and all the other thought experiments in the paper, the point that Schwarz and most mainstream philosophers haven’t taken into account:
Every single one of those thought experiments could have been written from the perspective, not of the real you, but a hypothetical, counterfactual version of yourself.
When “you’re” being blackmailed, Schwarz makes the extremely natural assumption that “you” are you. But there’s no reason to suppose this is the case. The scenario never stipulates why you’re being blackmailed, only that you’re being blackmailed. So the person being blackmailed could be either the real you or a hypothetical. And the thing that determines whether it’s the real you or a mere hypothetical is...
...your decision whether or not to pay up, of course.
If you cave in to the blackmail and pay up, then you’re almost certainly the real deal. On the other hand, if you refuse to give in, it’s very likely that you’re simply a counterfactual version of yourself living in an extremely low-probability (if not outright inconsistent) world. So your decision doesn’t just determine the future; it also determines (with high probability) which you “you” are. And so then the problem simplifies into this: which you do you want to be?
If you’re the real you, then life kinda sucks. You just got blackmailed and you paid up, so now you’re down a bunch of money. If, on the other hand, you’re the hypothetical version of yourself, then congratulations: “you” were never real in the first place, and by counterfactually refusing to pay, you just drastically lowered the probability of your actual self ever having to face this situation and (in the process) becoming you. And when things are put that way, well, the correct decision becomes rather obvious.
But this kind of counterfactual reasoning is extremely counterintuitive. Our brains aren’t designed for this kind of thinking (well, not explicitly, anyway). You have to think about hypothetical versions of yourself that have never existed and (if all goes well) will never exist, and therefore only exist in the space of logical possibility. What does that even mean, anyway? Well, answering confused questions like that is pretty much MIRI’s goal these days, so I dunno, maybe we can ask them.
(Posted as a comment rather than an answer because all of this is pretty rambling, and I’m not super-confident about any of the stuff I say below, even if my tone or phrasing seems to suggest otherwise.)
For the purposes of a discussion like this, rather than talk about what intellectual honesty is, I think it makes more sense to talk about what intellectual honesty is not. Specifically, I’d suggest that the kinds of behavior we consider “intellectually honest” are simply what human behavior looks like when it’s not being warped by some combination of outside incentives. The reason intellectual honesty is so hard to find, then, is simply that humans tend to find themselves influenced by external incentives almost all of the time. Even absent more obvious factors like money or power, humans are social creatures, and all of us unconsciously track the social status of ourselves and others. Throw in the fact that social status is scarce by definition, and we end up playing all sorts of social games “under the table”.
This affects practically all of our interactions with other people, even interactions ostensibly for some other purpose (such as solving a problem or answering a question). Unless people are in a very specific kind of environment, by default, all interactions have an underlying status component: if I say something wrong and someone corrects me on it, I’m made to seem less knowledgeable in comparison, and so that person gains status at my expense. If you’re in an environment where this sort of thing is happening (and you pretty much always are), naturally you’re going to divert some effort away from accomplishing whatever the actual goal is, and toward maintaining or increasing your social standing. (Of course, this behavior needn’t be conscious at all; we’re perfectly capable of executing status-increasing maneuvers without realizing we’re doing it.)
This would suggest that intellectual honesty is most prevalent in fields that prioritize problem-solving over status, and (although confirmation bias is obviously a thing) I do think this is observably true. For example, when a mathematician finds that they’ve made a mistake, they pretty much always own up to it immediately, and other mathematicians don’t respect them less for doing so. (Ditto physicists.) And this isn’t because mathematicians and physicists have some magical personality trait that makes them immune to status games—it’s simply because they’re focused on actually doing something, and the thing they’re doing is more important to them than showing off their own cleverness.
If you and I are working together to solve a particular problem, and both of us actually care about solving the problem, then there’s no reason for me to feel threatened by you, even if you do something that looks vaguely like a status grab (such as correcting me when I make a mistake). Because I know that we’re fundamentally on the same side, I don’t need to worry nearly as much about what I say or do in front of you, which in turn allows me to voice my actual thoughts and opinions much more freely. The atmosphere is collaborative rather than competitive. In that situation, both of us can act “intellectually honest”, but importantly, there’s not even a need for that term. No one’s going to compliment me on how “intellectually honest” I’m being if I quickly admit that I made a mistake, because, well, why would I be doing anything other than trying to solve the problem I set out to solve? It’s a given that I’d immediately abandon any unpromising or mistaken approaches; there’s nothing special about that kind of behavior, and so there’s no need to give it a special name like “intellectual honesty”.
The only context in which “intellectual honesty” is a useful concept is one that’s already dominated by status games. Only in cases where the incentives are sharply aligned against admitting that you’re wrong does it become something laudable, something unusual, something to be praised whenever someone actually does it. In practice, these kinds of situations crop up all the time because status is something humans breathe, but I still think it’s useful to point out that “intellectual honesty” is really just the default mode of behavior, even if that default mode is often corrupted by other stuff.
Firstly, to make sure all of us are on the same page: “procrastination”, as the word is typically used, does not mean that one sits down and thinks carefully about the benefits and drawbacks of beginning to work right now as opposed to later, and then, as a result of this consideration, rationally decides that beginning to work later is the better decision. Rather, when most people use the word “procrastinate”, they generally mean that they themselves are aware that they ought to start working immediately—such that if you asked them if they endorsed the statement “I should be working right now”, they would wholeheartedly reply that they do—and yet mysteriously, they still find themselves doing something else.
If, Said, you have not experienced this latter form of procrastination, then I’m sure you are the object of envy for many people here (including myself). If, however, you have, and this is what you were referring to when you answered “yes” to lkaxas’ question, then the followup question about “internal experience” can be interpreted thusly:
Why is it that, even though you consciously believe that working is the correct thing to be doing, and would verbally endorse such a sentiment if asked, you nonetheless do not do the thing you think is correct to do? This is not merely “irrational”; it seems to defy the very concept of agency—you are unable to act on your own will to act, which seems to undercut the very notion that you choose to do things at all. What does it feel like when this strange phenomenon occurs, when your agency seems to disappear for no explicable reason at all?
To this, certain others (such as myself and, I presume, lkaxas and Kaj Sotala) would reply that there is some additional part of our decision-making process, perhaps a less conscious, less explicit part whose desires we cannot verbalize on demand and are often entirely unaware of, which does not endorse our claim that to begin working now is the best thing to do. This part of us may feel some sense of visceral repulsion when the thought of working arises, or perhaps it may simply be attracted to something else that it would rather be doing—but regardless of the cause, the effect of that hidden desire overrides our conscious will to work, and as a result, we end up doing something other than working, despite the fact that we genuinely do wish to work. (Much of IFS, as I understand it, has to do with identifying these more subtle parts of our minds and promoting them to conscious attention so that they may be analyzed with the same rigor one devotes to one’s normal thoughts.)
You, however, seem to have rejected this multi-agent framework, and so—assuming that you have in fact experienced “procrastination” as described above—your experience while procrastinating must describe something else entirely, something which need not invoke reference to such concepts as desires being “overridden” by deeper desires, or a different “part” of oneself that wants different things than one does. If so, could you provide such a description?
This post seems relevant. (Indeed, it seems to dissolve the question entirely, and a full decade in advance.)
I suspect that this is evidence in favour of slower takeoff speeds, because being as smart as humans isn’t nearly enough to do as well as humans.
I don’t see the connection between the latter claim and the former claim.
I don’t understand what “the underlying causality I am part of” can possibly mean, since causality is a human way to model observations. This statement seems to use the mind projection fallacy to invert the relationship between map and territory.
If you want to discount the use of causal models as merely a “human way to model observations” (one that presumably bears no underlying connection to whatever is generating those observations), then you will need to explain why they work so well. The set of all possible sequences of observations is combinatorially large, and the supermajority of those sequences admit no concise description—they contain no regularity or structure that would allow us to substantially compress their length without losing information. The fact that our observations do seem to be structured, therefore, is a very improbable coincidence indeed. The belief in an external reality is simply a rejection of the notion that this extremely improbable circumstance is a coincidence.
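The compressibility point above is easy to demonstrate concretely. This is a rough illustration (the specific byte strings and lengths are my own arbitrary choices, not anything from the discussion): a structured sequence, like lawful observations, compresses to a small fraction of its length, while a typical random sequence of the same length does not compress at all.

```python
import os
import zlib

# A highly regular 10,000-byte sequence vs. 10,000 incompressible
# random bytes. zlib stands in for "any concise description".
structured = b"0123456789" * 1000
random_seq = os.urandom(10_000)

compressed_structured = len(zlib.compress(structured))
compressed_random = len(zlib.compress(random_seq))

# Regularity permits enormous compression; randomness permits none.
assert compressed_structured < 500
assert compressed_random > 9_000
```

Almost all length-n sequences behave like `random_seq` here; the fact that our observations behave like `structured` is the improbable circumstance the comment is pointing at.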
This is a strange scenario (it seems to be very different from the sort of scenario one usually encounters in such problems), but sure, let’s consider it. My question is: how is it different from “Omega doesn’t give A any money, ever (due to a deep-seated personal dislike of A). Other agents may, or may not, get money, depending on various factors (the details of which are moot)”?
This doesn’t seem to have much to do with decision theories.
Yes, this is correct, and is precisely the point EYNS was trying to make when they said
Intuitively, this problem is unfair to Fiona, and we should compare her performance to Carl’s not on the “act differently from Fiona” game, but on the analogous “act differently from Carl” game.
“Omega doesn’t give A any money, ever (due to a deep-seated personal dislike of A)” is a scenario that does not depend on the decision theory A uses, and hence is an intuitively “unfair” scenario to examine; it tells us nothing about the quality of the decision theory A is using, and therefore is useless to decision theorists. (However, formalizing this intuitive notion of “fairness” is difficult, which is why EYNS brought it up in the paper.)
I’m not sure why shminux seems to think that his world-counting procedure manages to avoid this kind of “unfair” punishment; the whole point of it is that it is unfair, and hence unavoidable. There is no way for an agent to win if the problem setup is biased against them to start with, so I can only conclude that shminux misunderstood what EYNS was trying to say when he (shminux) wrote
I note here that simply enumerating possible worlds evades this problem as far as I can tell.
Say you have an agent A who follows the world-enumerating algorithm outlined in the post. Omega makes a perfect copy of A and presents the copy with a red button and a blue button, while telling it the following:
“I have predicted in advance which button A will push. (Here is a description of A; you are welcome to peruse it for as long as you like.) If you press the same button as I predicted A would push, you receive nothing; if you push the other button, I will give you $1,000,000. Refusing to push either button is not an option; if I predict that you do not intend to push a button, I will torture you for 3^^^3 years.”
The copy’s choice of button is then noted, after which the copy is terminated. Omega then presents the real agent facing the problem with the exact same scenario as the one faced by the copy.
Your world-enumerating agent A will always fail to obtain the maximum $1,000,000 reward accessible in this problem. However, a simple agent B who chooses randomly between the red and blue buttons has a 50% chance of obtaining this reward, for an expected utility of $500,000. Therefore, A ends up in a world with lower expected utility than B.
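The expected-utility claim can be checked with a quick simulation. This is only a sketch of the scenario above: the deterministic agent stands in for any fixed world-enumerating algorithm (which Omega predicts exactly), and, matching the claim in the text, Omega cannot predict the random agent's coin, so the copy's coin flip is independent of the real agent's.

```python
import random

def play(agent, trials=100_000):
    """Average payout when Omega predicts by running a copy of the agent."""
    paid = 0
    for _ in range(trials):
        prediction = agent()  # Omega's perfect copy chooses first
        choice = agent()      # then the real agent chooses
        if choice != prediction:
            paid += 1_000_000  # reward for mismatching the prediction
    return paid / trials

# Any deterministic algorithm always matches its own copy's output.
deterministic = lambda: "red"
# A coin-flipper disagrees with its (independently flipped) copy half the time.
randomizer = lambda: random.choice(["red", "blue"])

assert play(deterministic) == 0.0
assert abs(play(randomizer) - 500_000) < 25_000
```

The deterministic agent earns $0 every time, while the randomizer averages about $500,000, exactly as claimed.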
[META] As a general heuristic, when you encounter a post from someone otherwise reputable that seems completely nonsensical to you, it may be worth attempting to find some reframing of it that causes it to make sense—or at the very least, make more sense than before—instead of addressing your remarks to the current (nonsensical-seeming) interpretation. The probability that the writer of the post in question managed to completely lose their mind while writing said post is significantly lower than both the probability that you have misinterpreted what they are saying, and the probability that they are saying something non-obvious which requires interpretive effort to be understood. To maximize your chances of getting something useful out of the post, therefore, it is advisable to condition on the possibility that the post is not saying something trivially incorrect, and see where that leads you. This tends to be how mutual understanding is built, and is a good model for how charitable communication works. Your comment, to say the least, was neither.
I am not sure how one can talk about the observed universe and the number 3^^^3 in the same sentence, given that the maximum informational content is roughly 10^120 qubits, the rest is outside the cosmological horizon.
Where in the post do you see it suggested that our universe is capable of containing 3^^^3 of anything?
Alternatively, if we talk about the simulation argument, then the expression “practical implications” seems out of place.
If there’s some kind of measure of “observer weight” over the whole mathematical universe, we might be already much larger than 1/3^^^3 of it, so the total utilitarian can only gain so much.
Could you provide some intuition for this? Naively, I’d expect our “observer measure” over the space of mathematical structures to be 0.
Saving the world certainly does seem to be an instrumentally convergent strategy for many human terminal values. Whatever you value, it’s hard to get more of it if the world doesn’t exist. This point should be fairly obvious, and I find myself puzzled as to why you seem to be ignoring it entirely.
While I liked Valentine’s recent post on kensho and its follow-ups a lot, one thing that I was annoyed by were the comments that the whole thing can’t be explained from a reductionist, third-person perspective. I agree that such an explanation can’t produce the necessary mental changes that the explanation is talking about. But it seemed wrong to me to claim that all of this would be somehow intrinsically mysterious and impossible to explain on such a level that would give people at least an intellectual understanding of what Looking and enlightenment and all that are.
Speaking as someone who’s more or less avoided participating in the kensho discussion (and subsequent related discussions) until now, I think the quoted passage pretty much nails the biggest reservation I had with respect to the topic: the language used in those threads tended to switch back and forth between factual and metaphorical with very little indication as to which mode was being used at any particular moment, to the point where I really wanted to just say, “Okay, I sort of see what you’re gesturing at and I’d love to discuss this with you in good faith, but before we get started on that, can we quickly step out of mythic mode/metaphor land/narrative thinking for a moment, just to make sure that we are all still on the same page as far as basic ontology goes, and agree that, for instance, physics and mathematics and logic are still true?”
But when other people in those threads (such as, for example, Said Achmiz) asked essentially the same question, it seemed to me (as in System-1!seemed) that Val and others would simply respond with “It doesn’t matter what basic ontology you’re using unless that ontology actually helps you Look.” Which, okay, fine, but I don’t really want to start trying to Look until I can confirm the absence of some fairly huge epistemic issues that typically plague this region of thought-space.
All of which is to say, I’m glad this post was made. ;-)
(although there is a part of me that can’t help but wonder why this post or something like it wasn’t the opener for this topic, as opposed to something that was only typed up after a couple of huge demon threads spawned)
The purpose of this post is to communicate, not to persuade. It may be that we want to bit [sic] the bullet of the strongest form of robustness to scale, and build an AGI that is simply not robust to scale, but if we do, we should at least realize that we are doing that.
Indeed I am, and for good reason: the cost I speak of is one which utterly dwarfs all others.
This is a claim that requires justification, not bald assertion—especially in this kind of thread, where you are essentially implying that anyone who disagrees with you must be either stupid or malicious. Needless to say, this implication is not likely to make the conversation go anywhere positive. (In fact, this is a prime example of a comment that I might delete were it to show up on my personal blog—not because of its content, but because of the way in which that content is presented.)
Issues with tone aside, the quoted statement strongly suggests to me that you have not made a genuine effort to consider the other side of the argument. Not to sound rude, but I suspect that if you were to attempt an Ideological Turing Test of alkjash’s position, you would not in fact succeed at producing a response indistinguishable from the genuine article. In all charity, this is likely due to differences of internal experience; I’m given to understand that some people are extremely sensitive to status-y language, while others seem blind to it entirely, and it seems likely to me (based on what I’ve seen of your posts) that you fall into the latter category. In no way does this obviate the existence or the needs of the former category, however, and I find your claim that said needs are “dwarfed” by the concerns most salient to you extremely irritating.
Footnote: Since feeling irritation is obviously not a good sign, I debated with myself for a while about whether to post this comment. I decided ultimately to do so, but I probably won’t be engaging further in this thread, so as to minimize the likelihood of it devolving into a demon thread. (It’s possible that it’s already too late, however.)
It’s also entirely information-free, which means that as an epistemic aid it’s rather… lacking.
The argument is never about how soon the future will come, always about how good the future will be. There is nothing “wrong” with any given outcome, but if we can do better, then it’s worth dedicating thought to that.
I think a large part of what prevented many people from investing in Bitcoin may have been the epistemic norms commonly referred to nowadays as “the absurdity heuristic”, “the outside view”, “modest epistemology”, etc. In other words, many of us may have held the (subconscious) belief that it’s impossible to perform substantially better than the market, even in situations where the Efficient Markets Hypothesis may not fully apply. To put it another way:
Well, suppose God had decided, out of some sympathy for our project, to make winning as easy as possible for rationalists. He might have created the biggest investment opportunity of the century, and made it visible only to libertarian programmers willing to dabble in crazy ideas. And then He might have made sure that all of the earliest adopters were Less Wrong regulars, just to make things extra obvious.
I think many of us considered this, and unconsciously dismissed it due to the obvious absurdity: surely things can’t be that easy, right? Sure, we may be rationalists, and sure, rationalists “ought to win”, but surely winning can’t be so easy that the opportunity to win literally hits us on the head, right?
I think what this points to is a fundamental inability on our part to Take Ideas Seriously. Of course, most people don’t have this ability at all, and we’re surely doing much better on that count—but what matters in this case isn’t your relative superiority to other people, but your absolute level of skill. (I’m using the pronoun “your” here to refer to the majority of rationalists who didn’t invest in Bitcoin, not the few who did.) The corresponding solution seems obvious: work to improve our ability to Take Ideas Seriously, without dismissing absurd-sounding ideas too quickly.
Easier said than done, of course.
Incidentally, I’m also interested in what specifically you mean by “random program”. A natural interpretation is that you’re talking about a program that is drawn from some kind of distribution across the set of all possible programs, but as far as I can tell, you haven’t actually defined said distribution. Without a specific distribution to talk about, any claim about how likely a “random program” is to do anything is largely meaningless, since for any such claim, you can construct a distribution that makes that claim true.
(Note: The above paragraph was originally a parenthetical note on my other reply, but I decided to expand it into its own, separate comment, since in my experience having multiple unrelated discussions in a single comment chain often leads to unproductive conversation.)