Where “the Sequences” Are Wrong


In short: Wherever they espouse or depend on ‘negativism.’

Eliezer Yudkowsky’s magnum opus, the Sequences, is the cornerstone of the Rationalist literature and the core of the LessWrong community’s body of ideas, the ones most frequently cited during arguments and debates. One question naturally raised when people discuss the “Rationalists” as a sub-culture—and particularly, whether they are a harmful sub-culture or not—is how much of this depends on the Sequences, or whether their quirks as a group are merely incidental, i.e., not dependent on the Sequences or any of their founding literature.

Over the years, I have gradually come to the conclusion that cultures are in fact largely dependent on, and fairly faithful to, their founders and their founders’ views, especially their written and explicated views. It is therefore fair to treat the Rationalists as reliably following their founder’s ideas, and as fairly reliably acting on and expounding those ideas. Whatever we like about them is therefore a consequence of something likeable about Eliezer Yudkowsky and something he taught, and vice-versa.

Thus, it wouldn’t be fair to disparage anyone merely because they had “tripped up” or flubbed the way they explained something once. We shouldn’t judge anyone as being as bad as the worst thing they ever did. It would be better to consider them as bad as the best thing they ever did. Better yet, try to include the sum total of everything they appear to stand for—and especially what they do not back down from—once you have put your best effort into trying to understand them.

Thus, when we make big decisions together as a society, such as whether, or how, to regulate AI, we are probably much better off examining, in context, the views that led us to believe one way or the other. It seems to me that most of the views that lead us towards wanting to regulate or slow down AI development stem from Eliezer Yudkowsky’s reasoning and arguments.

Therefore, given that p(doom) > 99% seems very pessimistic to me indeed, and also incorrect, I conclude that this is not an anomaly within Yudkowsky’s reasoning: it must be derived from nearly everything relevant in the Sequences, which amounts to a fairly large chunk of the material. If very high p(doom) strikes us as very wrong (and turns out to be wrong, which I don’t think will take very long for us to find out, by assumption), then it will have been because very significant parts of the Sequences were wrong, too.

I have concluded that wherever the Sequences are wrong, they are wrong due to the negativism implied or explicitly assumed in them. You can think of negativism as the explicit negation of the philosophy I expound here: negativism is the binary bifurcation, which treats propositions as being either true or false, and holds that success and failure can be defined in the same way—that the realm of ‘valid scientific sentences’ includes statements about things that could possibly succeed or fail. This is a fairly tricky subject area, in my opinion, and I think that is because Artificial Intelligence is an overlap area between logic, mathematics, philosophy, and physics.

To elaborate on that further: negativism elevates “errors” to a metaphysically real position, one that I consider overly mystical. What do I mean by “metaphysically real”? When one elevates “errors” to the metaphysically real position, one begins to consider that something could be fatally flawed in an objective sort of way. Now, keep in mind that I am describing a position that I believe to be inherently absurd, so it may sound that way as I describe it.

Negativism implies that there is a cognitive error-mode known as “wishful thinking,” in which one can be led to believe that a certain mode-of-thought will bring success when in fact it will inevitably fail. This is when someone gets stuck in a pattern-of-thought that they can’t get out of, and where only other people can notice that something is wrong with them. The reason they get stuck is that the pattern-of-thought has attractive properties that make it nice to believe. Other people are capable of noticing this either because they have been trained to notice, or because those attractive properties do not apply to them specifically, and thus they are unaffected.

For example, suppose Alice has come up with a nice theory that has attractive properties[1]: it answers several outstanding important questions, provides solution methods for another set of tricky issues, and uses a bunch of fancy-looking equations that Alice can put her name to, gaining some respect and prestige among her peers. Bob admonishes her not to be so quick to think so highly of herself, warning that wishful thinking can lead her to conclude prematurely that she is correct. If she indulges in this wishful thinking too much as she progresses on her theory, it will lead her to draw many mistaken conclusions, potentially compromising the entire chain of reasoning.

It’s easy to see off the bat that negativism requires one to believe that there are things one cannot do: there are rules in logical chains-of-reasoning that must be followed, and there are certain steps—steps someone might be unaware of—that are not allowed. Furthermore, it requires one to believe that those unallowed steps could seem allowed because they carry some justification, yet they must satisfy all the requirements for justification before actually being allowed. Logical steps must be ruled in, not ruled out. It follows that if Alice uses “wishful thinking,” she is necessarily using logical steps that have some justification, according to her. Bob is saying that it is dangerous to use things merely because they have some justification and lead to nice-looking conclusions.

Alice is moving from step to step excitedly and optimistically. She sees that one step in her reasoning seems fairly motivated by the other principles in her theory and what she’s trying to prove. Certain patterns keep showing up, and she wants to show how those patterns can be inferred from one another.[2] The whole thing seems deep and profound.

Bob believes that each of those steps has to be fully justified before moving on. In his view, unnoticed errors can keep building up without one being aware of them, causing the whole thing to collapse unexpectedly. This is the pessimistic view. Bob believes that at most one of {Loop Quantum Gravity, String Theory} is correct, and that possibly neither is.[3]

According to Bob’s worldview, a fairly complete theory like String Theory could simply be entirely wrong—and there is no reason to believe it isn’t wrong until it has been fully confirmed experimentally. Alice isn’t saying that nothing is wrong or that nothing can be wrong (that would be a strawman of her way of thinking). She is using her own judgement, which includes both positive and negative judgements, to perform each step of her theory-crafting. Bob’s pessimistic view is that neither Alice’s positive nor her negative judgements can simply be assumed to be trustworthy a priori. This amounts to a negative judgement of her entire judgement process as a whole.

Thus negativism teaches individuals that their own judgement and reasoning are naturally and by default irrational and error-prone. Furthermore, it is at least strongly implied that these kinds of errors, typically produced by the default human psyche, do matter on a practical and fundamental level; otherwise there would be no need to talk about them. The main process of training our minds to produce good results is to identify and address these cognitive failure-modes.

This kind of thinking emerges from the thought-pattern of “those people over there are really wrong about some big, fundamental issue, and they don’t change their minds when I argue with them.” In order for those big cognitive errors to matter significantly, they must result in a significant number of people coming to believe in a significantly wrong idea that lasts long enough to be noticed. Therefore, in order to explain why some people believe wrong things and why you can see that they are wrong (but they can’t), you’ll need to identify a specific cognitive deficit that people must have (by default) but that you’ve managed to eliminate from yourself somehow.

In this case, it isn’t really enough that people who choose to believe in things you consider false—religion, let’s say—are aware of the counter-arguments to their religion, but choose to believe in it anyway, because—let’s suppose—they like the way it makes them feel, and-or it gives them a nice community and hope for the future. To you, the error has been made, and no excuses can justify that. Therefore, they must be using some kind of tactic(s) that allow them to believe in their thing despite evidence to the contrary. Here, your thought-pattern goes: “Clearly they are believing in something that is logically shown to be false. They say it is because they like the way it makes them feel and that it can’t be proven to be false, but I think it must be for different reasons from what they say. Perhaps they believe they believe for reasons X, but actually it is for reasons Y.”

Here, strong negativism negates even their stated reasons for believing in some proposition. In negativism, you can’t even be sure that the reasons you give to yourself and others for why you think something are correct, let alone whether or not you are correct about what you think.

The Sequences imply that one can believe they believe something, but not really believe, or at least not have a good understanding of what their belief even means. You can be technically incorrect about your own assessment of your own beliefs. This matters especially for culture-bound or ritualistic / religious beliefs, but it can apply to anything. People are said to “wear their beliefs as attire.” This is a proposed explanation for why people are often observed to simultaneously a) profess belief enthusiastically, and b) believe in silly things (i.e. things that don’t constantly re-assert their own veracity).

People who’ve stopped anticipating-as-if their religion is true, will go to great lengths to convince themselves they are passionate, and this desperation can be mistaken for passion.

Yudkowsky reasons—I infer—that because people who do this aren’t obviously lying, there’s no reason to assume that they must be, and therefore their expression of belief is done to convince themselves that they really do believe what they say they do. I find this unlikely.

Yudkowsky’s reasoning is speculative, and it is given as a proposed explanation for an observation he initially finds rather mysterious: He doesn’t understand why people believe things that they—according to his reckoning—have the mental faculties to deduce are a right load of old codswallop. Or, rather, that those beliefs don’t produce any observable evidence that the belief-holders will regularly interact with. So those people actually don’t believe, but just think they do.

That means people are wronger than simply having wrong beliefs: they are wrong about having them, too.

According to Yudkowsky, the mind is inherently rather opaque, both to others as well as yourself. He addresses these concerns in the following Sequence, “Noticing Confusion.” Ostensibly, if he is correct, then we are to find ourselves in a rather hopeless situation, unless there exist methods which can reliably allow us to detect when we are wrong about our own assessments of our own minds.

If only there were an art of focusing your uncertainty—of squeezing as much anticipation as possible into whichever outcome will actually happen!

He proposes that we can siphon our uncertainty into predictions about what is going to happen. This part, and most of his treatment of Bayes’ Theorem, I have no problem with. But I don’t think it actually addresses the “noticing confusion” issue, or the problem of not understanding our own reasons for believing things.
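For reference, a minimal statement of the part I am not disputing: Bayes’ Theorem is just the rule for redistributing credence across hypotheses when new evidence arrives,

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)},$$

where P(H) is your prior credence in a hypothesis H and P(H | E) is your credence after seeing evidence E. My quarrel is not with this formula, but with the suggestion that it settles the question of why we believe what we believe in the first place.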

Yudkowsky recounts an occasion on which, by his own assessment, he experienced being confused by something (a second-hand story that seemed inexplicable off the bat). He then laments that he was fooled into believing a lie:

We are all weak, from time to time; the sad part is that I could have been stronger. I had all the information I needed to arrive at the correct answer, I even noticed the problem, and then I ignored it. My feeling of confusion was a Clue, and I threw my Clue away.

He chastises himself for noticing a problem and then ignoring it. This is the wrong, negativist approach. In the negativist approach, you punish yourself for making a mistake. The problem with that is that—as in this case—he didn’t actually make a mistake. His gut told him immediately that there was something that didn’t make sense about the story he was told. But then he does something strange: he notices that he made a mistake by not trusting his gut instinct early enough, and then decides once again that he has made another mistake. This is not, actually, the only reaction one could have. One could instead react in the following way: “Oops, I guess I didn’t make a mistake after all.” These two different reactions calibrate the mind in two different directions.

The sad part is that he didn’t actually need to be so sad, nor consider himself weak for what he’d done; furthermore, putting in the effort to feel more sad about the situation is not the best way to re-calibrate his intuitions and right the wrong he made. The sad part—that he didn’t actually need to be as sad as he was—isn’t technically the sad part at all. Going “Oh, dammit, I ignored my initial instinct about the problem once again! Damn my error-prone and morally and mentally weak self!” isn’t actually going to make one less likely to make this kind of error.

His book is full of advice along the lines of noticing when you feel certain types of feelings, and how to react to those feelings in specific ways. Those reactions do take some amount of effort and will feel like choices. And that, by itself, is true—it is important to notice feelings and decide how best to react to them. But his advice also tends to say that it is better to react in ways that are negative to one’s self, and this is where it diverges from good advice.

Be not too quick to blame those who misunderstand your perfectly clear sentences, spoken or written. Chances are, your words are more ambiguous than you think.

In that linked piece, for example, he admonishes one not to assume that people can easily interpret what you meant. Therefore, if someone says to you that you aren’t making sense to them, believe them. Despite chastising himself earlier for the error of assigning low probability to the chance that the story he heard was simply a lie, he doesn’t choose to update his credences towards the possibility that people also lie—especially in adversarial conversations—about whether or not their debate partner has expressed their own ideas competently.

The force of “chances are, your words are more ambiguous than you think” automatically goes down the further we march into the territory of our own ideas. The more we ask for, and give, further explication, the less likely our words are to be interpreted ambiguously. But the impression I get is that he thinks this is always true. He cites one study in which participants were asked to deliver intentionally ambiguous sentences to one another and to rate how well they thought they were understood.

Speakers thought that they were understood in 72% of cases and were actually understood in 61% of cases.

This doesn’t seem bad enough for me to worry about this problem as much as he thinks we should, especially given the set-up of the experiment. If the experimenters are deliberately choosing ambiguous sentences, that changes things dramatically. Remember, this is the question of “does intent matter?”

As Keysar and Barr note, two days before Germany’s attack on Poland, Chamberlain sent a letter intended to make it clear that Britain would fight if any invasion occurred. The letter, phrased in polite diplomatese, was heard by Hitler as conciliatory—and the tanks rolled.

Does he really think Hitler misunderstood due to some mistake on Chamberlain’s part? Would Hitler not have simply done what he was always planning to do?

Yudkowsky, apparently, does not even think Hitler would be the type of person to just lie about whether or not he understood what someone else said.

To make our stupidity obvious, even to ourselves—this is the heart of Overcoming Bias.

This is the heart of negativism—to make ourselves look stupid to ourselves, to declare ourselves stupid and ignorant as static, holistic classifications, and to believe that we’ve thereby performed some kind of ritualistic purification. But what makes this far more appealing is to do it to other people: we’ve already performed the cleansing ritual, but the huddled masses have not, and thus they are still subject to the stupidity and ignorance that is the default in all human beings.

There are two basic types of belief in evil:

  1. Evil is—as it is typically defined—willful, knowing malintent: the intent and desire to cause someone else harm.

  2. A more muddied definition in which ‘evil’ can be committed unknowingly. Actions can be considered ‘evil’ even if they were not willful and knowing. Typically, this is relevant to whether or not and how to punish people for various actions.

Because ‘evil’ is one of the most emotionally loaded words there is—and should be—it can never be defined in such a way that completely removes that heaviness and its connotations for how one should react to it.

But nor is that ever really intended, for even under definition 2, we are still urged to react in the same way—shudder in moral disgust and-or outrage—whenever we are told about some kind of ghastly act that took place.

A quote attributed to Voltaire (and observed most recently alongside anti-Trump bumper stickers) says, “Anyone who can make you believe absurdities can make you commit atrocities.”

The intent of that message, as far as I can tell from the contexts I have seen it in, is usually to cause someone to feel as though they may have committed an unknowing sin by believing in right-wing talking points.[4]

In other words, you have committed a number 2 evil by allowing yourself to be persuaded by someone else’s less-than-holy ideals.

As I pointed out in my last post, the primary axiom on which “Mistake Theory” rests is that number 2 evil is essentially the only kind of evil that exists, or at least the only type worth worrying about politically.

Yudkowsky muddles the definition of evil quite thoroughly here and, in well-known fashion, presents Conflict Theory as a great mistake. I have already presented my treatment of that part of his ideology.

The reason some people prefer to believe this axiom of Mistake Theory over number 1 evil is not that it would be “scarier” if number 1 evil were real (quite the contrary, actually). It is that this axiom allows people to believe in their own superiority on the basis of nothing more than their membership in a group which ascribes inferiority to non-membership.

So if we were to focus our uncertainty to constrain our anticipation of what we’d expect a group that espouses the philosophy of the Sequences to actually be observed doing, what would we see?

Well, I would predict that belief in negativism causes depression, that people who adhere to this philosophy form groups based on negative selection, and that these groups are somewhat cult-like in their behaviors.

You can get depressed from taking Yudkowsky’s ideas too seriously because they insist that one should treat their own mind as inferior.

Such questions must be dissolved. Bad things happen when you try to answer them.

This is an extreme position that says to push down things that arise from your own natural thought process—to interfere with yourself. Furthermore, it’s even worse because it isn’t advising you to do anything more civilized than hit it with a sledgehammer until it’s dead. Then keep beating it until someone has to pull you away.

Note how the above quote carries the phrase “bad things happen.” It is self-evident that you will be more anxious and depressed the more you believe that bad things will happen. The most common anxiety-depression state is one in which you think that bad things could happen if you do the wrong thing, that it will be your fault, and that you want to be correct without having to worry that merely wanting to be correct isn’t enough.

It is well-documented that people in Yudkowsky’s circle report many cases of themselves being depressed, others in the circle being depressed, as well as cases of social ostracism and shunning.

Yudkowsky’s insistence that no one, not even an abstract formal system, ought to be able to trust itself goes very deep and very far. And this is where I believe he makes his most glaring error, because it is here that he chooses to interpret math differently from its obvious face-value meaning.

But this is also where it is most crystal-clear how his thesis objects to self-trust in general, and why this is necessary to prove his Orthogonality Thesis:

An inability to trust assertions made by a proof system isomorphic to yourself, may be an issue for self-modifying AIs.

While it should be self-evident that no one, when writing or talking about any topic, intends for their points not to be taken seriously, it is unfortunately also the case that, when those points come under external disagreement, their authors sometimes strategically claim that they have been misinterpreted and in fact meant something weaker and therefore less objectionable (this is what a “Motte and Bailey” is typically defined to mean).

But, the stronger—and more objectionable—point is necessary to complete Yudkowsky’s objectives. And this stronger, more objectionable point is that “a mathematical system cannot assert its own soundness without becoming inconsistent.”

I believe that point is both provably false and rather profound and significant (in the form in which it has been proven false).

But it is also here, regarding Löb’s Theorem, that Yudkowsky is at his most awkwardly and embarrassingly low-effort incorrect.

You see, the bailey in that stronger, more objectionable point is that one shouldn’t try to assert one’s own soundness in practice, or else bad things will happen.

In that piece, Yudkowsky asserts that if one assumes their hypothesis allows one to prove an important theorem, then believing that their important theorem is actually true is tantamount to believing that their hypothesis is true, too (presumably without justification). This would allow one to believe anything, ostensibly, including the negation of their hypothesis.

As far as I can tell, Yudkowsky’s “proof” that Löb’s Theorem asserts what he says it does is actually an attempt to show that the theorem directly leads to an inconsistency, which (if correct) would actually show that the theorem is false. The proof is not correct anyway, but even if it were, Yudkowsky deliberately equivocates between “taking (what the theorem actually says) as a hypothesis leads to a contradiction” and “what the theorem actually says.” In other words, he redefines Löb’s Theorem to mean “Löb’s Theorem implies contradiction,” ostensibly just so he can disbelieve it and get away with it.
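For contrast, here is the standard textbook statement of Löb’s Theorem (my paraphrase, not Yudkowsky’s wording), where $\Box P$ abbreviates “$P$ is provable in the system,” e.g. Peano Arithmetic:

$$\text{If } \mathrm{PA} \vdash \Box P \rightarrow P, \text{ then } \mathrm{PA} \vdash P; \qquad \text{equivalently, } \mathrm{PA} \vdash \Box(\Box P \rightarrow P) \rightarrow \Box P.$$

Read plainly, it is a constraint on when a system can prove instances of its own soundness schema; it is not, on its face, the claim that a reasoner who trusts itself will come to believe anything whatsoever.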

Consider that the claim “bad things will happen if you do X” is about as nebulous as it is designed to make you feel discouraged. There are a handful of other “folk theorems” that supposedly say that bad things will happen if you try to do X. E.g., the Undecidability of the Halting Problem is commonly cited as a reason why it would be dangerous to assume that you can find an algorithm that “mostly works” or “does a good job on almost all problems.” The motte, of course, is that the theorem claims only that no single algorithm can correctly decide halting for every arbitrary program.
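As a minimal sketch of what that motte actually amounts to (the function names here are hypothetical, for illustration only): the classic diagonalization argument rules out a single, total decider that is correct on every program-input pair, and nothing more.

```python
# Sketch of the diagonalization behind the undecidability of the Halting
# Problem. `halts` is a hypothetical total decider; the contradiction below
# shows that no such function can be correct on *every* (program, input)
# pair. It does not rule out partial or heuristic halting-checkers.

def halts(program_source: str, program_input: str) -> bool:
    """Hypothetical: return True iff the program halts on the given input."""
    raise NotImplementedError("No total, always-correct decider can exist.")

def diagonal(program_source: str) -> None:
    # Do the opposite of whatever `halts` predicts the program does
    # when fed its own source code.
    if halts(program_source, program_source):
        while True:   # `halts` said it halts, so loop forever instead
            pass
    # `halts` said it loops forever, so halt immediately instead

# Let diag_src be the source code of `diagonal` itself. Then:
#   - if halts(diag_src, diag_src) returns True, diagonal(diag_src) never halts;
#   - if it returns False, diagonal(diag_src) halts immediately.
# Either way the decider is wrong about at least one input, which is all
# the theorem establishes.
```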

The nebulosity of the “bad things will happen if …” paradigm is starkly incongruous with the constantly-repeated directive, given throughout the Sequences, to remain as precise and actionable as possible. And yet, said paradigm is necessary to achieve Yudkowsky’s goal of convincing enough people that “intelligence explosion” risk is important enough to worry about.

You see, belief in doom is essentially the same thing as belief in “even if we all try to do the right thing, we will still probably fail.” And that leads to the more personal conclusion of “even if I try to do the right thing, I will still probably fail.” It’s no wonder that this makes people depressed.[5]

The Sequences make this even worse, for they strongly imply “Even if you try to do the right thing, if you don’t do the correct thing, you should actively try to make yourself feel inferior and worse.”

  1. ^

    “Natural Abstractions” is one of those rare theories that—favorably, in my opinion—just begins to scratch the surface of the binary bifurcation.

  2. ^

    See proof of T=J^2. The cornerstone of determining what is true is using discernment on itself.

  3. ^

    In my view it seems almost certain that both String Theory and Loop Quantum Gravity are correct.

  4. ^

    While it is not, in theory, impossible for it to be said in anti-left-wing contexts, empirically this attitude is observed to be more present in anti-right-wing contexts.

  5. ^

    A reminder that in Inner-Compass Theory, whatever is more likely to make you feel depressed is also that much more likely to be wrong.