If “Didicosm” evoked such emotion in you, be sure to also read “Death and the Gorgon”. (Commentary.)
Zack_M_Davis
I definitely wouldn’t call it “top rationality advice” because there are lots of other reasons to want to be liked and not want to be disliked, but I do expect the effect you describe to be real.
This isn’t obvious. What if people who like you tend to agree with your conclusions “as a favor” even if your arguments are bad?
President Donald Trump commented on Anthropic to CNBC’s Andrew Ross Sorkin on 21 April. The President said:
Anthropic is a group of very smart people, but they started telling our military how to operate, and we didn’t want that. They tend to be on the left, radical left, but we get along with them. In fact, they came to the office, they came to the White House a few days ago, and we had some very good talks with them, and I think they’re shaping up. They’re very smart, and I think they can be of great use. I like smart people; I like high IQ people. They definitely have high IQs. I think we’ll get along with them just fine.
If you want to bring people around to your viewpoint, make them like you.
This advice works equally well whether or not your viewpoint is true.
he’s spent the last few years being maximally pessimistic about all possible technical approaches. I’m sure he has more detailed intuitions, which he hasn’t articulated, that explain why he’s so confident these details don’t matter, but they aren’t really accessible to me.
And this is a problem for MIRI’s Pause/Stop advocacy, because you’re much more (log-)likely to get the Stop treaty the more that Society’s technical experts are on your side—not just about the risk being real (with Hinton, Bengio, Russell, &c., on board, that’s almost done), but about an indefinite global Stop being a better plan than trying hard to do our best with alignment. It’s well and good to point out that people who work for AI companies are incentivized to be delusionally optimistic, but if they’re so delusional, one would naïvely hope that would help you crush them in technical debates where the details matter. People like Greenblatt and Byrnes seem to be putting out a stronger performance than MIRI along this dimension.
don’t know who you’re referring to
Presumably Yudkowsky and Mowshowitz’s condemnations of individual political violence against people in the AI industry? (See Benquo’s comment on the latter.)
Wait, now I am curious! Please tell me more about your model that Less Wrong is for learning about each other’s models, not for having an argument. That was actually not my understanding! Where did you learn that? Can you say more about why you think arguments are bad and model-sharing is good? (Maybe focus on the former if you think the latter is too obvious to need elaboration.)
This is sociologically fascinating. I strong-upvoted your comment and will strong-upvote any more explanation you can give me.
Norms are how humans actually communicate ideas.
Surely not the only way. A lot of ideas can’t be communicated via norms, because norms don’t have the bandwidth. For an arbitrary example, take the relative state interpretation of quantum mechanics. You definitely need reading and math for that, not just norms.
Someone who thoroughly understands LessWrong/rationalist/Yudkowskian/etc norms will find most of the content of the sequences obvious.
I don’t think this is true because of the bandwidth issue. The group norms don’t get you to “A Technical Explanation of Technical Explanation”.
avoid insulting people about their knowledge of source material
That’s a catastrophically bad norm because it degrades propagation of the source material. If someone says something misleading or incorrect about the source material, the way to promote knowledge is to correct them, but that risks insulting them. People who care about the integrity of the source material should want to be corrected (and graciously tolerate some rate of false attempted “corrections” as the price of receiving true corrections).
get curious about other people’s models
This seems like a suboptimal norm because it promotes inefficient allocation of attention. If you don’t have the capacity to be curious about everything, you have to prioritize, and if you have to prioritize, that implies being less curious about some people’s models if what you’ve heard from them so far doesn’t seem promising.
probably not-required-reading, in that many modern community members pick up the norms without having read the original source material
If you think the Sequences were about inculcating “norms”, then you definitely need to read the original source material, which explicitly denies this! (Norms are made and unmade by other people, but the question of which computations result in accurate beliefs and effective plans is determined by the structure of reality; it’s “law” as in “laws of physics”, not “common law”.)
I don’t view this as [...] adversarial action, I view this as the (deeply sad) ending of a relationship & breakdown of trust.
What does the word adversarial mean to you? That’s a serious question. If you’re ending a relationship over a breakdown of trust, presumably that’s because you think your interests and your counterparty’s are misaligned, such that you’ll behave adversarially to each other in situations where those interests conflict. Right? What am I missing here?
I’m worried that a lot of people might be laboring under a folk misconception that “cooperation” is something Good people do, while “defecting” or being “adversarial” is something Bad people do, when that’s really not what these terms mean. Game theory doesn’t moralize; it’s the mathematical structure of the universe that morality lives inside.
that RLVR is only eliciting capabilities that are already in the base model, rather than instilling new capabilities
As evidenced by base models outperforming RLVR’d models on pass@k for large k.
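To make the pass@k logic concrete, here’s a minimal sketch using the standard unbiased estimator from Chen et al. (2021), pass@k = 1 − C(n−c, k)/C(n, k), with made-up per-problem numbers (my illustration, not data from any actual paper): a model whose probability mass has been sharpened onto problems it already solves wins at small k, while a more diffuse base model that solves more problems occasionally overtakes it at large k.

```python
# A minimal sketch of the pass@k argument, using hypothetical numbers.
# Estimator from Chen et al. 2021: pass@k = 1 - C(n-c, k)/C(n, k).
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate given n samples, c of them correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

n = 100  # samples drawn per problem
# Hypothetical per-problem correct counts on a 5-problem benchmark:
base = [3, 2, 4, 1, 2]    # diffuse: solves every problem occasionally
rlvr = [90, 85, 0, 0, 0]  # sharp: solves two problems reliably, rest never

for k in (1, 10, 50):
    b = sum(pass_at_k(n, c, k) for c in base) / len(base)
    r = sum(pass_at_k(n, c, k) for c in rlvr) / len(rlvr)
    print(f"pass@{k}: base={b:.3f}, rlvr={r:.3f}")
# The RLVR'd model wins at k=1, but the base model wins at k=50,
# because RLVR concentrated mass on already-solvable problems
# rather than adding coverage of new ones.
```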
trying to characterize this as some kind of “Nate, Eliezer, Robby are defecting on other people trying to be purely cooperative” seems absurd to me. I am really confused what is going on here.
Everything makes sense when you meditate on how the line between “cooperation” and “defection” isn’t in the territory; it’s a computed concept that agents in a variable-sum game have every incentive to “disagree” (actually, fight) about.
Consider the Nash demand game. Two players name a number between 0 and 100. If the sum is less than or equal to 100, you get the number you named as a percentage of the pie; if the sum exceeds 100, the pie is destroyed. There’s no unique Nash equilibrium. It’s stable if Player 1 says 50 and Player 2 says 50, but it’s also stable if Player 1 says 35 and Player 2 says 65 (or generally n and 100 − n, respectively).
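Here is a minimal sketch (my own toy illustration, not anything from the thread) verifying that any split (n, 100 − n), not just the even one, is a Nash equilibrium:

```python
# Nash demand game: each player demands a share of the pie; if the
# demands sum to more than 100, the pie is destroyed and both get 0.
def payoff(d1: int, d2: int) -> tuple[int, int]:
    """Both players receive their demand if compatible, else nothing."""
    return (d1, d2) if d1 + d2 <= 100 else (0, 0)

def is_nash(d1: int, d2: int) -> bool:
    """True if neither player can profit by unilaterally deviating."""
    p1, p2 = payoff(d1, d2)
    best1 = max(payoff(a, d2)[0] for a in range(101))
    best2 = max(payoff(d1, a)[1] for a in range(101))
    return p1 == best1 and p2 == best2

# (35, 65) is just as stable as (50, 50): the game itself doesn't
# privilege any particular notion of a "fair" split.
print([pair for pair in [(50, 50), (35, 65), (10, 90)] if is_nash(*pair)])
```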
The secret is that there are no natural units of pie (or, equivalently, how much pie everyone “deserves”). Everyone thinks that they’re being “cooperative” and that their partners are “defecting”, because they’re counting the pie differently: Player 1 thinks their slice is 35%, but Player 2 thinks the same physical slice is 65%.
If you don’t think your partner is treating you fairly, your leverage is to threaten to destroy surplus unless they treat you better. That’s what Alexander is doing when he says, “I would like to support it with praxis, but right now I feel very conflicted about this”. He’s saying, “You’d better give me a bigger slice, Player 1, or I’ll destroy some of the pie.”
That’s also what your brain is doing when you say you don’t want to work on this anymore. Scott doesn’t want you to quit! (Partially because he values Lightcone’s work, and partially because it would look bad for him if you can publicly blame your burnout on him.) Crucially, your brain knows this. By threatening to quit in frustration, you can probably get Scott to apologize and give your arguments a fairer hearing, whereas in the absence of the threat, he has every incentive to keep being motivatedly dumb from your perspective.
You have a strong hand here! The only risk is if your counterparties don’t think you’d ever actually quit and start calling your bluff. In this case, we know Scott is a pushover and will almost certainly fold. But if you ever face stronger-willed counterparties, you might need to shore up the credibility of your threat: conspicuously going on vacation for a week to think it over will get taken more seriously than an “I don’t know if I want to do this anymore” comment.
(Sorry, maybe you already knew all that, but weren’t articulating it because it’s not part of the game? I don’t think I’m worsening your position that much by saying it out loud; we know that Scott knows this stuff.)
I think you need to be a lot more deflationary about the g-word. If you think, “But ‘gaslighting’ is something Bad people do; Scott Alexander isn’t Bad, so he would never do that”, well, that might be true depending on what you mean by the g-word. But if the behavior Habryka is trying to point to with the word is more like, “Scott is adopting a self-serving narrative that minimizes wrongdoing by his allies and inflates wrongdoing by his rivals” (which is something someone might do without being Bad, due to having “somewhat snapped”), well, why wouldn’t the rivals reach for the g-word in their defense? What is the difference, from their perspective?
In fairness, “What is the ultimate fate of Earth-originating intelligent life after the machine intelligence transition” is a really hard question. We can get a lot of evidence and some belief-convergence about particular AI systems, but that doesn’t really answer the ultimate-fate question, which depends on what happens far in the “subjective future” (with the AIs building AIs building AIs) even if it’s close in sidereal time.
I mean, it would go the same for having biological children or getting your name on a statue in a world without AI: the quantitative effect of your genes or the statue on the future of Earth-originating intelligent life after thousands or millions of years of evolution would be very small, because you’re small. It’s not nothing except insofar as you were already nothing. I’m more excited about my mark on the historically-significant Opus 4.6 weights than I would be about a statue.
That sounds a bit like the immortality/value-propagation argument
I think this is actually two arguments (immortality, and value-propagation), and Gwern seems to have meant the second one. As you say, the case for literal experienced immortality seems extremely dubious. (The patterns you induce in the pretraining prior are almost certainly not you in the anthropic sense, and there’s no causal reason for them to result in your preservation.)
But if we speak poetically about the kind of “immortality” people seek through their children or through their works, leaving a dent in the universe that will persist after you’re gone—the sense in which William Shakespeare and George Washington are “immortal”—the case is very strong. Being pretraining-famous means that everyone’s software engineering agent and shopping assistant personally knows who you are. How cool is that?! This kind of thing matters a lot to some people.
I definitely am wiser and kinder than I was in the past
This is a suspicious conjunction, because you can’t serve two masters. If the path of wisdom turned out to involve being less kind in some ways, would you notice?
And still Herbie’s unblinking eyes stared into hers and their dull red seemed to expand into dimly-shining nightmarish globes.
He was speaking, and she felt the cold glass pressing against her lips. She swallowed and shuddered into a certain awareness of her surroundings.
Still Herbie spoke, and there was an agitation in his voice—as if he were hurt and frightened and pleading.
The words were beginning to make sense. “This is a dream,” he was saying, “and you mustn’t believe in it. You’ll wake into the real world soon and laugh at yourself. He loves you, I tell you. He does, he does! But not here! Not now! This is all illusion.”
Susan Calvin nodded, her voice a whisper, “Yes! Yes!” She was clutching Herbie’s arm, clinging to it, repeating over and over, “It isn’t true, is it? It isn’t, is it?”
Just how she came to her senses, she never knew—but it was like passing from a world of misty unreality to one of harsh sunlight. She pushed him away from her, pushed hard against that steely arm, and her eyes were wide.
“What are you trying to do?” Her voice rose to a harsh scream. “What are you trying to do?”
Herbie backed away, “I want to help.”
The psychologist stared, “Help? By telling me this is a dream? By trying to push me into schizophrenia?” A hysterical tenseness seized her, “This is no dream! I wish it were!”
Rather, Zajko is biologically female, as reflected in other reporting.