The problem of graceful deference
Moral deference
Sometimes when I bring up the subject of reprogenetics, people get uncomfortable. “So you want to do eugenics?”, “This is going to lead to inequality.”, “Parents are going to pressure their kids.”. Each of these statements does point at legitimate concerns. But also, the person is uncomfortable, and they don’t necessarily engage with counterpoints. And, even if they acknowledge that their stated concern doesn’t make sense, they’ll still be uncomfortable—until they think of another concern to state.
This behavior is ambiguous—I don’t know what underlies the behavior in any given case. E.g. it could be that they’re intent on pushing against reprogenetics regardless of the arguments they give, or it could be that they have good and true intuitions that they haven’t yet explicitized. And in any case, argument and explanation are usually best. Still, I often get the impression that, fundamentally, what’s actually happening in their mind is like this:
1. Reprogenetics… that’s genetic engineering...
2. Other people are against that...
3. I don’t know about it / haven’t thought about it / am not going to stick my neck out about it...
4. So I’m going to say that it is bad.
5. But it’s awkward / cringe to say that I’m saying it’s bad just because other people say it’s bad. Or even, it could get me in trouble to say that, because I’m supposed to have a deep independent strong moral commitment to being against genetic engineering and its problems, whatever those might be, and I can’t conform to that norm just by saying I’m morally deferring.
6. So I will make up reasons to be against it, even if they don’t make sense and even if I can’t defend them and even if I’m not really committed to them and even if I switch between different reasons and even if they are vague.
To be really clear: In many situations, doing 1–4 is straightforwardly CORRECT behavior. If there’s some morally important question that you haven’t thought about, but that your society apparently makes a strong judgement about, then usually you should follow that judgement until you think about it much more. In some cases 5 and 6 are at least empathizable, or even correct if there’s a sufficiently repressive regime.
That said, this behavior supports a false consensus.
Correlated failures
I wish that when someone asked me in, say, 2016, “Why are you working on decision theory?”, I had not said “Well I think that a better understanding of decision theory would tell us what sort of agents are possible and then we can understand reflective stability and this will explain the design space of agents which will allow us to figure out what levers we have to set the values of the AI and...”. Instead I wish I had said “Mainly because Yudkowsky has been working on that and it seems interesting and I know math.”. (Then I could launch into that other explanation if I wanted to, as it is also true and useful.)
Yudkowsky, being the best strategic thinker on the topic of existential risk from AGI, had several “founder effects” on the group of people working to decrease X-risk. It sort of seems like one of those founder effects was to overinvest in technical research and underinvest in “social victory”, i.e. convincing everyone to not build AGI. Whose fault was that? I think it was a distributed correlated failure, caused by deference. What should we have done instead?
One example of something we could have done differently would have been to be more open to the full spectrum of avenues, even if we personally don’t feel like working on that / wouldn’t be good at working on it / don’t know how to evaluate whether it would work / are intuitively skeptical of it being doable. Another example would be to make it more clear when we are deferring to Yudkowsky or to “the community”. We don’t have to stop deferring, to avoid this correlated failure. We just have to say that we’re deferring. That way, people keep hearing “I think X, mainly because Yudkowsky thinks X”, and then they can react to “Yudkowsky thinks X” rather than “everyone thinks X” (and can check whether Yudkowsky actually believes X).
Currently most X-risk reduction resources are directed by a presumption that AGI is coming in less than a decade. I think this “consensus” is somewhat overconfident, and also somewhat unreal (i.e. it’s less of a consensus than it seems). That’s a very usual state of affairs, so I don’t want to be too melodramatic about it, but it still has concrete bad effects. I wish people would say “I don’t have additional clearly-expressible reasons to think AGI is coming very soon, that I’ll defend in a debate, beyond that it seems like everyone else thinks that.”. I also wish people would say “I’m actually mainly thinking that AGI is coming soon because thoughtleaders Alice and Bob say so.”, if that’s the case. Then I could critique Alice’s and/or Bob’s stated position, rather than taking potshots at an amorphous unaccountable ooze.
The open problem
There’s a menagerie of questions we bump into in our lives. What food is safe to eat? Who should you vote for? What shape is the Earth? What effect would tariffs have on the economy? How easy is it to unify quantum mechanics and relativity? Was so-and-so generally honorable in zer private dealings? Which car rental service is good? How did https://wordpress.com/ come to be so good?? (Inkhaven brought to you by WordPress.com ❤️ .) What happened 50 years ago in Iran? What’s happening right now in any place other than right where you are? Is genetic engineering moral? Will these socks wear out after 3 months? Should you get this vaccine? What’s a reasonable price for a bike? Where should you hike? What’s really going on at OpenAI? What is it dangerous to react sodium with? Is it legal to park here? When is it time to protest the government?
You can become an expert on almost any small set of these questions, such that you don’t really need to defer very much to anyone else’s testimony about them. But you can’t become a simultaneous expert on most of the questions that you care about.
So, you have to defer to other people about many or most important questions. There are too many questions, and many important questions are complex and too hard to figure out on your own. Also, you can get by pretty well by deferring: a lot of other people have thought about those questions a lot, and often they can correctly tell you what’s important to know.
But deference has several deep and important dangers.
If I’m not going to figure something out myself, how do I gracefully decay from a pure, individually witnessed understanding of the world (which was a fiction anyway), to a patchwork of half-understood pictures of the world copied imprecisely from a bunch of other people? How do we defer in a way that doesn’t destroy our group epistemics, doesn’t abdicate our proper responsibilities, properly informs others, coordinates on important norms and plans, and so on? How do we carve out a space for individual perspective-having without knocking out a bunch of load-bearing pillars of our ethics? How do we defer gracefully?
This seems strange to say, given that he:
- decided to aim for “technological victory”, without acknowledging or being sufficiently concerned that it would inspire others to do the same
- decided it’s feasible to win the AI race with a small team and while burdened by Friendliness/alignment/x-safety concerns
- overestimated the likely pace of progress relative to the difficulty of the problems, even on narrow problems that he personally focused on, like decision theory (still far from solved today, ~16 years later; Edit: see “UDT shows that decision theory is more puzzling than ever”)
- had large responsibility for others being overly deferential to him, by writing/talking in a highly confident style and not explicitly pushing back on the over-deference
- is still overly focused on one particular AI x-risk (takeover due to misalignment), while underemphasizing or ignoring many other disjunctive risks
These seemed like obvious mistakes even at the time (I wrote posts/comments arguing against them), so I feel like the over-deference to Eliezer is a completely different phenomenon from “But you can’t become a simultaneous expert on most of the questions that you care about.” or has very different causes. In other words, if you were going to spend your career on AI x-safety, of course you could have become an expert on these questions first.
Ok. If that’s true then yeah, you might be a very good strategic thinker about AGI X-risk. Yudkowsky still probably wins, given the evidence I currently have. He’s been going really hard at it for >20 years. You can criticize the writing style of LW, and say how in general he could have been deferred-to more gracefully, and I’m very open to that and somewhat interested in that.
But it seems strange to be counting down from “Yudkowsky-LW-sphere, but even better” rather than up from “no Yudkowsky-LW-sphere”. (Which isn’t to say “well his stuff is really popular so he’s a good strategic thinker”, but rather “actually the Sequences and CFAI and https://intelligence.org/files/AIPosNegFactor.pdf and https://intelligence.org/files/ComplexValues.pdf and https://intelligence.org/files/CognitiveBiases.pdf and https://files.givewell.org/files/labs/AI/IEM.pdf were a huge amount of strategic background; as a consequence of being good strategic background, they shifted many people to working on this”.)
Maybe I’m misunderstanding what you’re saying though / not addressing it. If someone had been building out the conceptual foundations of AGI X-derisking via social victory for >20 years, they’d probably have a strong claim to being the best strategic thinker on AGI X-risk in my book.
I’m not saying it is! You may have misread. (Or maybe I misspoke—if so, sorry, I’m not rereading my post but I can if you think I did say this.) I’m saying that SOME deference is probably unavoidable, BUT there’s a lot of ACTUAL deference (such as the examples I cited involving Yudkowsky!) that is BAD, so we should try to NOT DO THE BAD ONES but in a way that doesn’t NECESSARILY involve “just don’t defer at all”.
No? They’re all really difficult questions. Even being an expert in one of these would be at least a career. I mean, maybe YOU can, but I can’t, and I definitely can’t do so when I’m just a kid starting to think about how to help with X-derisking.
I mean I’m obviously not arguing “don’t seriously investigate the crucial questions in your field for yourself”, or even “don’t completely unwind all your deference about strategy, all the way to the top, using your full power of critique, and start figuring things out actually from scratch”. I’ve explicitly told dozens of relative newcomers (2016--2019, roughly) to AGI X-derisking that they should stop trying so hard to defer, that there are several key dangers of deference, that they should try to become experts in key questions even if that would take a lot of effort, that the only way to be a really serious X-derisker is to start your work on planting questions about key elements, etc. My point is that
1. {people, groups, funders, organizations, fields} do in fact end up deferring, and
2. probably quite a lot of this is unavoidable, or at least unavoidable for now / given what we know about how to do group rationality,
3. but also deference has a ton of bad effects, so
4. we should figure out how to have fewer of those bad effects—and not just via “defer less”.
Maybe we should distinguish between being good at thinking about / explaining strategic background, versus being actually good at strategy per se, e.g. picking high-level directions or judging overall approaches? I think he’s good at the former, but people mistakenly deferred to him too much on the latter.
It would make sense that one could be good at one of these and less good at the other, as they require somewhat different skills. In particular I think the former does not require one to be able to think of all of the crucial considerations, or have overall good judgment after taking them all into consideration.
So Eliezer could become an expert in all of them starting from scratch, but you couldn’t, even though you could build upon his writings and other people’s? What was/is your theory of why he is so much above you in this regard? (“Being a kid” seems a red herring since Eliezer was pretty young when he did much of his strategic thinking.)
I agree and I said as much, but this also seems like a non sequitur if you’re just trying to say he’s not the best strategic thinker. Someone can be the best and also be “overrated” (or rather, overly deferred-to). I’m saying he is both. The “thinking about / explaining strategic background” is strong evidence of actually being good at strategy. Separately, Yudkowsky is the biggest creator of our chances of social victory, via LW/X-derisking sphere! (I’m not super confident of that, but pretty confident? Any other candidates?) So it’s a bit hard to argue that he didn’t pick that strategic route as well as the technical route! You can’t grade Yudkowsky on his own special curve just for all his various attempts at X-derisking, and then separately grade everyone else.
Ok. I mean, true. I guess someone could suggest alternative candidates, though I’m noticing IDK why to care much about this question.
(I continue to have a sense that you’re misunderstanding what I’m saying, as described earlier, and also not sure what’s interesting about this topic. My bid would be, if there’s something here that seems interesting or important to you, that you would say a bit about what that is and why, as a way of recentering. It seems like you’re trying to drill down into particulars, but you keep being like “So why do you think X?” and I’m like “I don’t think X.”.)
By saying that he was the best strategic thinker, it seems like you’re trying to justify deferring to him on strategy (why not do that if he is actually the best), while also trying to figure out how to defer “gracefully”, whereas I’m questioning whether it made sense to defer to him at all, when you could have taken into account his (and other people’s) writings about strategic background, and then looked for other important considerations and formed your own judgments.
Another thing that interests me is that several of his high-level strategic judgments seemed wrong or questionable to me at the time (as listed in my OP, and I can look up my old posts/comments if that would help), and if it didn’t seem that way to others, I want to understand why. Was Eliezer actually right, given what we knew at the time? Did it require a rare strategic mind to notice his mistakes? Or was it a halo effect, or the effect of Eliezer writing too confidently, or something else, that caused others to have a cognitive blind spot about this?
No. You’re totally hallucinating this and also not updating when I’m repeatedly telling you no. It’s also the opposite of the point hammered in by the OP. My entire post is complaining about problems with deferring, and it links a prior post I wrote laying out these problems in detail, and I linked that essay to you again, and I linked several other writings explaining more how I’m against deferring and tell people not to defer repeatedly and in different ways. I bring up Eliezer to say “Look, we deferred to the best strategic thinker, and even though he’s the best strategic thinker, deferring was STILL really bad.”. Since I’ve described how deferring is really bad in several other places, here in THIS post I’m asking, given that we’re going to defer despite its costs, and given that to some extent at the end of the day we do have to defer on many things, what can we do to alleviate some of those problems?
And then you’re like “Ha. Why not just not defer?”.
Ok, it looks like part of my motivation for going down this line of thought was based on a misunderstanding. But to be fair, in this post after you asked “What should we have done instead?” with regard to deferring to Eliezer, you didn’t clearly say “we should have not deferred or deferred less”, but instead wrote “We don’t have to stop deferring, to avoid this correlated failure. We just have to say that we’re deferring.” Given that this is a case where many people could have and should have not deferred, this just seems like a bad example to illustrate “given that to some extent at the end of the day we do have to defer on many things, what can we do to alleviate some of those problems?”, leading to the kind of confusion I had.
Also, another part of my motivation is still valid, and I think it would be interesting to try to answer: why didn’t you (and others) just not defer? Not in a rhetorical sense, but what actually caused this? Was it age, as you hinted earlier? Was it just human nature to want to defer to someone? Was it that you were being paid by an organization that Eliezer founded and had very strong influence over? Etc.? And also, why didn’t you (and others) notice Eliezer’s strategic mistakes, if that has a different or additional answer?
Ok, sure, that’s a good question, and also off-topic.
Yeah obviously. It’s literally impossible to not defer, all you get to pick is which things you invest in undeferring in what order. I’m exceptionally non-deferential but yeah obviously you have to defer about lots of things.
Yes it is also human nature to want to defer. E.g. that’s how you stay synched with your tribe on what stuff matters, how to act, etc.
No, I took being paid as more obligation to not defer.
Anyway, I’m banning you from my posts due to grossly negligent reading comprehension.
The grandparent explains why Dai was confused about your authorial intent, and his comment at the top of the thread is sitting at 31 karma in 15 votes, suggesting that other readers found Dai’s engagement valuable. If that’s grossly negligent reading comprehension, then would you prefer to just not have readers? That is, it seems strange to be counting down from “smart commenters interpret my words in the way I want them to be interpreted” rather than up from “no one reads or comments on my work.”
This may not be a valid inference, or your update may be too strong, given that my comment got a strong upvote early or immediately, which caused it to land in the Popular Comments section of the front page, where others may have further upvoted it in a decontextualized way.
It looks like I’m not actually banned yet, but will disengage for now to respect Tsvi’s wishes/feelings. Thought I should correct the record on the above first, as I’m probably the only person who could (due to seeing the strong upvote and the resulting position in Popular Comments).
I have banned you from my posts, but my guess is that you’re still allowed to post on existing comment threads with you involved, or something like that. I’m happy for you to comment on anything that the LW interface allows you to comment on. [ETA: actually I hadn’t hit “submit” on the ban; I’ve done that now, so Wei Dai might no longer be able to reply on this post at all.]
Possibly I’ll unban you some time in the future (not that anyone cares too much, I presume). But like, this comment thread is kinda wild from my perspective. My current understanding is that you “went down some line of questioning” based on a misunderstanding, but did not state what your line of questioning was and also ignored anything in my responses that wasn’t furthering your “line of questioning” including stuff that was correcting your misunderstanding. Which is pretty anti-helpful.
Did you read the whole comment thread?
Are you wanting to say “I, Wei Dai, am a better strategic thinker on AGI X-derisking than Yudkowsky.”? That’s a perfectly fine thing to say IMO, but of course you should understand that most people (me included) wouldn’t by default have the context to believe that.
It’s not obvious to me that we’re better off than this world, sadly. It seems like one of the main effects was to draw lots of young blood into the field of AI.
That’s plausible, IDK. But are you saying that PROSPECTIVELY the PREDICTABLE-ish effects were bad? Who said “Sure you could tie together a whole bunch of existing epistemological threads, and do a bunch of new thinking, and explain AI danger very clearly and thoroughly, and yeah, you could attract a huge amount of brainpower to try to think clearly about how to derisk that, but then they’ll just all start trying to make AGI. And here’s the reasons I can actually know this.”? There might have been people starting to say this by 2015 or 2018, IDK. But in 2010? 2006?
I think it’s not an impossible call. The fiasco with Roko’s Basilisk (2010) seems like a warning that could have been heeded. It turns out that “freaking out” about something being dangerous and scary makes it salient and exciting, which in turn causes people to fixate on it in ways that are obviously counterproductive. That it becomes a mark of pride to do the dangerous thing and come away unscathed (as with the demon core). Even though you warned them about this from the beginning, and in very clear terms.
And even if there was no one able to see this (it’s not like I saw it), it remains a strategic error — reality doesn’t grade on a curve.
Yes, it would be a strategic error in a sense, but it wouldn’t be a strong argument against “Yudkowsky is the best strategic thinker on AGI X-derisking”, which I was given to understand was the topic of this thread. For that specific question, which seemed to be the topic of Wei Dai’s comment, it is graded on a curve. (I don’t actually feel that interested in that question though.)
The question doesn’t make sense. It’s not possible to judge conclusively whether something is good or bad ahead of time… only after the fact.
Because real world actions and outcomes are what counts, not what is claimed verbally or in writing.
Being a good strategist is about things like
A) Understanding and probing the opposition/problem well
B) Coordinating your resources
C) Understanding rules and principles governing the nature of the game (operational constraints)
D) Creative problem solving + tactics
E) Knowing strategic principles (e.g., seizing initiative, pre-empting the opposition, leveraging commitment vulnerabilities, etc.)
F) Managing asymmetric information (my specialty)
G) Avoiding risky overcommitment
Etc. I am not sure Eliezer has showcased such skills in his work. He is a brilliant independent researcher and thinker, but not a top tier strategist or leader, as far as I can tell.
Is there someone you’d point to as being a better “strategic thinker on the topic of existential risk from AGI”, as is the topic of discussion in this thread?
Good question. ARE there any A-tier strategists at all on x-risk? I’d nominate Stuart Russell. Hm. Even Yoshua Bengio is arguably also having a larger impact than Eliezer in some critical areas (policy).
For pure strategic competence, Amandeep Singh Gill.
Jaan Tallinn. Maybe even Xue Lan.
Russell, Bengio, and Tallinn are good but not in the same league as Yudkowsky in terms of strategic thinking about AGI X-derisking. A quick search of Gill doesn’t turn up anything about existential risk but I could very easily have missed it.
Okay, I think I see the confusion. Your phrasing makes it seem (to me at least) like Eliezer has had the biggest strategic impact on mitigating x-risk, and is arguably also the most competent there. I would really not be sure of that. But if we talk about strategically dissecting x-risk, without necessarily mitigating it, directly or indirectly, then maybe Eliezer would win. Still would maybe lean towards Stuart.
Gill IS having an impact that de facto mitigates x-risk, whether he uses the term or not. But he is not making people talk about it (without necessarily doing anything about it) as much as Eliezer. In that sense one could argue he isn’t really an x-risk champ.
The reprogenetics case is almost a classic case of epistemic learned helplessness (https://slatestarcodex.com/2019/06/03/repost-epistemic-learned-helplessness/ for new readers) but with the twist that novel arguments about reproductive genetics have a long history of persuading experts in ethics into regrettable actions, suggesting that it’s not total madness to apply epistemic learned helplessness here even if you consider yourself an expert.
I exaggerate: Imagine Charlie Brown, presented with a formal proof in Lean that Lucy won’t pull away the football. He has checked that the proof compiles on his computer; he has passed it by Terence Tao in person, who vouched for it. He has passed the connection between the Lean proof and the real world to a team of lawyers, physicists, and operating system engineers, who all see no holes. It is still rational to not try to kick the football.
What a beautiful explanation. This relates to politics: a policy suggestion may make logical sense for most parties but still present an unknown tiny risk, so most politicians decide that doing nothing is the best bet in most cases.
Just wanted to say I really enjoyed this post, especially your statement of the problem in the last paragraph.
There’s a pressure to have a response or to continue the conversation in many cases. Particularly for moral issues, it is hard to say “I don’t know enough / I’ll have to think about it”, since that also pushes against this “I’m supposed to have a deep independent strong moral commitment” concept. We expect moral issues to have a level of intuitive clarity.
Love the shout out; I will repeat myself once more, it’s important to distinguish between WordPress (the open-source software) and WordPress.com (the commercial hosting service run by Automattic). Automattic was founded by Matt Mullenweg, who co-founded the open-source WordPress project, and the company continues to contribute to WordPress, but they’re separate entities.
Fix’d.
I’m a bit confused about whether it’s actually good. I think I often run a heuristic counter to it… something like:
“When you act in accordance with a position and someone challenges you on it, it’s healthy for the ecosystem and culture to give the best arguments for it, and find out whether they hold up (i.e. whether the other person has good counterarguments). You don’t have to change your mind if you lose the argument—because often our positions rest on illegible but accurate intuitions—but it’s good to help people figure out the state of the best arguments at the time.”
I guess this isn’t in conflict, if you just separately give the cause for your belief? e.g. “I believe it for cause A. But that’s kind of hard to discuss, so let me volunteer the best argument I can think of, B.”
Yes, absolutely—I would suggest giving both if feasible, and I think it would usually be feasible with practice doing so and with weaker social norms pressuring you to pretend to have already thought for yourself about everything.
See also Deferring from Cotton-Barratt (2022)
I remember a similar model of post-AGI ways to lock in a belief, as studied by Tianyi Qiu and presented on arXiv or YouTube. In this model, a lock-in of a false belief requires the multi-agent system’s trust matrix to have an eigenvalue bigger than 1.
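To make the eigenvalue condition concrete, here is a minimal toy in Python. It uses generic linear influence dynamics rather than the cited paper’s actual formalism (which I haven’t reproduced), and the trust matrix and parameter values are made up for illustration; the point is just that when the trust matrix’s leading eigenvalue exceeds 1, a single agent’s false belief amplifies through the network instead of dying out.

```python
import numpy as np

# Toy linear influence dynamics (NOT the cited paper's model): each agent's
# belief at step t+1 is a trust-weighted combination of all beliefs at step t,
# beliefs_{t+1} = W @ beliefs_t, where W[i, j] is how much agent i trusts agent j.

def propagate(W, beliefs, steps=20):
    for _ in range(steps):
        beliefs = W @ beliefs
    return beliefs

rng = np.random.default_rng(0)
n = 5
W = rng.uniform(0.0, 1.0, size=(n, n))   # arbitrary nonnegative trust weights
deviation = np.zeros(n)
deviation[0] = 1.0                       # one agent starts out with a false belief

for target_radius in (0.8, 1.2):         # leading eigenvalue below vs. above 1
    W_scaled = W * target_radius / max(abs(np.linalg.eigvals(W)))
    final = propagate(W_scaled, deviation)
    print(f"spectral radius {target_radius}: deviation norm after 20 steps = "
          f"{np.linalg.norm(final):.3f}")
```

(This is only the crudest linear version of the idea; the real model presumably has nonlinear belief updates and the human–LLM asymmetry discussed next.)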
However, the example studied in the article is the interaction of humans and LLMs, where there is one LLM and armies of humans who don’t interact with each other but do influence the LLM.
I also have a model sketch, but I haven’t had the time to develop it.
Alternate Ising-like model
I would guess that the real-life situation is closer to the Ising-like model where atoms can randomly change their spins, but whenever an atom $i$ chooses a spin, it is $\exp(h \alpha_i + h^{\mathrm{ind}}_i + \sum_j \sigma_j c_{ji})$ times more likely to choose the spin $+1$ than $-1$. Here $h$ is the strength of the ground truth, $h^{\mathrm{ind}}_i$ reflects individual priors and shifts, and $\sum_j \sigma_j c_{ji}$ reflects the influence of others.
What might help is lowering the activation energy of transitions from locked-in falsehoods to truths. In a setting where everyone communicates with everyone else, a belief forms nearly instantly, but the activation energy is high. In a setting where the graph is amenable (e.g. the lattice, as in the actual Ising model), reaching a common belief takes too long for practical usage.
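For concreteness, here is a minimal simulation sketch of the flip rule above in Python. Everything in it (fully connected random influences, the particular parameter values, treating the individual term as per-atom) is made up for illustration, so treat it as a toy rather than a calibrated model.

```python
import numpy as np

# Minimal toy of the Ising-like update rule above: each "atom" i holds a spin
# in {-1, +1}; when it re-chooses its spin, +1 is
# exp(h*alpha[i] + h_ind[i] + sum_j sigma[j]*c[j, i]) times more likely than -1.
# All parameter values below are arbitrary illustrations.

rng = np.random.default_rng(0)
n = 50
h = 0.2                                  # strength of the ground truth
alpha = np.ones(n)                       # per-atom coupling to the ground truth
h_ind = rng.normal(0.0, 0.5, size=n)     # individual priors and shifts (per-atom here)
c = rng.uniform(0.0, 0.05, size=(n, n))  # c[j, i]: influence of atom j on atom i
np.fill_diagonal(c, 0.0)

sigma = -np.ones(n)                      # everyone starts on the false belief (-1)

for _ in range(20_000):
    i = rng.integers(n)                  # a random atom re-chooses its spin
    field = h * alpha[i] + h_ind[i] + sigma @ c[:, i]
    p_plus = 1.0 / (1.0 + np.exp(-field))   # since P(+1)/P(-1) = exp(field)
    sigma[i] = 1.0 if rng.random() < p_plus else -1.0

print("fraction of atoms agreeing with the ground truth:", np.mean(sigma == 1.0))
```

Making the influence matrix sparse and local (lattice-like) instead of dense would let one probe the tradeoff just described: dense influence gives near-instant consensus but a high activation energy for escaping a locked-in falsehood, while sparse influence lowers the barrier but slows convergence.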
I would also guess that it is hard to influence the leaders, which makes real-life lock-in close to your scheme. See, for example, my jabs at Wei Dai’s quest to postpone alignment R&D until we thoroughly understand some confusing aspects of high-level philosophy.