Partial summary of debate with Benquo and Jessicata [pt 1]

Note: I’ll be trying not to engage too much with the object level discussion here – I think my marginal time on this topic is better spent thinking and writing longform thoughts. See this comment.

Over the past couple of months there was some extended discussion including myself, Habryka, Ruby, Vaniver, Jim Babcock, Zvi, Ben Hoffman, Jessicata and Zack Davis. The discussion covered many topics, including “what is reasonable to call ‘lying’”, “what are the best ways to discuss and/​or deal with deceptive patterns in public discourse”, “what norms and/​or principles should LessWrong aspire to”, and others.

This included comments on LessWrong, email, google-docs and in-person communication. This post is intended as an easier-to-read collection of what seemed (to me) like key points, as well as including my current takeaways.

Part of the challenge here was that it seemed like Benquo and I had mostly similar models, but many critiques I made seemed to Ben to be in the wrong abstraction, and vice-versa. Sometimes I would notice particular differences like “In my model, it’s important that accusations be held to a high standard”, whereas Ben felt “it’s important that criticism not be held to higher standards than praise.” But knowing this didn’t seem to help much.

This post mostly summarizes existing online conversation. I’m hoping to do a followup where I make good on my promise to think more seriously through my cruxes-on-ontology, but it’s slow going.

Comment Highlights

This begins with some comment highlights from LessWrong which seemed useful to gather in one place, followed by my takeaways after the fact.

I attempt to pass Ben’s ITT

In the comments of “Rationalization” and “Sitting Bolt Upright in Alarm”, a few things eventually clicked and I attempted to pass Ben’s Ideological Turing Test:

Let me know how this sounds as an ITT:

Thinking and building a life for yourself

  • Much of civilization (and the rationalsphere as a subset of it and/​or memeplex that’s influenced and constrained by it) is generally pointed in the wrong direction. This has many facets, many of which reinforce each other. Society tends to:

  • Schools systematically teach people to associate reason with listening-to/​pleasing-teachers, or moving-words-around unconnected from reality. [Order of the Soul]

  • Society systematically pushes people to live apart from each other, to work until they need (or believe they need) palliatives, in a way that doesn’t give you space to think [Sabbath Hard and Go Home]

  • Relatedly, society provides structure that incentivizes you to advance in arbitrary hierarchy, or to tread water and barely stay afloat, without reflection of what you actually want.

  • By contrast, for much of history, there was a much more direct connection between what you did, how you thought, and how your own life was bettered. If you wanted a nicer home, you built a nicer home. This came with many overlapping incentive structures that reinforced something closer to living healthily and generating real value.

  • (I’m guessing a significant confusion was me seeing this whole section as only moderately connected rather than central to the other sections)

We desperately need clarity

  • There’s a collection of pressures, in many-but-not-all situations, to keep both facts and decision-making principles obfuscated, and to warp language in a way that enables that. This is often part of an overall strategy (sometimes conscious, sometimes unconscious) to maneuver groups for personal gain.

  • It’s important to be able to speak plainly about forces that obfuscate. It’s important to lean _fully_ into clarity and plainspeak, not just taking marginal steps towards it, both because clear language is very powerful intrinsically, and because there’s a sharp dropoff as soon as ambiguity leaks in (moving the conversation to higher simulacrum levels, at which point it’s very hard to recover clarity).

[Least confident] The best focus is on your own development, rather than optimizing systems or other people

  • Here I become a lot less confident. This is my attempt to summarize whatever’s going on in our disagreement about my “When coordinating at scale, communicating has to reduce gracefully to about 5 words” thing. I had an impression that this seemed deeply wrong, confusing, or threatening to you. I still don’t really understand why. But my best guesses include:

  • This is putting the locus of control in the group, at a moment-in-history where the most important thing is reasserting individual agency and thinking for yourself (because many groups are doing the wrong-things listed above)

  • Insofar as group coordination is a lens to be looked through, it’s important that groups are working in a way that respects everyone’s agency and ability to think (to avoid falling into some of the failure modes associated with the first bullet point), and simplifying your message so that others can hear/​act on it is part of an overall strategy that is causing harm

  • Possibly a simpler “people can and should read a lot and engage with more nuanced models, and most of the reason you might think that they can’t is because school and hierarchical companies warped your thinking about that?”

And then, in light of all that, something is off with my mood when I’m engaging with individual pieces of that, because I’m not properly oriented around the other pieces?

Does that sound right? Are there important things left out or gotten wrong?

Ben responded:

This sounds really, really close. Thanks for putting in the work to produce this summary!

I think my objection to the 5 Words post fits a pattern where I’ve had difficulty expressing a class of objection. The literal content of the post wasn’t the main problem. The main problem was the emphasis of the post, in conjunction with your other beliefs and behavior.

It seemed like the hidden second half of the core claim was “and therefore we should coordinate around simpler slogans,” and not the obvious alternative conclusion “and therefore we should scale up more carefully, with an uncompromising emphasis on some aspects of quality control.” (See On the Construction of Beacons for the relevant argument.)

It seemed to me like there was some motivated ambiguity on this point. The emphasis seemed to consistently recommend public behavior that was about mobilization rather than discourse, and back-channel discussions among well-connected people (including me) that felt like they were more about establishing compatibility than making intellectual progress. This, even though it seems like you explicitly agree with me that our current social coordination mechanisms are massively inadequate, in a way that (to me obviously) implies that they can’t possibly solve FAI.

I felt like if I pointed this kind of thing out too explicitly, I’d just get scolded for being uncharitable. I didn’t expect, however, that this scolding would be accompanied by an explanation of what specific, anticipation-constraining, alternative belief you held. I’ve been getting better at _pointing out this pattern_ (e.g. my recent response to habryka) instead of just shutting down due to a preverbal recognition of it. It’s very hard to write a comment like this one clearly and without extraneous material, especially of a point-scoring or whining nature. (If it were easy I’d see more people writing things like this.)

Summary of Private LessWrong Thread (Me/​Benquo/​Jessica)

One experiment we tried during the conversation was to hold a conversation on LessWrong, in a private draft (i.e. where we could respond to each other with nested threading, but only have to worry about responding to each other).

The thread was started by Ruby, with some proposals for LessWrong moderation style. At first the conversation was primarily Ruby and Zvi. At some point Ruby might make the full thread public, but for now I’m focusing on an exchange between Benquo, Jessica and me, which I found most helpful for clarifying our positions.


It might help for me to also try to make a positive statement of what I think is at stake here. [...]

What I see as under threat is the ability to say in a way that’s actually heard, not only that opinion X is false, but that the process generating opinion X is untrustworthy, and perhaps actively optimizing in an objectionable direction. Frequently, attempts to say this are construed _primarily_ as moves to attack some person or institution, pushing them into the outgroup. Frequently, people suggest to me an “equivalent” wording with a softer tone, which in fact omits important substantive criticisms I mean to make, while claiming to understand what’s at issue.

I responded:

My core claim is: “right now, this isn’t possible, without a) it being heard by many people as an attack, b) without people having to worry that other people will see it as an attack, even if they don’t.”

It seems like you see this as something like “there’s a precious thing that might be destroyed” and I see it as “a precious thing does not exist and must be created, and the circumstances in which it can exist are fragile.” It might have existed in the very early days of LessWrong. But the landscape now is very different than it was then. With billions of dollars available and at stake, what worked then can’t be the same thing as what works now.

[in public. In private things are much easier. It’s *also* the case that private channels enable collusion – that was an update I’ve made over the course of the conversation.]

And, while I believe that you earnestly believe that the quoted paragraph is important, your individual statements often look too optimized-as-an-obfuscated-attack for me to trust that they are not. I assign substantial probability to a lot of your motives being basically traditional coalition-political, with you just being in denial about it and having a complicated narrative to support them. If that’s not true, I realize it must be extremely infuriating to be treated that way. But the nature of the social landscape makes it a bad policy for me to take you at your word in many of the cases.

Wishing the game didn’t exist doesn’t make the game not exist. We could all agree to stop playing at once, but a) we’d need to credibly believe we were all actually going to stop playing at once, b) have enforcement mechanisms to make sure it continues not being played, c) have a way to ensure newcomers are also not playing.

And I think that’s all possibly achievable, incrementally. I think “how to achieve that” is a super important question. But attempting to not-play the game without putting in that effort looks to me basically like putting a sign that says “cold” on a broken refrigerator and expecting your food to stay fresh.


I spent a few minutes trying to generate cruxes. Getting to “real” cruxes here feels fairly hard and will probably take me a couple hours. (I think this conversation is close to the point where I’d really prefer us to each switch to the role of “Pass each other’s ITTs, and figure out what would make ourselves change our mind” rather than “figure out how to explain why we’re right.” This may require more model-sharing and trust-building first, dunno)

But I think the closest proximate crux is: I would trust Ben’s world-model a lot more if I saw a lot more discussion of how the game theory plays out over multiple steps. I’m not that confident that my interpretation of the game theory and social landscape is right. But I can’t recall any explorations of it, and I think it should be at least 50% of the discussion here.

But the landscape now is very different than it was then. With billions of dollars available and at stake, what worked then can’t be the same thing as what works now.

Jessica responds to me:

Is this a claim that people are almost certainly going to be protecting their reputations (and also beliefs related to their reputations) in anti-epistemic ways when large amounts of money are at stake, in a way they wouldn’t if they were just members of a philosophy club who didn’t think much money was at stake?

This claim seems true to me. We might actually have a lot of agreement. And this matches my impression of “EA/​rationality shift from ‘that which can be destroyed by the truth should be’ norms towards ‘protect feelings’ norms as they have grown and want to play nicely with power players while maintaining their own power.”

If we agree on this point, the remaining disagreement is likely about the game theory of breaking the bad equilibrium as a small group, as you’re saying it is.

(Also, thanks for bringing up money/​power considerations where they’re relevant; this makes the discussion much less obfuscated and much more likely to reach cruxes)

[Note, my impression is that the precious thing already exists among a small number of people, who are trying to maintain and grow the precious thing and are running into opposition, and enough such opposition can cause the precious thing to go away, and the precious thing is currently being maintained largely through willingness to forcefully push through opposition. Note also, if the precious thing used to exist (among people with strong stated willingness to maintain it) and now doesn’t, that indicates that forces against this precious thing are strong, and have to be opposed to maintain the precious thing.]

I responded:

An important thing I said earlier in another thread was that I saw roughly two choices for how to do the precious thing, which are something like:

  • If you want to do the precious thing in public (in particular when billions of dollars are at stake, although also when narrative and community buy-in are at stake), it requires a lot of special effort, and is costly

  • You can totally do the precious thing in small private groups, and it’s much easier

  • And I think a big chunk of the disagreement comes from the ‘small private groups are also a way that powerful groups collude, are duplicitous, and do other things in that space.’

[There’s a separate issue, which is that researchers might feel more productive, locally, in private. But failure to write up their ideas publicly means other people can’t build on them, which is globally worse. So you also want some pressure on research groups to publish more]

So the problem-framing as I currently see it is:

  • What are the least costly ways you can have plainspoken truth in public, without destroying (or resulting in someone else destroying) the shared public space. Or, what collection of public truthseeking norms output the most useful true things per unit of effort in a sustainable fashion

  • What are ways that we can capture the benefits of private spaces (sometimes recruiting new people into the private spaces), while having systems/​norms/​counterfactual-threats in place to prevent collusion and duplicity, and encourage more frequent publishing of research.

And the overall strategy I currently expect to work best (but with weak confidence, haven’t thought it through) is:

  • Change the default of private conversations from ‘stay private forever’ to ‘by default, start in private, but with an assumption that the conversation will usually go public unless there’s a good reason not to, with participants having veto* power if they think it’s important not to go public.’

  • An alternate take on “the conversation goes public” is “the participants write up a distillation of the conversation that’s more optimized for people to learn what happened, which both participants endorse.” (i.e. while I’m fine with all my words in this private thread being shared, I think trying to read the entire conversation might be more confusing than it needs to be. It might not be worth anyone’s time to write up a distillation, but if someone felt like it I think that’d be preferable all else being equal)

  • Have this formally counterbalanced by “if people seem to be abusing their veto power for collusion or duplicitous purposes, have counterfactual threats to publicly harm each other’s reputation (possibly betraying the veto-process*), which hopefully doesn’t happen, but the threat of it happening keeps people honest.”

*Importantly, a formal part of the veto system is that if people get angry enough, or decide it’s important enough, they can just ignore your veto. If the game is rigged, the correct thing to do is kick over the gameboard. But, everyone has a shared understanding that a gameboard is better than no gameboard, so instead, people are incentivized to not rig the game (or, if the game is currently rigged, work together to de-rig it)

Because everyone agrees that these are the rules of the metagame, betraying the confidence of the private space is seen as a valid action (i.e. if people didn’t agree that these were the meta-rules, I’d consider betraying someone’s confidence to be a deeply bad sign about a person’s trustworthiness. But if people _do_ agree to the meta-rules, then if someone betrays a veto it’s a sign that you should maybe be hesitant to collaborate with that person, but not as strong a sign about their overall trustworthiness)


I’m first going to summarize what I think you think:

  • $Billions are at stake.

  • People/​organizations are giving public narratives about what they’re doing, including ones that affect the $billions.

  • People/​organizations also have narratives that function for maintaining a well-functioning, cohesive community.

  • People criticize these narratives sometimes. These criticisms have consequences.

  • Consequences include: People feel the need to defend themselves. People might lose funding for themselves or their organization. People might fall out of some “ingroup” that is having the important discussions. People might form coalitions that tear apart the community. The overall trust level in the community, including willingness to take the sensible actions that would be implied by the community narrative, goes down.

  • That doesn’t mean criticism of such narratives is always bad. Sometimes, it can be done well.

  • Criticisms are important to make if the criticism is really clear and important (e.g. the criticism of ACE). Then, people can take appropriate action, and it’s clear what to do. (See strong and clear evidence)

  • Criticisms are potentially destructive when they don’t settle the matter. These can end up reducing cohesion/​trust, splitting the community, tarnishing reputations of people who didn’t actually do something wrong, etc.

  • These non-matter-settling criticisms can still be important to make. But, they should be done with sensitivity to the political dynamics involved.

  • People making public criticisms willy-nilly would lead to a bunch of bad effects (already mentioned). There are standards for what makes a good criticism, where “it’s true/​well-argued” is not the only standard. (Other standards are: is it clear, is it empathetic, did the critic try other channels first, etc)

  • It’s still important to get to the truth, including truths about adversarial patterns. We should be doing this by thinking about what norms get at these truths with minimum harm caused along the way.

Here’s a summary of what I think (written before I summarized what you thought):

  • The fact that $billions are at stake makes reaching the truth in public discussions strictly more important than for a philosophy club. (After all, these public discussions are affecting the background facts that private discussions, including ones that distribute large amounts of money, assume)

  • The fact that $billions are at stake increases the likelihood of obfuscatory action compared to in a philosophy club.

  • The “level one” thing to do is to keep using philosophy club norms, like old-LessWrong. Give reasons for thinking what you think. Don’t make appeals to consequences or shut people up for saying inconvenient things; argue at the object level. Don’t insult people. If you’re too sensitive to hear the truth, that’s for the most part your problem, with some exceptions (e.g. some personal insults). Mostly don’t argue about whether the other people are biased/​adversarial, and instead make good object-level arguments (this could be stated somewhat misleadingly as “assume good faith”). Have public debates, possibly with moderators.

  • A problem with “level one” norms is that they rarely talk about obfuscatory action. “Assume good faith”, taken literally, implies obfuscation isn’t happening, which is false given the circumstances (including monetary incentives). Philosophy club norms have some security flaws.

  • The “level two” thing to do is to extend philosophy club norms to handle discussion of adversarial action. Courts don’t assume good faith; it would be transparently ridiculous to do so.

  • Courts blame and disproportionately punish people. We don’t need to do this here, we need the truth to be revealed one way or another. Disproportionate punishments make people really defensive and obfuscatory, understandably. (Law fought fraud, and fraud won)

  • So, “level two” should develop language for talking about obfuscatory/​destructive patterns of social action that doesn’t disproportionately punish people just for getting caught up in them. (Note, there are some “karmic” consequences for getting caught up in these dynamics, like having the organization be less effective and getting a reputation for being bad at resisting social pressure, but these are very different from the disproportionate punishments typical of the legal system, which punish disproportionately on the assumption that most crime isn’t caught)

  • I perceive a backslide from “level one” norms, towards more diplomatic norms, where certain things are considered “rude” to say and are “attacking people”, even if they’d be accepted in philosophy club. I think this is about maintaining power illegitimately.

Here are more points that I thought of after summarizing your position:

  • I actually agree that individuals should be using their discernment about how and when to be making criticisms, given the political situation.

  • I worry that saying certain ways of making criticisms are good/​bad results in people getting silenced/​blamed even when they’re saying true things, which is really bad.

  • So I’m tempted to argue that the norms for public discussion should be approximately “that which can be destroyed by the truth should be”, with some level of privacy and politeness norms, the kind you’d have in a combination of a philosophy club and a court.

  • That said, there’s still a complicated question of “how do you make criticisms well”. I think advice on this is important. I think the correct advice usually looks more like advice to whistleblowers than advice for diplomacy.

Note, my opinion of your opinions, and my opinions, are expressed in pretty different ontologies. What are the cruxes?

Suppose future-me tells me that I’m pretty wrong, and actually I’m going about doing criticisms the wrong way, and advocating bad norms for criticism, relative to you. Here are the explanations I come up with:

  • “Scissor statements” are actually a huge risk. Make sure to prove the thing pretty definitively, or there will be a bunch of community splits that make discussion and cooperation harder. Yes, this means people are getting deceived in the meantime, and you can’t stop that without causing worse bad consequences. Yes, this means group epistemology is really bad (resembling mob behavior), but you should try upgrading that a different way.

  • You’re using language that implies court norms, but courts disproportionately punish people. This language is going to increase obfuscatory behavior way more than it’s worth, and possibly result in disproportionate punishments. You should try really, really hard to develop different language. (Yes, this means some sacrifice in how clear things can be and how much momentum your reform movement can sustain)

  • People saying critical things about each other in public (including not-very-blamey things like “I think there’s a distortionary dynamic you’re getting caught up in”) looks really bad in a way that deterministically makes powerful people, including just about everyone with money, stop listening to you or giving you money. Even if you get a true discourse going, the community’s reputation will be tarnished by the justice process that led to that, in a way that locks the community out of power indefinitely. That’s probably not worth it, you should try another approach that lets people save face.

  • Actually, you don’t need to be doing public writing/​criticism very much at all, people are perfectly willing to listen to you in private, you just have to use this strategy that you’re not already using.

These are all pretty cruxy; none of them seem likely (though they’re all plausible), and if I were convinced of any of them, I’d change my other beliefs and my overall approach.

There are a lot of subtleties here. I’m up for having in-person conversations if you think that would help (recorded /​ written up or not).

My final response in that thread:

This is an awesome comment on many dimensions, thanks. I both agree with your summary of my position, and I think your cruxes are pretty similar to my cruxes.

There are a few additional considerations of mine which I’ll list, followed by attempting to tease out some deeper cruxes of mine about “what facts would have to be true for me to want to backpropagate the level of fear it seems like you feel into my aesthetic judgment.” [This is a particular metaframe I’m currently exploring]

[Edit: turned out to be more than a few straightforward assumptions, and I haven’t gotten to the aesthetic or ontology cruxes yet]

Additional considerations from my own beliefs:

  • I define clarity in terms of what gets understood, rather than what gets said. So, using words with non-standard connotations, without doing a lot of up-front work to redefine your terms, seems to me to be reducing clarity, and/​or mixing clarity, rather than improving it.

  • I think it’s especially worthwhile to develop non-court language, for public discourse, if your intent is not to be punitive – repurposing court language for non-punitive action is particularly confusing. The first definition for “fraud” that comes up on Google is “wrongful or criminal deception intended to result in financial or personal gain”. The connotation I associate it with is “the kind of lying you pay fines or go to jail for or get identified as a criminal for”.

  • By default, language-processing is a mixture of truthseeking and politicking. The more political a conversation feels, the harder it will be for people to remain in truthseeking mode. I see the primary goal of a rationalist/​truthseeking space to be to ensure people remain in truthseeking mode. I don’t think this is completely necessary but I do think it makes the space much more effective (in terms of time spent getting points across).

  • I think it’s very important for language re: how-to-do-politics-while-truthseeking to be created separately from any live politics – otherwise, one of the first things that’ll happen is the language gets coopted and distorted by the political process. People are right/​just to fear you developing political language if you appear to be actively trying to wield political weapons against people while you develop it.

  • Fact that is (quite plausibly) my true rejection – Highly tense conversations that I get defensive at are among the most stressful things I experience, which cripple my ability to sleep well while doing them. This is high enough cost that if I had to do it all the time, I would probably just tune them out.

  • This is a selfish perspective, and I should perhaps be quite suspicious of the rest of my arguments in light of it. But it’s not obviously wrong to me in the first place – having stressful weeks of sleep wrecked is really bad. When I imagine a world where people are criticizing me all the time [in particular when they’re misunderstanding my frame, see below about deep model differences], it’s not at all obvious that the net benefit I or the community gets from people getting to express their criticism more easily outweighs the cost in productivity (which would, among other things, be spent on other truthseeking pursuits). When I imagine this multiplied across all orgs it’s not very surprising or unreasonable seeming for people to have learned to tune out criticism.

  • Single Most Important Belief that I endorse – I think trying to develop a language for truthseeking-politics (or politics-adjacent stuff) could potentially permanently destroy the ability for a given space to do politics sanely. It’s possible to do it right, but also very easy to fuck up, and instead of properly transmitting truthseeking-into-politics, politics backpropagates into truthseeking, causing people to view truthseeking norms as a political weapon. I think this is basically what happened with the American Right Wing and their view of science (and I think things like the March for Science are harmful because they exacerbate Science as Politics).

  • In the same way that it’s bad to tell a lie, to accomplish some locally good thing (because the damage you do to the ecosystem is far worse than whatever locally good thing you accomplished), I think it is bad to try to invent truthseeking-politics-on-the-fly without explaining well what you are doing while also making claims that people are (rightly) worried will cost them millions of dollars. Whatever local truth you’re outputting is much less valuable than the risks you are playing with re: the public commons of “ability to ever discuss politics sanely.”

  • I really wish we had developed good tools to discuss politics sanely before we got access to billions of dollars. That was an understandable mistake (I didn’t think about it until just this second), but it probably cost us deeply. Given that we didn’t, I think creating good norms requires much more costly signaling of good faith (on everyone’s part) than it might have needed. [this paragraph is all weak confidence since I just thought of it but feels pretty true to me]

  • People have deep models, in which certain things seem obvious to them that are not obvious to others. I think I drastically disagree with you about what your prior should be that “Bob has a non-motivated deep model (or, not any more motivated than average) that you don’t understand”, rather than “Bob’s opinion or his model is different/​frightening because he is motivated, deceptive and/​or non-truth-tracking.”

  • My impression is that everyone with a deep, weird model that I’ve encountered was overly biased in favor of their deep model (including you and Ben), but this seems sufficiently explained by “when you focus all your attention on one particular facet of reality, that facet looms much larger in your thinking, and other facets loom less large”, with some amount of “their personality or circumstance biased them towards their model” (but, not to a degree that seems particularly weird or alarming).

  • Seeing “true reality” involves learning lots of deep models into narrow domains and then letting them settle.

  • [For context/​frame, remember that it took Eliezer 2 years of blogging every day to get everyone up to speed on how to think in his frame. That’s roughly the order-of-magnitude of effort that it seems like you should expect to expend to explain a counterintuitive worldview to people]

  • In particular, a lot of the things that seem alarming to you (like GiveWell’s use of numbers that seem wrong) are pretty well (but not completely) explained by “it’s actually very counterintuitive to have the opinions you do about what reasonable numbers are.” I have updated more towards your view on the matter, but a) it took me a couple years, b) it still doesn’t seem very obvious to me. Drowning-Children-are-Rare is a plausible hypothesis but doesn’t seem so overdetermined that anyone who thinks otherwise must be deeply motivated or deceptive.

  • I’m not saying this applies across the board. I can think of several people in EA or rationalist space who seem motivated in important ways. My sense of deep models specifically comes from the combination of “the deep model is presented to me when I inquire about it, and makes sense”, and “they have given enough costly signals of trustworthiness that I’m willing to give them the benefit of the doubt.”

  • I have updated over the past couple years on how bad “PR management” and diplomacy are for your ability to think, and I appreciate the cost a bit more, but it still seems less than the penalties you get for truthseeking when people feel unsafe.

  • I have (low confidence) models that seem fairly different from Ben’s (and I assume your) model of what exactly early LessWrong was like, and what happened to it. This is complicated and I think beyond scope for this comment.

  • Unknown Unknowns, and model-uncertainty. I’m not actually that worried about scissor-attacks, and I’m not sure how confident I am about many of the previous models. But they are all worrisome enough that I think caution is warranted.

“Regular” Cruxes

Many of the above bullet-points are cruxy and suggest natural crux-reframes. I’m going to go into some detail for a few:

  • I could imagine learning that my priors on “deep model divergence” vs “nope, they’re just really deceptive” are wrong. I don’t actually have all that many data points to have longterm confidence here. It’s just that so far, most of the smoking guns that have been presented to me didn’t seem very definitive.

  • The concrete observations that would shift this are “at least one of the people that I have trusted turns out to have a smoking gun that makes me think their deep model was highly motivated” [I will try to think privately about what concrete examples of this might be, to avoid a thing where I confabulate justifications in realtime.]

  • It might be a lot easier than I think to create a public truthseeking space that remains sane in the face of money and politics. Relatedly, I might be overly worried about the risk of destroying longterm ability to talk-about-politics-sanely.

  • If I saw an existing community that operated on a public forum and onboarded new people all the time, which had the norms you are advocating, and interviewing various people involved seemed to suggest it was working sanely, I’d update. I’m not sure if there are easier bits of evidence to find.

  • The costs that come from diplomacy might be higher than the costs of defensiveness.

  • Habryka has described experiences where diplomacy/​PR-concerns seemed bad-for-his-soul in various ways [not 100% sure this is quite the right characterization but seems about right]. I think so far I haven’t really been “playing on hard mode” in this domain, and I think there’s a decent chance that I will be over the next few years. I could imagine updating about how badly diplomacy cripples thought after having that experience, and finding that the cost turns out to be greater than the cost of defensiveness.

  • I might be the only person that suffers from sleep loss or other stress-side-effects as badly as I do.

These were the easier ones. I’m trying to think through the “ontology doublecrux” thing and think about what sorts of things would change my ontology. That may take a while longer.

Criticism != Accusation of Wrongdoing

Later on, during an in-person conversation with Jessica, someone else (leaving them anonymous) pointed out an additional consideration, which is that criticism isn’t the same as accusations.

[I’m not sure I fully understood the original version of this point, so the following is just me speaking for myself about things I believe]

There’s an important social technology, which is to have norms that people roughly agree on. The costs of everyone having to figure out their own norms are enormous. So most communities have at least some basic things that you don’t do (such as blatantly lying).

Several important properties here are:

  • You can ostracize people who continuously violate norms.

  • If someone accuses you of a norm violation, you feel obligated to defend yourself. (Which is very different from getting criticized for something that’s not a norm violation)

  • If Alice makes an accusation of someone violating norms, and that accusation turns out to be exaggerated or ill-founded, then Alice loses points, and people are less quick to believe her or give her a platform to speak next time.

I think one aspect of the deep disagreements going on here is something like “what exactly are the costs of everyone having to develop their own theory of goodness”, and/​or what are the benefits of the “there are norms, that get enforced and defended” model.

I understand Benquo and Jessica are arguing that we do not in fact have such norms, we just have the illusion of such norms, and in fact what we have are weird political games that benefit the powerful. And they see their approach as helping to dispel that illusion.

Whereas I think we do in fact have those norms: there’s a degree of lying that would get you expelled from the rationalsphere and EAsphere, and this is important. And so insisting on being able to discuss, in public, whether Bob lied [a norm violation], while claiming that this is not an attack on Bob, just an earnest discussion of the truth or model-building of adversarial discourse… is degrading not only the specific norm of “don’t lie” but also “our general ability to have norms.”

My current state

I’m currently in the process of mulling this all over. The high level questions are something like:

  • [Within my current ontology] What sorts of actions by EA leaders would shift my position from “right now we actually have a reasonably good foundation of trustworthiness” to “things are not okay, to the point where it makes more sense to kick the game board over rather than improve things”? Or, alternately, to “things are not okay, and I need to revise my ontology in order to account for it”?

  • How exactly would/​should I shift my ontology if things were sufficiently bad?

I expect this to be a fairly lengthy process, and require a fair amount of background processing.

There are other things I’m considering here, and writing them up turned out to take more time than I have at the moment. Will hopefully have a Pt 2 of this post.