# jessicata (Jessica Taylor)

Jessica Taylor. CS undergrad and Master’s at Stanford; former research fellow at MIRI.

I work on decision theory, social epistemology, strategy, naturalized agency, mathematical foundations, decentralized networking systems and applications, theory of mind, and functional programming languages.

Blog: unstableontology.com

• I blog. I think it’s enhanced my life a lot and improved my career a lot (i.e. I get grants more easily and have access to more desirable jobs because I blog). I don’t think everyone should blog (?), but I’m going to say how I deal with each of the listed problems.

1. Blogging will give society more influence over your thinking.

Sometimes when I disagree with society I write a blog post. That makes it easier to feel justified in disagreeing with society. It’s a thing to point other people to if I want them to also disagree with society in the same way.

There are some things I don’t write about because I’m too embarrassed of my opinions etc. That isn’t really worse than the situation of not blogging at all, though.

A lot of this has more to do with audience than blogging itself. If you promote the blog on LessWrong then you’ll be tempted to adhere to LW frameworks, etc. There are some things I write on my blog and don’t post to LW because I anticipate there being too many annoying comments enforcing local stylistic conventions. Some posts can be written for a small set of specific people you know (e.g. they can be cleaned-up versions of email threads).

1. It will enforce your current beliefs and identity.

It does make it more legible when you change your mind. It also builds up a body of work, and it’s tempting to write more things that share the same framework, etc. This is like being an academic and building up a body of work. If you change your mind and write about it, you can refer to the previous post as defining your earlier position, making it clearer what the update is. Having a body of work has benefits; often the alternative isn’t being free to adopt whatever is best, but following fashions due to lacking a leg to stand on, so to speak.

(I titled my blog “unstable ontology” partially to hint that it’s expected that my frameworks will change over time.)

1. It will make your beliefs visible.

I generally find that when I write something controversial, I gain “enemies” but I also gain allies. Overall the alliances gained pay off more than the “enemies” detract, e.g. it’s possible to have interesting conversations with allies and coordinate with them economically, while “enemies” weren’t that useful in the first place and mostly keep their distance. It’s a bit weird for me to call them “enemies” because (a) enmity can’t be inferred from disagreement and (b) the disagreement already existed; you just previously didn’t trust them enough to reveal it, which means revealing the disagreement actually brings you closer, making it possible to reconcile at all. Even hatred can be capitalized on (hatred is a form of attention and attention can be monetized; those who hate your haters may become allies), but it takes some emotional resilience to not be brought down by it.

1. It might make you less interesting.

Basically no one writes a large fraction of their thoughts on the internet. Writing a normal amount hints that there’s more. Writing can increase popularity and distinctiveness, which help in dating. If someone hasn’t read your whole blog, there’s even some public content they don’t know about… and if they have, they’re attending to you a lot; maybe they’re kind of obsessed with you.

As a common sense check, who has better dating prospects, a rock musician (distinctive), or a random audience member (undistinctive)? What about someone with an interesting personal aesthetic that includes clothing choices etc (distinctive), versus someone trying to look normal (undistinctive)? (Luke Muehlhauser made a similar point previously.)

You can stop for a while. I do find that writing decreases the activation energy of writing more, which makes more things seem like my “responsibility”, which causes me to write more. But this is probably good for my career overall; I don’t think I’d otherwise be doing something better with my time.

• Non-brain matter accounts for most of the compute in a naive physics simulation; however, it’s plausible that it could be sped up a lot, e.g. the interiors of rocks are pretty static and similar to each other, so maybe they can share a lot of computation. For brains it would be harder to speed up the simulation without changing the result a lot.

• Here’s a model made by Median Group estimating total brain compute in evolutionary history. Sorry it’s not better documented; this was put together quickly on the fly.

Thanks for reading all the posts!

I’m not sure where you got the idea that this was meant to solve the spurious counterfactuals problem; that was in the appendix because I anticipated that a MIRI-adjacent person would want to know how it solves that problem.

The core problem it solves is providing a well-defined mathematical framework in which (a) there are, in some sense, choices, and (b) it is believed that these choices correspond to the results of a particular Turing machine. It goes back to the free will vs. determinism paradox, and shows that there’s a formalism that has some properties of “free will” and some properties of “determinism”.

A way that EDT fails to solve 5-and-10 is that it could believe with 100% certainty that it takes the \$5, so its expected value for taking the \$10 is undefined. (I wrote previously about a modification of EDT to avoid this problem.)
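The failure can be made concrete with a toy calculation. This is a minimal sketch, assuming a hypothetical joint belief over (action, payoff) in which the agent is certain it takes the 5. EDT's conditional expectation for the other action would require dividing by a zero probability, so the sketch reports it as undefined (`None`). The names `joint` and `conditional_expected_utility` are illustrative only.

```python
from fractions import Fraction

# Hypothetical toy belief for the 5-and-10 problem: joint probabilities over
# (action, utility). The agent is 100% certain it takes the 5.
joint = {
    ("take_5", 5): Fraction(1),
    ("take_10", 10): Fraction(0),
}

def conditional_expected_utility(joint, action):
    """E[U | A = action], or None when P(A = action) = 0 (undefined)."""
    p_action = sum(p for (a, _), p in joint.items() if a == action)
    if p_action == 0:
        # Conditioning on a probability-zero event: EDT gets no answer here.
        return None
    return sum(p * u for (a, u), p in joint.items() if a == action) / p_action

for action in ("take_5", "take_10"):
    print(action, conditional_expected_utility(joint, action))
# prints:
# take_5 5
# take_10 None
```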

CDT solves it by constructing physically impossible counterfactuals, which has other problems. For example, suppose there’s a Laplace’s demon that searches for violations of physics and destroys the universe if physics is violated; this theoretically shouldn’t make a difference, but it messes up the CDT counterfactuals.

It does look like your post overall agrees with the view I presented. I would tend to call augmented reality “metaphysics” in that it is a piece of ontology that goes beyond physics. I wrote about metaphysical free will a while ago and didn’t post it on LW because I anticipated people would be allergic to the non-physicalist philosophical language.

> It seems like agents in a deterministic universe can falsify theories in at least some sense. Like they take two different weights drop them and see they land at the same time falsifying the fact that heavier objects fall faster

The main problem is that it isn’t meaningful for their theories to make counterfactual predictions about a single situation; they can create multiple situations (across time and space) and assume symmetry and get falsification that way, but it requires extra assumptions. Basically you can’t say different theories really disagree unless there’s some possible world / counterfactual / whatever in which they disagree; finding a “crux” experiment between two theories (e.g. if one theory says all swans are white and another says there are black swans in a specific lake, the cruxy experiment looks in that lake) involves making choices to optimize disagreement.
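The swan example can be sketched as a choice that maximizes disagreement. This is a minimal illustration under stated assumptions: each hypothetical theory is a function from an experiment (which lake to survey) to the probability it assigns to seeing a black swan there, the probabilities are invented for illustration, and "disagreement" is taken to be the absolute difference in predictions. Other disagreement measures would work just as well.

```python
# Hypothetical theories: each maps an experiment (which lake to survey)
# to the probability it assigns to observing a black swan there.
def all_swans_white(lake: str) -> float:
    return 0.0

def black_swans_in_lake_x(lake: str) -> float:
    # Illustrative number only: confident about lake_x, silent elsewhere.
    return 0.9 if lake == "lake_x" else 0.0

def crux_experiment(theory_a, theory_b, experiments):
    """Pick the experiment where the two theories' predictions differ most."""
    return max(experiments, key=lambda e: abs(theory_a(e) - theory_b(e)))

lakes = ["lake_w", "lake_x", "lake_y"]
print(crux_experiment(all_swans_white, black_swans_in_lake_x, lakes))  # lake_x
```

If two theories agree on every available experiment, every choice ties, and no experiment can distinguish them.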

> In the second case, I would suggest that what we need is counterfactuals not agency. That is, we need to be able to say things like, “If I ran this experiment and obtained this result, then theory X would be falsified”, not “I could have run this experiment and if I did and we obtained this result, then theory X would be falsified”.

Those seem pretty much equivalent? Maybe by agency you mean utility function optimization, which I didn’t mean to imply was required.

The part I thought was relevant was the part where you can believe yourself to have multiple options and yet be implemented by a specific computer.

• I previously wrote a post about reconciling free will with determinism. The metaphysics implicit in Pearlian causality is free will (in Drescher’s words: “Pearl’s formalism models free will rather than mechanical choice.”). The challenge is reconciling this metaphysics with the belief that one is physically embodied. That is what the post attempts to do; these perspectives aren’t inherently irreconcilable, we just have to be really careful about e.g. distinguishing “my action” vs “the action of the computer embodying me” in the Bayes net and distinguishing the interventions on them.

I wrote another post about two alternatives to logical counterfactuals: one says counterfactuals don’t exist, one says that your choice of policy should affect your anticipation of your own source code. (I notice you already commented on this post; I’m just noting it for completeness.)

And a third post, similar to the first, reconciling free will with determinism using linear logic.

I’m interested in what you think of these posts and what feels unclear/unresolved; I might write a new explanation of the theoretical perspective or improve/extend/modify it in response.

• I clearly had more scrupulosity issues than you and that contributed a lot. Relevantly, the original Roko’s Basilisk post is putting AI sci-fi detail on a fear I am pretty sure a lot of EAs feel/felt in their hearts: that something nonspecifically bad will happen to them because they are able to help a lot of people (due to being pivotal on the future), and know this, and don’t do nearly as much as they could. If you’re already having these sorts of fears then the abstract math of extortion and so on can look really threatening.

• My guess is that it has more to do with willingness to compartmentalize than with being part of MIRI per se. Compartmentalization is negatively correlated with “taking on responsibility” for more of the problem. I’m sure you can see why it would be appealing to avoid giving in to extortion in real life, not just on whiteboards, and attempting that with a skewed model of the situation can lead to outlandish behavior like Ziz resisting arrest as hard as possible.

• Thanks for giving your own model and description of the situation!

Regarding latent tendency, I don’t have a family history of psychosis (but I do of bipolar), although that doesn’t rule out latent tendency. It’s unclear what “latent tendency” means exactly; it’s kind of pretending that the real world is a 3-node Bayesian network (self tendency towards X, environment tendency towards inducing X, whether X actually happens) rather than a giant web of causality, but maybe there’s some way to specify it more precisely.

I think the 4 factors you listed are the vast majority, so I partially agree with your “red herring” claim.

The “woo” language was causal, I think, mostly because I feared that others would apply coercion to me if I used it too much (even if I had a more detailed model that I could explain upon request), and there was a bad feedback loop around thinking that I was crazy and/or other people would think I was crazy, and other people playing into this.

I think I originally wrote about basilisk-type things in the post because I was very clearly freaking out about abstract evil at the time of psychosis (basically a generalization of utility function sign flips), and I thought Scott’s original comment would have led people to think I was thinking about evil mainly because of Michael, when actually I was thinking about evil for a variety of reasons. I was originally going to say “maybe all this modeling of adversarial/evil scenarios at my workplace contributed, but I’m not sure” but an early reader said “actually wait, based on what you’ve said what you experienced later was a natural continuation of the previous stuff, you’re very much understating things” and suggested (an early version of) the last paragraph of the basilisk section, and that seemed likely enough to include.

It’s pretty clear that thinking about basilisk-y scenarios in the abstract was part of MIRI’s agenda (e.g. the Arbital article). Here’s a comment by Rob Bensinger saying it’s probably bad to try to make an AI that does a lot of interesting stuff and has a good time doing it, because that objective is too related to consciousness and that might create a lot of suffering. (That statement references the “s-risk” concept, and if someone doesn’t know what that is and tries to find out, they could easily end up at a Brian Tomasik article recommending thinking about what it’s like to be dropped in lava.)

The thing is it seems pretty hard to evaluate an abstract claim like Rob’s without thinking about details. I get that there are arguments against thinking about the details (e.g. it might drive you crazy or make you more extortable) but natural ways of thinking about the abstract question (e.g. imagination / pattern completion / concretization / etc) would involve thinking about details even if people at MIRI would in fact dis-endorse thinking about the details. It would require a lot of compartmentalization to think about this question in the abstract without thinking about the details, and some people are more disposed to do that than others, and I expect compartmentalization of that sort to cause worse FAI research, e.g. because it might lead to treating “human values” as a LISP token.

[EDIT: Just realized Buck Shlegeris (someone who recently left MIRI) recently wrote a post called “Worst-case thinking in AI alignment”… seems concordant with the point I’m making.]

• One thing to add is that I think in the early parts of my psychosis (before the “mind blown by Ra” part) I was as coherent as or more coherent than hippies are on regular days, and even after that for some time (before actually being hospitalized) I might have been as coherent as they were on “advanced spiritual practice” days (e.g. middle of a meditation retreat or experiencing Kundalini awakening). I was still controlled pretty aggressively with the justification that I was being incoherent, and I think that control caused me to become more mentally disorganized and verbally incoherent over time. The math test example is striking: I think less than 0.2% of people could pass it (to Zack’s satisfaction) on a good day, and less than 3% could give an answer as good as the one I gave, yet this was still used to “prove” that I was unable to reason.

•

> I interpret us as both agreeing that there are people talking about auras who are not having psychiatric emergencies (eg random hippies), and they should not be bothered.

Agreed.

> I interpret us as both agreeing that you were having a psychotic episode, that you were going further / sounded less coherent than the hippies, and that some hypothetical good diagnostician / good friend should have noticed that and suggested you seek help.

Agreed during October 2017. Disagreed substantially before then (January-June 2017, when I was at MIRI).

(I edited the post to make it clear how I misinterpreted your comment.)

•

> I said that “in the context of her being borderline psychotic” ie including this symptom, they should have “[told] her to seek normal medical treatment”. Suggesting that someone seek normal medical treatment is pretty different from saying this is a psychiatric emergency, and hardly an “impingement” on free speech.

It seems like you’re trying to walk back your previous claim, which did use the “psychiatric emergency” term:

> Jessica is accusing MIRI of being insufficiently supportive to her by not taking her talk about demons and auras seriously when she was borderline psychotic, and comparing this to Leverage, who she thinks did a better job by promoting an environment where people accepted these ideas. I think MIRI was correct to be concerned and (reading between the lines) telling her to seek normal medical treatment, instead of telling her that demons were real and she was right to worry about them, and I think her disagreement with this is coming from a belief that psychosis is potentially a form of useful creative learning. While I don’t want to assert that I am 100% sure this can never be true, I think it’s true rarely enough, and with enough downside risk, that treating it as a psychiatric emergency is warranted.

Reading again, maybe by “it” in the last sentence you meant “psychosis”, not “talking about auras and demons”? Even if that’s what you meant, I hope you can see why I interpreted it the way I did.

(Note, I do not think I would have been diagnosed with psychosis if I had talked to a psychiatrist during the time I was still at MIRI, although it’s hard to be certain and it’s hard to prove anyway.)

> Yes? That actually sounds pretty bad to me. If I ever go around saying that I have destroyed significant parts of the world with my demonic powers, you have my permission to ask me if maybe I should seek psychiatric treatment.

This is while I was already in the middle of a psychotic break and in a hospital. Obviously we would agree that I needed psychiatric treatment at this point.

> MIRI is under no obligation to validate and signal-boost individual employees’ belief in demons, including some sort of metaphorical demons.

“Validating and signal boosting” is not at all what I would want! I would want rational discussion and evaluation. The example you give at the end of challenging Geoff Anders on TV would be an example of rational evaluation.

(I definitely don’t think Leverage handled this optimally, and I think the sort of test you describe would have been good for them to do more of; I’m pointing to their lower rate of psychiatric incarceration as a point in favor of what they did, relatively speaking.)

What would a rational discussion of the claim Ben and I agree on (“auras are not obviously less real than charisma”) look like? One thing to do would be to see how much inter-rater agreement there is among aura-readers and charisma-readers, respectively, to see whether there is any perceivable feature being described at all. Another would be to see how predictive each rating is of other measurable phenomena (e.g. maybe “aura theory” predicts that people with “small auras” will allow themselves to be talked over by people with “big auras” more of the time; maybe “charisma theory” predicts people smile more when a “charismatic” person talks). Testing this might be hard but it doesn’t seem impossible.
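One standard way to quantify the inter-rater agreement mentioned above is Cohen's kappa, which corrects raw agreement for the agreement two raters would reach by chance. The comment does not name a specific statistic, so kappa is a swapped-in choice here. The ratings below are invented placeholders, not data, and the rater names are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty list of people")
    n = len(rater_a)
    # Fraction of people on whom the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected if each rater labelled independently at their own base rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return float("nan")  # undefined: both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

# Invented placeholder ratings of the same eight people by two hypothetical aura-readers.
reader_1 = ["big", "big", "small", "small", "big", "small", "big", "small"]
reader_2 = ["big", "big", "small", "big", "big", "small", "small", "small"]
print(cohens_kappa(reader_1, reader_2))  # 0.5
```

A kappa near 0 would mean agreement no better than chance, which is weak evidence that any shared perceivable feature is being described. The same function could be run on charisma ratings for comparison.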

(P.S. It seems like the AI box experiment (itself similar to the more standard Milgram experiment) is a test of mind control ability, which in some cases comes out positive, like the Milgram experiment; this goes to show that, depending on the setup of the Anders/Soares demon test, it might not have a completely obvious result.)

• I’m curious which parts resonate most with you (I’d ordinarily not ask this because it would seem rude, but I’m in a revealing-political-motives mood and figure the actual amount of pressure is pretty low).

• I really liked MIRI/CFAR during 2015-2016 (even though I had lots of criticisms), and I think I benefited a lot overall, but I think things got bad in 2017 and haven’t recovered. E.g. MIRI has had many fewer good publications since 2017, and for reasons I’ve expressed, I don’t believe their private research is comparably good to their previous public research. (Maybe to some extent I got disillusioned, so I’m overestimating how much things changed; I’m not entirely sure how to disentangle these.)

As revealed in my posts, I was a “dissident” during 2017 and confusedly/fearfully trying to learn and share critiques, gather people into a splinter group, etc., so there’s somewhat of a legacy of a past conflict affecting the present, although it’s obviously less intense now, especially now that I can write about it.

I’ve noticed people trying to “center” everything around MIRI, justifying their actions in terms of “helping MIRI” etc (one LW mod told me and others in 2018 that LessWrong was primarily a recruiting funnel for MIRI, not a rationality promotion website, and someone else who was in the scene 2016-2017 corroborated that this is a common opinion), and I think this is pretty bad since they have no way of checking how useful MIRI’s work is, and there’s a market for lemons (compare EA arguments against donating to even “reputable” organizations like UNICEF). It resembles idol worship and that’s disappointing.

This is corroborated by some other former MIRI employees, e.g. someone who left sometime in the past 2 years who agreed with someone else’s characterization that MIRI was acting against its original mission.

I think lots of individuals at MIRI are intellectually productive and/or high-potential but pretty confused about a lot of things. I don’t currently see a more efficient way to communicate with them than by writing things on the Internet.

I have a long-standing disagreement about AI timelines (I wrote a post saying people are grossly distorting things, which I believe and think is important partially due to the content of my recent posts about my experiences; Anna commented that the post was written in a “triggered” mind state which seems pretty likely given the traumatic events I’ve described). I think lots of people are getting freaked out about the world ending soon and this is wrong and bad for their health. It’s like in Wild Wild Country where the leader becomes increasingly isolated and starts making nearer-term doom predictions while the second-in-command becomes the de-facto social leader (this isn’t an exact analogy and I would be inhibited from making it except that I’m specifically being asked about my political motives, I’m not saying I have a good argument for this).

I still think AI risk is a problem in the long term but I have a broader idea of what “AI alignment research” is, e.g. it includes things that would fall under philosophy/the humanities. I think the problem is really hard and people have to think in inter-disciplinary ways to actually come close to solving it (or to get one of the best achievable partial solutions). I think MIRI is drawing attention to a lot of the difficulties with the problem and that’s good even if I don’t think they can solve it.

Someone I know pointed out that Eliezer’s model might indicate that the AI alignment field has been overall net negative due to it sparking OpenAI and due to MIRI currently having no good plans. If that’s true it seems like a large change in the overall AI safety/x-risk space would be warranted.

My friends and I have been talking with Anna Salamon (head of CFAR) more over this year; she’s been talking about a lot of the problems that have happened historically and how she intends to do different things going forward, and that seems like a good sign, but she isn’t past the threshold of willingness and ability she would need to fix the scene herself.

I’m somewhat worried about criticizing these orgs too hard because I want to maintain relations with people in my previous social network, because I don’t actually think they’re especially bad, because my org (mediangroup.org) has previously gotten funding from a re-granting organization whose representative told me that my org is more likely to get funding if I write fewer “accusatory” blog posts (although, I’m not sure if I believe them about this at this time, maybe writing critiques causes people to think I’m more important and fund me more?), because it might spark “retaliation” (which need not be illegal, e.g. maybe people just criticize me a bunch in a way that’s embarrassing, or give me less money). I feel weird criticizing orgs that were as good for my career as they were even though that doesn’t make that much sense from an ethical perspective.

I very much don’t think the central orgs can accomplish their goals if they can’t learn from criticisms. A lot of the time I’m more comfortable in rat-adjacent/postrat-ish/non-rationalist spaces than central rationalist spaces because they are less enamored of the ideology and the central institutions. It’s easier to just attend a party and say lots of weird-but-potentially-revelatory things without getting caught in a bunch of defensiveness related to the history of the scene. One issue with these alternative social settings is that a lot of these people think it’s normal to take ideas less seriously in general, so they think e.g. I’m only speaking out about problems because I have a high level of “autism” and it’s too much to expect people to tell the truth when their rent stream depends on them not acknowledging it. I understand how someone could come to this perspective, but it seems somewhat of a figure-ground inversion that normalizes parasitic behavior.

• For some context:

• I got a lot of the material for this by trying to explain what I experienced to a “normal” person who wasn’t part of the scene while feeling free to be emotionally expressive (e.g. by screaming). Afterwards I found a new “voice” to talk about the problems in an annoyed way. I think this was really good for healing trauma and recovering memories.

• I have a political motive to prevent Michael from being singled out as the person who caused my psychosis since he’s my friend. I in fact don’t think he was a primary cause, so this isn’t inherently anti-epistemic, but it likely caused me to write in a more lawyer-y fashion than I otherwise would. (Michael definitely didn’t prompt me to write the first draft of the document, and only wrote a few comments on the post.)

• I’ve been working on this document for 1-2 weeks and doing a rolling release where I add more people to the document. It’s been somewhat stressful getting the memories/interpretations into written form without making false/misleading/indefensible statements along the way, or unnecessarily harming the reputations of people I care about.

• Some other people helped me edit this. I included some text from them without checking that it was as rigorous as the rest of the text I wrote. I think they made a different tradeoff than me in terms of saying “strong” statements that were less provable and potentially more accusatory, e.g. the one-paragraph overall summary was something I couldn’t have written myself because even as my co-writer was saying it out loud, I was having trouble tracking the semantics and thought it was for trauma-related reasons. I included the paragraph partially since it seemed true (even if hard-to-prove) on reflection and I was inhibited from myself writing a similar paragraph.

• In future posts I’ll rewrite more inclusions in my own words, since that way I can filter better for things I think I can actually rhetorically defend if pressed.

• I originally wrote the post without the summary of core claims; a friend/co-editor pointed out that a lot of people wouldn’t read the whole thing and that it would be easier to follow with a summary, so I added it.

• Raemon is right that I think people are being overly defensive and finding excuses to reject the information. Overall it seems like people are paying much, much more attention to the quality of my rhetoric than to the subject matter the post is about, and that seems like a large obstacle to fixing the problems I’m describing. I wrote the following tweet about it: “Suppose you publish a criticism of X movement/organization/etc. The people closest to the center are the most likely to defensively reject the information. People far away are unlikely to understand or care about X. It’s people at middle distance who appreciate it the most.” In fact, multiple people somewhat distant from the scene have said they really liked my post; one said he found it helpful for having a more healthy relationship with EA and rationality.

• Looking over this again and thinking for a few minutes, I see why (a) the claim isn’t technically false, and (b) it’s nonetheless confusing.

Why (a): Let’s just take a fragment of the claim: “the problems I experienced at MIRI and CFAR were not unique or unusually severe for people in the professional-managerial class. By the law of excluded middle, the only possible alternative hypothesis is that the problems I experienced at MIRI and CFAR were unique or at least unusually severe”.

This is straightforwardly true: either $A \leq B$, or $A > B$, where $A$ is how severe the problems I experienced at MIRI and CFAR were, and $B$ is how severe the problems for people in the professional-managerial class generally are.

Why (b): in context it’s followed by a claim about regular society being infinitely corrupt etc.; that would require $B$ to be above some absolute threshold, $C$. So it looks like I’m asserting the disjunction $A > B \lor B > C$, which isn’t tautological. So there’s a misleading Gricean implicature.

I’ll edit to make this clearer.

> Similarly, you seem to partially conflate the actions of Ziz, who I consider an outright enemy of the community, with actions of “mainstream” community leaders. This does not strike me as a very honest way to engage.

In the previous post I said Ziz formed a “splinter group”, in this post I said Ziz was “marginal” and has a “negative reputation among central Berkeley rationalists”.

• Thanks for reading closely enough to have detailed responses and trying to correct the record according to your memory. I appreciate that you’re explicitly not trying to disincentivize saying negative things about one’s former employer (a family member of mine was worried about my writing this post on the basis that it would “burn bridges”).

A couple general points:

1. These events happened years ago and no one’s memory is perfect (although our culture has propaganda saying memories are less reliable than they in fact are). E.g. I mis-stated a fact about Maia’s death, that Maia had been on Ziz’s boat, based on filling in the detail from the other details and impressions I had.

2. I can’t know what someone “really means”, I can know what they say and what the most reasonable apparent interpretations are. I could have asked more clarifying questions at the time, but that felt expensive due to the stressful dynamics the post describes.

In terms of more specific points:

> (And I have a decent chunk of probability mass that Jessica would clarify that she’s not accusing me of intentional coercion.) From my own perspective, she was misreading my own frame and feeling pressured into it despite significant efforts on my part to ameliorate the pressure. I happily solicit advice for what to do better next time, but do not consider my comport to have been a mistake.

I’m not accusing you of intentional coercion, I think this sort of problem could result as a side effect of e.g. mental processes trying to play along with coalitions while not adequately modeling effects on others. Some of the reasons I’m saying I was coerced are (a) Anna discouraging researchers from talking with Michael, (b) the remote possibility of assassination, (c) the sort of economic coercion that would be expected on priors at most corporations (even if MIRI is different). I think my threat model was pretty wrong at the time which made me more afraid than I actually had to be (due to conservatism); this is in an important sense irrational (and I’ve tried pretty hard to get better at modeling threats realistically since then), although in a way that would be expected to be common in normal college graduates. Given that I was criticizing MIRI’s ideology more than other researchers, my guess is that I was relatively un-coerced by the frame, although it’s in principle possible that I simply disagreed more.

> I don’t recall ever “talking about hellscapes” per se. I recall mentioning them in passing, rarely. In my recollection, that mainly happened in response to someone else broaching the topic of fates worse than death. (Maybe there were other occasional throwaway references? But I don’t recall them.)

I’m not semantically distinguishing “mentioning” from “talking about”. I don’t recall asking about fates worse than death when you mentioned them and drew a corresponding graph (showing ~0 utility for low levels of alignment, negative utility for high but not excellent levels of alignment, and positive utility for excellent levels of alignment).

According to my best recollection of the conversation that I think Jessica is referring to, she was arguing that AGI will not arrive in our own lifetimes, and seemed unresponsive to my attempts to argue that a confident claim of long timelines requires positive knowledge, at which point I exasperatedly remarked that for all we knew, the allegedly missing AGI insights had already been not only had, but published in the literature, and all that remains is someone figuring out how to assemble them.

Edited to make it clear you weren’t trying to assign high probability to this proposition. What you said seems more reasonable given this, although given that you were also talking about AI coming in the next 20 years, I hope you can see why I thought this reflected your belief.

Here Jessica seems to be implying that, not only did I positively claim that the pieces of AGI were already out there in the literature, but also that I had personally identified them? I deny that, and I’m not sure what claim I made that Jessica misunderstood in that way.

Edited to make it clear you didn’t mean this. The reason I drew this as a Gricean implicature is that figuring out how to make an AGI wouldn’t provide evidence that the pieces to make AGI are already out there, unless such an AGI design would work if scaled up / iteratively improved in ways that don’t require advanced theory / etc.

The sequence of events as I recall them was: Various researchers wanted to do some closed research. There was much discussion about how much information was private: Research results? Yes, if the project lead wants privacy. Research directions? Yes, if the project lead wants privacy. What about the participant list for each project? Can each project determine their own secrecy bounds individually, or is revealing who’s working with you defecting against (possibly-hypothetical) projects that don’t want to disclose who they’re working with? etc. etc. I recall at least one convo with a bunch of researchers where, in efforts to get everyone to stop circling privacy questions like moths to a flame and get back to the object level research, I said something to the effect of “come to me if you’re having trouble”.

Even if the motive came from other researchers I specifically remember hearing about the policy at a meeting in a top-down fashion. I thought the “don’t ask each other about research” policy was bad enough that I complained about it and it might have been changed. It seems that not everyone remembers this policy (although Eliezer in a recent conversation didn’t disagree about this being the policy at some point), but I must have been interpreting something this way because I remember contesting it.

According to me, I was not trying to say “you shouldn’t talk about ways you believe others to be acting in bad faith”. I was trying to say “I think y’all are usually mistaken when you’re accusing certain types of other people of acting in bad faith”, plus “accusing people of acting in bad faith [in confrontational and adversarial ways, instead of gently clarifying and confirming first] runs a risk of being self-fulfilling, and also burns a commons, and I’m annoyed by the burned commons”. I think those people are wrong and having negative externalities, not that they’re bad for reporting what they believe.

I hope you can see why I interpreted the post as making a pragmatic argument, not simply an epistemic argument, against saying others are acting in bad faith:

When criticism turns to attacking the intentions of others, I perceive that to be burning the commons. Communities often have to deal with actors that in fact have ill intentions, and in that case it’s often worth the damage to prevent an even greater exploitation by malicious actors. But damage is damage in either case, and I suspect that young communities are prone to destroying this particular commons based on false premises.

In the context of 2017, I also had a conversation with Anna Salamon where she said our main disagreement was about whether bad faith should be talked about (which implies our main disagreement wasn’t about how common bad faith was).

I don’t actually know what conversation this is referring to. I recall a separate instance, not involving Jessica, of a non-researcher spending lots of time in the office hanging out and talking with one of our researchers, and me pulling the researcher aside and asking whether they reflectively endorsed having those conversations or whether they kept getting dragged into them and then found themselves unable to politely leave. (In that case, the researcher said they reflectively endorsed them, and thereafter I left them alone.)

Edited to say you don’t recall this. I didn’t hear this from you; I heard it secondhand, perhaps from Michael Vassar, so at this point I don’t have strong reason to think you said this.

There’s no law saying that, when someone’s making a mistake, there’s some way to explain it to them such that suddenly it’s fixed. I think existing capabilities orgs are making mistakes (at the very least, in publishing capabilities advances (though credit where credit is due, various labs are doing better at keeping their cutting-edge results private, at least until somebody else replicates or nearly-replicates them, than they used to be (though to be clear I think we have a long way to go before I stop saying that I believe I see a big mistake))), and deny the implicit inference from “you can’t quickly convince someone with words that they’re making a mistake” to “you must be using conflict theory”.

I agree that “speed at which you can convince someone” is relevant in a mistake theory. Edited to make this clear.

But, as I told Jessica at the time (IIRC), I expect folks at leading AGI labs to be much more sensitive to solutions to the alignment problem, despite the fact that I don’t think you can talk them into giving up public capabilities research in practice. (This might be what she misunderstood as me saying we’d have better luck “competing”? I don’t recall saying any such thing, but I do recall saying that we’d have better luck solving alignment first and persuading second.)

If I recall correctly, you were at the time including some AGI capabilities research as part of alignment research (which makes a significant amount of theoretical sense, given that FAI has to pursue convergent instrumental goals). In this case, developing an alignment solution before DeepMind develops AGI would be a form of competition. DeepMind people might be more interested in the alignment solution if it came along with a capabilities boost (I’m not sure whether this consideration was discussed in the specific conversation I’m referring to, but it might have been considered in another conversation, which doesn’t mean it was in any way planned on).

That said, as I told Jessica at the time (IIRC), you can always just ask me whether I’m speaking as MIRI-the-organization or whether I’m speaking as Nate. Similarly, when I’m speaking as Nate-the-person, you can always just ask me about my honesty protocols.

Ok, this helps me disambiguate your honesty policy. If “employees may say things on the MIRI blog that would be very misleading under the assumption that this blog was not the output of MIRI playing politics and being PC and polite” is consistent with MIRI’s policies, it’s good for that to be generally known. In the case of the OpenAI blog post, the post is polite because it gives a misleadingly positive impression.