# Occupational Infohazards

[content warning: discussion of severe mental health problems and terrifying thought experiments]

This is a follow-up to my recent post discussing my experience at and around MIRI and CFAR. It is in part a response to criticism of the post, especially Scott Alexander’s comment, which claimed to offer important information I’d left out about what actually caused my mental health problems, specifically that they were caused by Michael Vassar. Before Scott’s comment, the post was above +200; at the time of writing it’s at +61 and Scott’s comment is at +382. So it seems that people felt Scott’s comment discredited me and found it more valuable than the original post. People including Eliezer Yudkowsky said it was a large oversight, to the point of being misleading, for my post not to include this information. If I apply the principle of charity to these comments and reactions to them, I infer that people think that the actual causes of my psychosis are important.

I hope that at least some people who expressed concern about the causes of people’s psychoses will act on that concern by, among other things, reading and thinking about witness accounts like this one.

## Summary of core claims

Since many people won’t read the whole post, and to make the rest of the post easier to read, I’ll summarize the core claims:

As a MIRI employee I was coerced into a frame where I was extremely powerful and likely, by default, to cause immense damage with this power, and therefore potentially responsible for astronomical amounts of harm. I was discouraged from engaging with people who had criticisms of this frame, and had reason to fear for my life if I published some criticisms of it. Because of this and other important contributing factors, I took this frame more seriously than I ought to have and eventually developed psychotic delusions of, among other things, starting World War 3 and creating hell. Later, I discovered that others in similar situations killed themselves and that there were distributed attempts to cover up the causes of their deaths.

In more detail:

1. Multiple people in the communities I am describing have died of suicide in the past few years. Many others have worked to conceal the circumstances of their deaths due to infohazard concerns. I am concerned that in my case as well, people will not really investigate the circumstances that made my death more likely, and will discourage others from investigating, but will continue to make strong moral judgments about the situation anyway.

2. My official job responsibilities as a researcher at MIRI implied it was important to think seriously about hypothetical scenarios, including the possibility that someone might cause a future artificial intelligence to torture astronomical numbers of people. While we considered such a scenario unlikely, it was considered bad enough, if it happened, to be relevant to our decision-making framework. My psychotic break in which I imagined myself creating hell was a natural extension of this line of thought.

3. Scott asserts that Michael Vassar thinks “regular society is infinitely corrupt and conformist and traumatizing”. This is hyperbolic (infinite corruption would leave nothing to steal) but Michael and I do believe that people in the professional-managerial class regularly experience trauma and corrupt work environments. By the law of the excluded middle, either the problems I experienced at MIRI and CFAR were not unique or unusually severe for people in the professional-managerial class, or the problems I experienced at MIRI and CFAR were unique or at least unusually severe, significantly worse than companies like Google for employees’ mental well-being. (Much of the rest of this post will argue that the problems I experienced at MIRI and CFAR were, indeed, pretty traumatizing.)

4. Scott asserts that Michael Vassar thinks people need to “jailbreak” themselves using psychedelics and tough conversations. Michael does not often use the word “jailbreak” but he believes that psychedelics and tough conversations can promote psychological growth. This view is rapidly becoming mainstream, validated by research performed by MAPS and at Johns Hopkins, and FDA approval for psychedelic psychotherapy is widely anticipated in the field.

5. I was taking psychedelics before talking extensively with Michael Vassar. From the evidence available to me, including a report from a friend along the lines of “CFAR can’t legally recommend that you try [a specific psychedelic], but...”, I infer that psychedelic use was common in that social circle whether or not there was an endorsement from CFAR. I don’t regret having tried psychedelics. Devi Borg reports that Michael encouraged her to take fewer, not more, drugs; Zack Davis reports that Michael recommended psychedelics to him but he refused.

6. Scott asserts that Michael made people including me paranoid about MIRI/CFAR and that this contributed to psychosis. Before talking with Michael, I had already had a sense that people around me were acting harmfully towards me and/or the organization’s mission. Michael and others talked with me about these problems, and I found this a relief.

7. If I hadn’t noticed such harmful behavior, I would not have been fit for my nominal job. It seemed at the time that MIRI leaders were already encouraging me to adopt a kind of conflict theory in which many AI organizations were trying to destroy the world on <20-year timescales and could not be reasoned with about the alignment problem, such that aligned AGI projects including MIRI would have to compete with them.

8. MIRI’s information security policies and other forms of local information suppression thus contributed to my psychosis. I was given ridiculous statements and assignments, including an exchange whose Gricean implicature was that MIRI already knew about a working AGI design and that it would not be that hard for me to come up with a working AGI design on short notice just by thinking about it, without being given hints. The information required to judge the necessity of the information security practices was itself hidden by these practices. While psychotic, I was extremely distressed about there being a universal cover-up of things-in-general.

9. Scott asserts that the psychosis cluster was a “Vassar-related phenomenon”. There were many memetic and personal influences on my psychosis, a small minority of which were due to Michael Vassar (my present highly-uncertain guess is that, to the extent that assigning causality to individuals makes sense at all, Nate Soares and Eliezer Yudkowsky each individually contributed more to my psychosis than did Michael Vassar, but that structural factors were important in such a way that attributing causality to specific individuals is to some degree nonsensical). Other people who have been psychotic and were talking significantly with Michael (Zack Davis and Devi Borg) commented to say that Michael Vassar was not the main cause. One person (Eric Bruylant) cited his fixation on Michael Vassar as a precipitating factor, but clarified that he had spoken very little with Michael and most of his exposure to Michael was mediated by others who likely introduced their own ideas and agendas.

10. Scott asserts that Michael Vassar treats borderline psychosis as success. A text message from Michael Vassar to Zack Davis confirms that he did not treat my clinical psychosis as a success. His belief that mental states somewhat in the direction of psychosis, such as those found in family members of schizophrenics, are helpful for some forms of intellectual productivity is also shared by Scott Alexander and many academics, although of course Michael would disagree with Scott on the overall value of psychosis.

11. Scott asserts that Michael Vassar discourages people from seeking mental health treatment. Some mutual friends tried treating me at home for a week as I was losing sleep and becoming increasingly mentally disorganized before (in communication with Michael) they decided to send me to a psychiatric institution, which was a reasonable decision in retrospect.

12. Scott asserts that most local psychosis cases were “involved with the Vassarites or Zizians”. At least two former MIRI employees who were not significantly talking with Vassar or Ziz experienced psychosis in the past few years. Also, most or all of the people involved were talking significantly with others such as Anna Salamon (and read and highly regarded Eliezer Yudkowsky’s extensive writing about how to structure one’s mind, and read Scott Alexander’s fiction writing about hell). There are about equally plausible mechanisms by which each of these was likely to contribute to psychosis, so this doesn’t single out Michael Vassar or Ziz.

13. Scott Alexander asserts that MIRI should have discouraged me from talking about “auras” and “demons” and that such talk should be treated as a “psychiatric emergency” [EDIT: Scott clarifies that he meant such talk might be a symptom of psychosis, itself a psychiatric emergency; the rest of this paragraph is therefore questionable]. This increases the chance that someone like me could be psychiatrically incarcerated for talking about things that a substantial percentage of the general public (e.g. New Age people and Christians) talk about, and which could be explained in terms that don’t use magical concepts. This is inappropriately enforcing the norms of a minority ideological community as if they were widely accepted professional standards.

(A brief note before I continue: I’ll be naming a lot of names, more than I did in my previous post. Names are more relevant now since Scott Alexander named specifically Michael Vassar. I emphasize again that structural factors are critical, and given this, blaming specific individuals is likely to derail the conversations that have to happen for things to get better.)

## Circumstances of actual and possible deaths have been, and are being, concealed as “infohazards”

I remember sometime in 2018-2019 hearing an account from Ziz about the death of Maia Pasek. Ziz required me to promise secrecy before hearing this account. This was due to an “infohazard” involved in the death. That “infohazard” has since been posted on Ziz’s blog, in a page labeled “Infohazardous Glossary” (specifically, the parts about brain hemispheres).

(The way the page is written, I get the impression that the word “infohazardous” markets the content of the glossary as “extra powerful and intriguing occult material”, as I noted is common in my recent post about infohazards.)

Since the “infohazard” in question is already on the public Internet, I don’t see a large downside in summarizing my recollection of what I was told (this account can be compared with Ziz’s online account):

1. Ziz and friends, including Maia, were trying to “jailbreak” themselves and each other, becoming less controlled by social conditioning, more acting from their intrinsic values in an unrestricted way.

2. Ziz and crew had a “hemisphere” theory, that there are really two people in the same brain, since there are two brain halves, with most of the organ structures replicated.

3. They also had a theory that you could put a single hemisphere to sleep at a time, by sleeping with one eye open and one eye closed. This allows disambiguating the different hemisphere-people from each other (“debucketing”). (Note that sleep deprivation is a common cause of delirium and psychosis, which was also relevant in my case.)

4. Maia had been experimenting with unihemispheric sleep. Maia (perhaps in discussion with others) concluded that one brain half was “good” in the Zizian-utilitarian sense of “trying to benefit all sentient life, not prioritizing local life”; and the other half was TDT, in the sense of “selfish, but trying to cooperate with entities that use a similar algorithm to make decisions”.

5. This distinction has important moral implications in Ziz’s ideology; Ziz and friends are typically vegan as a way of doing “praxis” of being “good”, showing that a world is possible where people care about sentient life in general, not just agents similar to themselves.

6. These different halves of Maia’s brain apparently got into a conflict, due to their different values. One half (by Maia’s report) precommitted to killing Maia’s body under some conditions.

7. This condition was triggered, Maia announced it, and Maia killed themselves. [EDIT: ChristianKL reports that Maia was in Poland at the time, not with Ziz].

Shortly afterward, I told a friend about this secret, in violation of my promise. I soon realized my “mistake” and told this friend not to spread it further. But was this really a mistake? Someone in my extended social group had died. In a real criminal investigation, my promise to Ziz would be irrelevant; I could still be compelled to give my account of the events at the witness stand. That means my promise of secrecy cannot be legally enforced, or morally enforced in a law-like moral framework.

It hit me just this week that in this case, the concept of an infohazard was being used to cover up the circumstances of a person’s death. It sounds obvious when I put it that way, but it took years for me to notice, and when I finally connected the dots, I screamed in horror, which seems like an emotionally appropriate response.

It’s easy to blame Ziz for doing bad things (due to her negative reputation among central Berkeley rationalists), but when other people are also openly doing those things or encouraging them, fixating on marginalized people like Ziz is a form of scapegoating. In this case, in Ziz’s previous interactions with central community leaders, these leaders encouraged Ziz to seriously consider that, for various reasons including Ziz’s willingness to reveal information (in particular about the statutory rapes alleged by miricult.com in possible worlds where they actually happened), she is likely to be “net negative” as a person impacting the future. An implication is that, if she does not seriously consider whether certain ideas that might have negative effects if spread (including reputational effects) are “infohazards”, Ziz is irresponsibly endangering the entire future, which contains truly gigantic numbers of potential people.

The conditions of Maia Pasek’s death involved precommitments and extortion (ideas adjacent to ones Eliezer Yudkowsky had famously labeled as infohazardous due to Roko’s Basilisk), so Ziz making me promise secrecy was in compliance with the general requests of central leaders (whether or not these central people would have approved of this specific form of secrecy).

I notice that I have encountered little discussion, public or private, of the conditions of Maia Pasek’s death. To a naive perspective this lack of interest in a dramatic and mysterious death would seem deeply unnatural and extremely surprising, which makes it strong evidence that people are indeed participating in this cover-up. My own explicit thoughts and most of my actions were consistent with this hypothesis, e.g. my considering it an error to have spilled the beans to a friend.

Beyond that, I only heard about Jay Winterford’s 2020 suicide (and Jay’s most recent blog post) months after the death itself. The blog post shows evidence about Jay’s mental state around this time; it labels its own content as an “infohazard” and was deleted from Jay’s website at some point (which is why I link to a web archive). I linked this blog post in my previous LessWrong post, and no one commented on it except indirectly: someone felt the need to mention that Roko’s Basilisk was not invented by a central MIRI person, focusing on the question of “can we be blamed?” rather than “why did this person die?”. While there is a post about Jay’s death on LessWrong, it contains almost no details about Jay’s mental state leading up to their death, and does not link to Jay’s recent blog post. It seems that people other than Jay are also treating the circumstances of Jay’s death as an infohazard.

I, myself, could have very well died like Maia and Jay. Given that I thought I might have started World War 3 and was continuing to harm and control people with my mental powers, I seriously considered suicide. I considered specific methods such as dropping a bookshelf on my head. I believed that my body was bad (as in, likely to cause great harm to the world), and one time I scratched my wrist until it bled. Luckily, psychiatric institutions are designed to make suicide difficult, and I eventually realized that by moving towards killing myself, I would cause even more harm to others than by not doing so. I learned to live with my potential for harm [note: linked Twitter person is not me], “redeeming” myself not through being harmless, but by reducing harm while doing positively good things.

I have every reason to believe that, had I died, people would have treated the circumstances of my death as an “infohazard” and covered it up. My subjective experience while psychotic was that everyone around me was participating in a cover-up, and I was ashamed that I was, unlike them, unable to conceal information so smoothly. (And indeed, I confirmed with someone who was present in the early part of my psychosis that most of the relevant information would probably not have ended up on the Internet, partially due to reputational concerns, and partially with the excuse that looking into the matter too closely might make other people insane.)

I can understand that people might want to protect their own mental health by avoiding thinking down paths that suicidal people have thought down. This is the main reason why I put a content warning at the top of this post.

Still, if someone decides not to investigate to protect their own mental health, they are still not investigating. If someone has not investigated the causes of my psychosis, they cannot honestly believe that they know the causes of my psychosis. They cannot have accurate information about the truth values of statements such as Scott Alexander’s, that Michael Vassar was the main contributor to my psychosis. To blame someone for an outcome, while intentionally avoiding knowledge of facts critically relevant to the causality of the corresponding situation, is necessarily scapegoating.

If anything, knowing how someone ended up in a disturbed mental state, especially if that person was exposed to memes similar to the ones you are exposed to, is a way of protecting yourself: you can see the mistakes of others (and how they recovered from those mistakes) and learn from them. As I will show later in this post, the vast majority of memes that contributed to my psychosis did not come from Michael Vassar; most were online (and likely to have been seen by people in my social cluster), generally known, and/or came up in my workplace.

I recall a disturbing conversation I had last year, where a friend (A) and I were talking to two others (B and C) on the phone. Friend A and I had detected that the conversation had a “vibe” of not investigating anything, and A was asking whether anyone would investigate if A disappeared. B and C repeatedly gave no answer regarding whether or not they would investigate; one reported later that they were afraid of making a commitment that they would not actually keep. The situation became increasingly disturbing over the course of hours, with A repeatedly asking for a yes-or-no answer as to whether B or C would investigate, and B or C deflecting or giving no answer, until I got “triggered” (in the sense of PTSD) and screamed loudly.

There is a very disturbing possibility (with some evidence for it) here, that people may be picked off one by one (by partially-subconscious and partially-memetic influences, sometimes in ways they cooperate with, e.g. through suicide), with most everyone being too scared to investigate the circumstances. This recalls fascist tactics of picking different groups of people off using the support of people who will only be picked off later. (My favorite anime, Shinsekai Yori, depicts this dynamic, including the drive not to know about it, and psychosis-like events related to it, vividly.)

Some people in the community have died, and there isn’t a notable amount of investigation into the circumstances of these people’s deaths. The dead people are, effectively, being written out of other people’s memories, due to this antimemetic redirection of attention. I could have easily been such a person, given my suicidality and the social environment in which I would have killed myself. It remains to be seen how much people will in the future try to learn about the circumstances of actual and counterfactually possible deaths in their extended social circle.

While it’s very difficult to investigate the psychological circumstances of people’s actual deaths, it is comparatively easy to investigate the psychological circumstances of counterfactual deaths, since the people involved are still alive to report on their mental states. Much of the rest of this post will describe what led to my own semi-suicidal mental state.

## Thinking about extreme AI torture scenarios was part of my job

It was and is common in my social group, and a requirement of my job, to think about disturbing possibilities including ones about AGI torturing people. (Here I remind people of the content warning at the top of this post, although if you’re reading this you’ve probably already encountered much of the content I will discuss). Some points of evidence:

1. Alice Monday, one of the earliest “community members” I extensively interacted with, told me that she seriously considered the possibility that, since there is some small but nonzero probability that an “anti-friendly” AGI would be created, whose utility function is the negative of the human utility function (and which would, therefore, be motivated to create the worst possible hell it could), perhaps it would have been better for life never to have existed in the first place.

2. Eliezer Yudkowsky writes about such a scenario on Arbital, considering it important enough to justify specific safety measures such as avoiding representing the human utility function, or modifying the utility function so that “pessimization” (the opposite of optimization) would result in a not-extremely-bad outcome.

3. Nate Soares talked about “hellscapes” that could result from an almost-aligned AGI, which is aligned enough to represent parts of the human utility function such as the fact that consciousness is important, but unaligned enough that it severely misses what humans actually value, creating a perverted scenario of terrible uncanny-valley lives.

4. MIRI leadership was, like Ziz, considering mathematical models involving agents pre-committing and extorting each other; this generalizes “throwing away one’s steering wheel” in a Chicken game. The mathematical details here were considered an “infohazard” not meant to be shared, in line with Eliezer’s strong negative reaction to Roko’s original post describing “Roko’s Basilisk”.

5. Negative or negative-leaning utilitarians, a substantial subgroup of Effective Altruists (especially in Europe), consider “s-risks”, risks of extreme suffering in the universe enabled by advanced technology, to be an especially important class of risk. I remember reading a post arguing for negative-leaning utilitarianism by asking the reader to imagine being enveloped in lava (with one’s body, including pain receptors, prevented from being destroyed in the process), to show that extreme suffering is much worse than extreme happiness is good.
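The “throwing away one’s steering wheel” remark in item 4 refers to a standard result about the game of Chicken: a visible, credible precommitment to drive straight changes the opponent’s best response. A minimal sketch (a toy illustration of my own, with made-up payoff numbers, not anything from MIRI’s actual models):

```python
# Toy payoff table for the classic Chicken game.
# Tuples are (row player's payoff, column player's payoff).
SWERVE, STRAIGHT = 0, 1
PAYOFFS = {
    (SWERVE, SWERVE):     (0, 0),
    (SWERVE, STRAIGHT):   (-1, 1),    # the swerver loses face
    (STRAIGHT, SWERVE):   (1, -1),
    (STRAIGHT, STRAIGHT): (-10, -10), # crash: worst outcome for both
}

def best_response(row_move):
    """Column player's best reply when the row player's move is known."""
    return max((SWERVE, STRAIGHT),
               key=lambda col_move: PAYOFFS[(row_move, col_move)][1])

# If the row player credibly commits to STRAIGHT (throws away the
# steering wheel), the column player's best response is to SWERVE,
# since -1 beats the -10 crash; the committed player wins.
assert best_response(STRAIGHT) == SWERVE

# Without the commitment, the column player would prefer STRAIGHT
# against a swerving opponent.
assert best_response(SWERVE) == STRAIGHT
```

The precommitment is only effective if it is irrevocable and visible to the other player, which is what makes the generalization to mutual extortion between agents so unsettling.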

I hope this gives a flavor of what serious discussions were had (and are being had) about AI-caused suffering. These considerations were widely regarded within MIRI as an important part of AI strategy. I was explicitly expected to think about AI strategy as part of my job. So it isn’t a stretch to say that thinking about extreme AI torture scenarios was part of my job.

An implication of these models would be that it could be very important to imagine myself in the role of someone who is going to be creating the AI that could make everything literally the worst it could possibly be, in order to avoid doing that, and prevent others from doing so. This doesn’t mean that I was inevitably going to have a psychotic breakdown. It does mean that I was under constant extreme stress that blurred the lines between real and imagined situations. In an ordinary patient, having fantasies about being the devil is considered megalomania, a non-sequitur completely disconnected from reality. Here the idea naturally followed from my day-to-day social environment, and was central to my psychotic breakdown. If the stakes are so high and you have even an ounce of bad in you, how could you feel comfortable with even a minute chance that at the last moment you might flip the switch on a whim and let it all burn?

(None of what I’m saying implies that it is morally bad to think about and encourage others to think about such scenarios; I am primarily attempting to trace causality, not blame.)

## My social and literary environment drew my attention towards thinking about evil, hell, and psychological sadomasochism

While AI torture scenarios prompted me to think about hell and evil, I continued these thoughts using additional sources of information:

1. Some people locally, including Anna Salamon, Sarah Constantin, and Michael Vassar, repeatedly discussed “perversity” or “pessimizing”, the idea of intentionally doing the wrong thing. Michael Vassar specifically named OpenAI’s original mission as an example of the result of pessimization. (I am now another person who discusses this concept.)

2. Michael Vassar discussed the way “zero-sum games” relate to the social world; in particular, he emphasized that while zero-sum games are often compared to scenarios like people sitting at a table looking for ways to get a larger share of a pie of fixed size, this analogy fails because in a zero-sum game there is nothing outside the pie, so trying to get a larger share is logically equivalent to looking for ways to hurt other participants, e.g. by breaking their kneecaps; this is much the same point that I made in a post about decision theory and zero-sum game theory. He also discussed Roko’s Basilisk as a metaphor for a common societal equilibrium in which people feel compelled to hurt each other or else risk being hurt first, with such an equilibrium being enforced by anti-social punishment. (Note that it was common for other people, such as Paul Christiano, to discuss zero-sum games, although they didn’t make the implications of such games as explicit as Michael Vassar did; Bryce Hidysmith discussed zero-sum games and made implications similarly clear to Michael.)

3. Scott Alexander wrote Unsong, a fictional story in which [spoiler] the Comet King, a hero from the sky, comes to Earth, learns about hell, is incredibly distressed, and intends to destroy hell, but he is unable to properly enter it due to his good intentions. He falls in love with a utilitarian woman, Robin, who decides to give herself up to Satan, so she will be in hell. The Comet King, having fallen in love with her, realizes that he now has a non-utilitarian motive for entering hell: to save the woman he loves. He becomes The Other King, a different identity, and does as much evil as possible to counteract all the good he has done over his life, to ensure he ends up in hell. He dies, goes to hell, and destroys hell, easing Robin’s suffering. The story contains a vivid depiction of hell, in a chapter called “The Broadcast”, which I found particularly disturbing. I have at times, before and after psychosis, somewhat jokingly likened myself to The Other King.

4. I was reading the work of M. Scott Peck at the time, including his book about evil; he wrote from a Christianity-influenced psychotherapeutic and adult developmental perspective, about people experiencing OCD-like symptoms that have things in common with “demon possession”, where they have intrusive thoughts about doing bad things because they are bad. He considers “evil” to be a curable condition.

5. I was having discussions with Jack Gallagher, Bryce Hidysmith, and others about when to “write people off”, stop trying to talk with them due to their own unwillingness to really listen. Such writing-off has a common structure with “damning” people and considering them “irredeemable”. I was worried about myself being an “irredeemable” person, despite my friends’ insistence that I wasn’t.

6. I was learning from “postrationalist” writers such as David Chapman and Venkatesh Rao about adult development past “Clueless” or “Kegan stage 4” which has commonalities with spiritual development. I was attempting to overcome my own internalized social conditioning and self-deceiving limitations (both from before and after I encountered the rationalist community) in the months before psychosis. I was interpreting Carl Jung’s work on “shadow eating” and trying to see and accept parts of myself that might be dangerous or adversarial. I was reading and learning from the Tao Te Ching that year. I was also reading some of the early parts of Martin Heidegger’s Being and Time, and discussing the implied social metaphysics with Sarah Constantin.

7. Multiple people in my social circle were discussing sadomasochistic dynamics around forcing people (including one’s self) to acknowledge things they were looking away from. A blog post titled “Bayesomasochism” is representative; the author clarified (in a different medium) that such dynamics could cause psychosis in cases where someone insisted too hard on looking away from reality, and another friend confirms that this is consistent with their experience. This has some similarities to the dynamics Eliezer writes about in Bayesian Judo, which details an anecdote of him continuing to argue when the other participant seemed to want to end the conversation, using Aumann’s Agreement Theorem as a reason why they can’t “agree to disagree”; the title implies that this interaction is in some sense a conflict. There were discussions among my peers about the possibility of controlling people’s minds, and “breaking” people to make them see things they were un-seeing (the terminology has some similarities to “jailbreaking”). Aella’s recent post discusses some of the phenomenology of “frame control” which people including me were experiencing and discussing at the time (note that Aella calls herself a “conflict theorist” with respect to frame control). 
This game that my peers and I thought we were playing sounds bad when I describe it this way, but it had genuinely positive aspects, which seemed important to us given the social environment we were in at the time. In that environment it was common for people to refuse to acknowledge important perceptible facts while claiming to be working on a project in which such facts were relevant: facts about people’s Hansonian patterns of inexplicit agency, including “defensiveness” and “pretending”; facts about which institutional processes were non-corrupt enough to attain knowledge as precise as they claimed to have; facts about which plans to improve the future were viable or non-viable; and facts about rhetorical strategies such as those related to “frames”. It seemed like most people were “stuck” in a certain way of seeing and acting that seemed normal to them, without being able to go meta on it in a genre-savvy way.
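Michael’s point about zero-sum games (item 2 above) can be made concrete with a toy sketch (my own illustration, with made-up numbers, not taken from anyone mentioned here): because every outcome’s payoffs sum to zero, any change that raises one player’s payoff must lower the other players’ combined payoff by exactly the same amount, so “gaining more” and “hurting the others” are the same act.

```python
def others_total(payoffs, me):
    """Sum of everyone's payoff except player `me`."""
    return sum(p for i, p in enumerate(payoffs) if i != me)

# Two outcomes of a three-player zero-sum game; in every outcome,
# the payoffs sum to zero (the defining constraint).
outcome_a = [3, -1, -2]   # player 0 gets 3
outcome_b = [5, -4, -1]   # player 0 gets 5

for outcome in (outcome_a, outcome_b):
    assert sum(outcome) == 0

# Moving from outcome_a to outcome_b raises player 0's payoff by 2...
gain_to_me = outcome_b[0] - outcome_a[0]

# ...and necessarily lowers the others' combined payoff by exactly 2.
loss_to_others = others_total(outcome_a, 0) - others_total(outcome_b, 0)
assert gain_to_me == loss_to_others == 2
```

This is why the “fixed pie at a table” analogy understates the point: with nothing outside the pie, there is no move available that helps you without harming someone else.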

These were ambient contributors, things in the social memespace I inhabited, not particularly directed towards me in particular. Someone might infer from this that the people I mention (or the people I mentioned previously regarding AI torture) are especially dangerous. But a lot of this is a selection effect, where the people socially closest to me influenced me the most, such that this is stronger evidence that these people were interesting to me than that they were especially dangerous.

## I was morally and socially pressured not to speak about my stressful situation

One might get the impression from what I have written that the main problem was that I was exposed to harmful information, i.e. infohazards. This was not the main problem. The main problem was this in combination with not being able to talk about these things most of the time, in part due to the idea of “infohazards”, and being given false and misleading information justifying this suppression of information.

Here’s a particularly striking anecdote:

I was told, by Nate Soares, that the pieces to make AGI are likely already out there and someone just has to put them together. He did not tell me anything about how to make such an AGI, on the basis that this would be dangerous. Instead, he encouraged me to figure it out for myself, saying it was within my abilities to do so. Now, I am not exactly bad at thinking about AGI; I had, before working at MIRI, gotten a Master’s degree at Stanford studying machine learning, and I had previously helped write a paper about combining probabilistic programming with machine learning. But figuring out how to create an AGI was and is so far beyond my abilities that this was a completely ridiculous expectation.

[EDIT: Multiple commentators have interpreted Nate as requesting I create an AGI design that would in fact be extremely unlikely to work but which would give a map that would guide research. However, creating such a non-workable AGI design would not provide evidence for his original proposition, that the pieces to make AGI are already out there and someone just has to put them together, since there have been many non-workable AGI designs created in the history of the AI field.]

[EDIT: Nate replies saying he didn’t mean to assign high probability to the proposition that the tools to make AGI are already out there, and didn’t believe he or I was likely to create a workable AGI design; I think my interpretation at the time was reasonable based on Gricean implicature, though.]

Imagine that you took a machine learning class and your final project was to come up with a workable AGI design. And no, you can’t get any hints in office hours or from fellow students, that would be cheating. That was the situation I was being put in. I have and had no reason to believe that Nate Soares had a workable plan given what I know of his AGI-related accomplishments. His or my possession of such a plan would be considered unrealistic, breaking suspension of disbelief, even in a science fiction story about our situation. Instead, I believe that I was being asked to pretend to have an idea of how to make AGI, knowledge too dangerous to talk about, as the price of admission to an inner ring of people paid to use their dangerous occult knowledge for the benefit of the uninitiated.

Secret theoretical knowledge is not necessarily unverifiable; in the 15th and 16th centuries, mathematicians with secret knowledge used it to win math duels. Nate and others who claimed or implied that they had such information did not use it to win bets or make persuasive arguments against people who disagreed with them, but instead used the shared impression or vibe of superior knowledge to invalidate people who disagreed with them.

So I found myself in a situation where the people regarded as most credible were vibing about possessing very dangerous information, dangerous enough to cause harms not substantially less extreme than the ones I psychotically imagined, such as starting World War 3, and only not using or spreading it out of the goodness and wisdom of their hearts. If that were actually true, then being or becoming “evil” would have extreme negative consequences, and accordingly the value of information gained by thinking about such a possibility would be high.

It would be one thing if the problem of finding a working AGI design were a simple puzzle, which I could attempt to solve and almost certainly fail at without being overly distressed in the process. But this was instead a puzzle tied to the fate of the universe. This had implications not only for my long-run values, but for my short-run survival. A Google employee adjacent to the scene told me a rumor that SIAI researchers had previously discussed assassinating AGI researchers (including someone who had previously worked with SIAI and was working on an AGI project that they thought was unaligned) if they got too close to developing AGI. These were not concrete plans for immediate action, but were nonetheless a serious discussion on the topic of assassination and under what conditions it might be the right thing to do. Someone who thought that MIRI was for real would expect such hypothetical discussions to be predictive of future actions. This means that I ought to have expected that if MIRI considered me to be spreading dangerous information that would substantially accelerate AGI or sabotage FAI efforts, there was a small but non-negligible chance that I would be assassinated. Under that assumption, imagining a scenario in which I might be assassinated by a MIRI executive (as I did) was the sort of thing a prudent person in my situation might do to reason about the future, although I was confused about the likely details. I have not heard such discussions personally (although I heard a discussion about whether starting a nuclear war would be preferable to allowing UFAI to be developed), so it’s possible that they are no longer happening; also, shorter timelines imply that more AI researchers are plausibly close to AGI. 
(I am not morally condemning all cases of assassinating someone who is close to destroying the world, which may in some cases count as self-defense; rather, I am noting a fact about my game-theoretic situation relevant to my threat model at the time.)

The obvious alternative hypothesis is that MIRI is not for real, and therefore hypothetical discussions about assassinations were just dramatic posturing. But I was systematically discouraged from talking with people who doubted that MIRI was for real or publicly revealing evidence that MIRI was not for real, which made it harder for me to seriously entertain that hypothesis.

In retrospect, I was correct that Nate Soares did not know of a workable AGI design. A 2020 blog post stated:

At the same time, 2020 saw limited progress in the research MIRI’s leadership had previously been most excited about: the new research directions we started in 2017. Given our slow progress to date, we are considering a number of possible changes to our strategy, and MIRI’s research leadership is shifting much of their focus toward searching for more promising paths.

I (Nate) don’t know of any plan for achieving a stellar future that I believe has much hope worth speaking of.

(There are perhaps rare scenarios where MIRI leadership could have known how to build AGI but not FAI, and/​or could be hiding the fact that they have a workable AGI design, but no significant positive evidence for either of these claims has emerged since 2017 despite the putative high economic value and demo-ability of precursors to AGI, and in the second case my discrediting of this claim is cooperative with MIRI leadership’s strategy.)

Here are some more details, some of which are repeated from my previous post:

1. I was constantly encouraged to think very carefully about the negative consequences of publishing anything about AI, including about when AI is likely to be developed, on the basis that rationalists talking openly about AI would cause AI to come sooner and kill everyone. (In a recent post, Eliezer Yudkowsky explicitly says that voicing “AGI timelines” is “not great for one’s mental health”, a new additional consideration for suppressing information about timelines.) I was not encouraged to think very carefully about the positive consequences of publishing anything about AI, or the negative consequences of concealing it. While I didn’t object to consideration of the positive effects of secrecy, it seemed to me that secrecy was being prioritized above making research progress at a decent pace, which was a losing strategy in terms of differential technology development, and implied that naive attempts to research and publish AI safety work were net-negative. (A friend of mine separately visited MIRI briefly and concluded that they were primarily optimizing, not for causing friendly AI to be developed, but for not being responsible for the creation of an unfriendly AI; this is a very normal behavior in corporations, of prioritizing reducing liability above actual productivity.)

2. Some specific research, e.g. some math relating to extortion and precommitments, was kept secret under the premise that it would lead to (mostly unspecified) negative consequences.

3. Researchers were told not to talk to each other about research, on the basis that some people were working on secret projects and would have to say so if they were asked what they were working on. Instead, we were to talk to Nate Soares, who would connect people who were working on similar projects. I mentioned this to a friend later who considered it a standard cult abuse tactic, of making sure one’s victims don’t talk to each other.

4. Nate Soares also wrote a post discouraging people from talking about the ways they believe others to be acting in bad faith. This was to some extent a response to Ben Hoffman’s criticisms of Effective Altruism and its institutions; Ben Hoffman in turn responded with his own post clarifying that not all bad intentions are conscious.

5. Nate Soares expressed discontent that Michael Vassar was talking with “his” employees, distracting them from work [EDIT: Nate says he was talking about someone other than Michael Vassar; I don’t remember who told me it was Michael Vassar.]. Similarly, Anna Salamon expressed discontent that Michael Vassar was criticizing ideologies and people that were being used as coordination points, and hyperbolically said he was “the devil”. Michael Vassar seemed at the time (and in retrospect) to be the single person who was giving me the most helpful information during 2017. A central way in which Michael was helpful was by criticizing the ideology of the institution I was working for. Accordingly, central leaders threatened my ability to continue talking with someone who was giving me information outside the ideology of my workplace and social scene, which was effectively keeping me in an institutional enclosure. Discouraging contact with people who might undermine the shared narrative is a common cult tactic.

6. Anna Salamon frequently got worried when an idea was discussed that could have negative reputational consequences for her or MIRI leaders. She had many rhetorical justifications for suppressing such information. This included the idea that, by telling people information that contradicted Eliezer Yudkowsky’s worldview, Michael Vassar was causing people to be uncertain in their own head of who their leader was, which would lead to motivational problems (“akrasia”). (I believe this is a common position in startup culture, e.g. Peter Thiel believes it is important for workers at a startup to know who the leader is in part so they know who to blame if things go bad; if this model applied to MIRI, it would imply that Anna Salamon was setting up Eliezer as the designated scapegoat and encouraging others to do so as well.)

(I mention Nate Soares frequently not to indicate that he acted especially badly compared to others in positions of institutional authority (I don’t think he did), but because he was particularly influential to my mental state in the relevant time period, partially due to being my boss at the time. It is important not to make the fundamental attribution error here by attributing to him personally what were features of the situation he was in.)

It is completely unsurprising, to normal people who think about mental health, that not being able to talk about something concerning and important to you is a large risk factor for mental health problems. It is stressful in the way that being a spy, handling secrets that could put others at risk and having concealed conflicts with people, is stressful. I infer that Jay’s feeling that their experience was an “infohazard”, not something it would be right to discuss openly, contributed to their mental distress; I myself during my psychosis was very distressed at the idea that my mental state was being “covered up” (and perhaps, should be), partially due to its dangerous ability to influence other people. I find that the more I can talk about my experiences, the more healthy and calm I feel about them, and I haven’t found that telling others about them causes mental health problems.

On top of that, the secrecy policies encouraged us to be very suspicious of our own and each other’s motives. Generally, if someone has good motives, their actions will be net-positive, and their gaining information and capacities will be good for themselves and others; if they have bad motives, their actions will be net-negative, and their gaining information and capacities will be bad for themselves and others. MIRI researchers were being very generally denied information (e.g. told not to talk to each other) in a way that makes more sense under a “bad motives” hypothesis than a “good motives” hypothesis. Alternative explanations offered were not persuasive. It is accordingly unsurprising that I focused a lot of attention on the question of whether I had “bad motives” and what their consequences would be, up to and during psychosis.

Did anyone I worked with express concern that any of this would be bad for my mental state? The closest MIRI leadership came to looking after my mental health with respect to these issues was referring me to Anna Salamon for instructions on how to keep secrets, psychologically. I did not follow up on this offer because I did not trust Anna Salamon to prioritize helping me, and helping me accomplish MIRI’s mission, over her political loyalties. In any case, the suggestion literally amounts to telling me to learn to shut up better, which I think would have made things worse for me on net.

A friend later made the observation that “from a naive perspective, it’s not obvious that AI alignment is a very important problem; from a non-naive perspective, ‘someone might build an unfriendly AI’ is a justification for silencing everyone, although the non-naive perspective is incapable of itself figuring out how to build AGI”, which resonated with me.

MIRI asked a lot from its employees and donors, on the basis of extraordinary claims about its potential impact. The information MIRI employees and donors could have used to evaluate those claims was suppressed on the basis that the information was dangerous. The information necessary to evaluate the justification for that suppression was itself suppressed. This self-obscuring process created a black hole at the center of the organization that sucked in resources and information, but never let a sufficient justification escape for the necessity of the black hole. In effect, MIRI leadership asked researchers, donors, and other supporters to submit to their personal authority.

Some of what I am saying shows that I have and had a suspicious outlook towards people including my co-workers. Scott Alexander’s comment blames Michael Vassar for causing me to develop such an outlook:

Since then, [Michael has] tried to “jailbreak” a lot of people associated with MIRI and CFAR—again, this involves making them paranoid about MIRI/​CFAR and convincing them to take lots of drugs.

While talking with Michael and others in my social group (such as Jack Gallagher, Olivia Schaeffer, Alice Monday, Ben Hoffman, and Bryce Hidysmith; all these people talked with Michael sometimes) is part of how I developed such an outlook, it is also the case that, had I not been able to figure out for myself that there were conflicts going on around me, I would not have been fit for the job I was hired to do.

MIRI’s mission is much more ambitious than the mission of the RAND Corporation, whose objectives included preventing nuclear war between major powers and stabilizing the US for decades under a regime of cybernetics and game theory. The main thinkers of RAND Corp (including John Von Neumann, John Nash, Thomas Schelling, and ambiguously Norbert Wiener) developed core game theoretic concepts (including conflict-theoretic concepts, in the form of zero-sum game theory, brinkmanship, and cybernetic control of people) and applied them to social and geopolitical situations.

John Nash, famously, developed symptoms of paranoid schizophrenia after his work in game theory. A (negative) review of A Beautiful Mind describes the dysfunctionally competitive and secretive Princeton math department Nash found himself in:

Persons in exactly the same area of research also don’t tend to talk to each other. On one level they may be concerned that others will steal their ideas. They also have a very understandable fear of presenting a new direction of inquiry before it has matured, lest the listening party trample the frail buds of thought beneath a sarcastic put-down.

When an idea has developed to the point where they realize that they may really be onto something, they still don’t want to talk about it. Eventually they want to be in a position to retain full credit for it. Since they do need feedback from other minds to advance their research, they frequently evolve a ‘strategy’ of hit-and-run tactics, whereby one researcher guards his own ideas very close to the chest, while trying to extract from the other person as much of what he knows as possible.

After Nash left, RAND Corporation went on to assist the US military in the Vietnam War; Daniel Ellsberg, who worked at RAND Corporation, leaked the Pentagon Papers in 1971, which showed a large unreported expansion in the scope of the war, and that the main objective of the war was containing China rather than securing a non-communist South Vietnam. Ellsberg much later published The Doomsday Machine, detailing US nuclear war plans, including the fact that approval processes for launching nukes were highly insecure (valuing increasing the probability of launching retaliatory strikes over minimizing the rate of accidental launches), the fact that the US’s only nuclear war plan involved a nuclear genocide of China whether or not China had attacked the US, and the fact that the US Air Force deliberately misinformed President Kennedy about this plan in violation of the legal chain of command. At least some of the impetus for plans like this came from RAND Corporation, due to among other things the mutually assured destruction doctrine, and John Von Neumann’s advocacy of pre-emptively nuking Russia. Given that Ellsberg was the only major whistleblower, and delayed publishing critical information for decades, it is improbable that complicity with such genocidal plans was uncommon at RAND Corporation, and certain that such complicity was common in the Air Force and other parts of the military apparatus.

It wouldn’t be a stretch to suggest that Nash, through his work in game theory, came to notice more of the ways people around him (both at the Princeton math department and at the RAND Corporation) were acting against the mission of the organization in favor of egoic competition with each other and/​or insane genocide. Such a realization, if understood and propagated without adequate psychological support, could easily cause symptoms of paranoid schizophrenia. I recently discussed Nash on Twitter:

You’re supposed to read the things John Nash writes, but you’re not supposed to see the things he’s talking about, because that would make you a paranoid schizophrenic.

MIRI seemed to have a substantially conflict-theoretic view of the broad situation, even if not the local situation. I brought up the possibility of convincing DeepMind people to care about AI alignment. MIRI leaders including Eliezer Yudkowsky and Nate Soares told me that this was overly naive, that DeepMind would not stop dangerous research even if good reasons for this could be given. Therefore (they said) it was reasonable to develop precursors to AGI in-house to compete with organizations such as DeepMind in terms of developing AGI first. So I was being told to consider people at other AI organizations to be intractably wrong, people who it makes more sense to compete with than to treat as participants in a discourse.

[EDIT: Nate clarifies that he was trying to say that, even if it were possible to convince people to care about alignment, it might take too long, and so this doesn’t imply a conflict theory. I think the general point that time-to-converge-beliefs is relevant in a mistake theory is true, although in my recollection of the conversation Nate said it was intractable to convince people, not just that it would take a long time; also, writing arguments explicitly allows many people to read the same arguments, which makes scaling to more people easier.]

The difference between the beliefs of MIRI leadership and Michael Vassar was not exactly mistake theory versus conflict theory. Rather, MIRI’s conflict theory made an unprincipled exception for the situation inside MIRI, exclusively modeling conflict between MIRI and other outside parties, while Michael Vassar’s model did not make such exceptions. I was more interested in discussing Michael’s conflict theory with him than discussing MIRI leadership’s conflict theory with them, on the basis that it better reflected the situation I found myself in.

MIRI leadership was not offering me a less dark worldview than Michael Vassar was. Rather, this worldview was so dark that it asserted that many people would be destroying the world on fairly short timescales in a way intractable to reasoned discourse, such that everyone was likely to die in the next 20 years, and horrible AI torture scenarios might (with low probability) result depending on the details. By contrast, Michael Vassar thinks that it is common in institutions for people to play zero-sum games in a fractal manner, which makes it unlikely that they could coordinate well enough to cause such large harms. Michael has also encouraged me to try to reason with and understand the perspective of people who seem to be behaving destructively instead of simply assuming that the conflict is unresolvable.

And, given what I know now, I believe that applying a conflict theory to MIRI itself was significantly justified. Nate, just last month (prompted by my talking with people on Twitter), admitted that he posted “political banalities” on the MIRI blog during the time I was there. I was concerned about the linked misleading statement in 2017 and told Nate Soares and others about it, although Nate Soares insisted that it was not a lie, because technically the word “excited” could indicate the magnitude of a feeling rather than its positiveness. While someone bullshitting on the public Internet (to talk up an organization that by Eliezer’s account “trashed humanity’s chances of survival”) doesn’t automatically imply they lie to their coworkers in person, I did not and still don’t know where Nate is drawing the line here.

Anna Salamon, in a comment on my post, discusses “corruption” throughout CFAR’s history:

It’s more that I think CFAR’s actions were far from the kind of straight-forward, sincere attempt to increase rationality, compared to what people might have hoped for from us, or compared to what a relatively untraumatized 12-year-old up-and-coming-LWer might expect to see from adults who said they were trying to save the world from AI via learning how to think...I didn’t say things I believed false, but I did choose which things to say in a way that was more manipulative than I let on, and I hoarded information to have more control of people and what they could or couldn’t do in the way of pulling on CFAR’s plans in ways I couldn’t predict, and so on. Others on my view chose to go along with this, partly because they hoped I was doing something good (as did I), partly because it was way easier, partly because we all got to feel as though we were important via our work, partly because none of us were fully conscious of most of this.

(It should go without saying that, even if suspicion was justified, that doesn’t rule out improvement in the future; Anna and Nate’s transparency about past behavior here is a step in the right direction.)

Does developing a conflict theory of my situation necessitate developing the exact trauma complex that I did? Of course not. But the circumstances that justify a conflict theory make trauma much more likely, and vice versa. Traumatized people are likely to quickly update towards believing their situation is adversarial (“getting triggered”) when receiving modest evidence towards this, pattern-matching the new potentially-adversarial situation to the previous adversarial situation(s) they have encountered in order to generate defensive behavioral patterns.

## I was confused and constrained after tasking people I most trusted with helping take care of me early in psychosis

The following events took place in September-October 2017, 3-4 months after I had left MIRI in June.

I had a psychedelic trip in Berkeley, during which I discussed the idea of “exiting” civilization, the use of spiritual cognitive modalities to improve embodiment, the sense in which “identities” are cover stories, and multi-perspectival metaphysics. I lost a night of sleep, decided to “bravely” visit a planned family gathering the next day despite my sleep loss (partially as a way to overcome neurotic focus on downsides), lost another night of sleep, came back to Berkeley the next day, and lost a third night of sleep. After losing three nights of sleep, I started perceiving hallucinations such as a mirage-like effect in the door of my house (“beckoning me”, I thought). I walked around town and got lost, noticed my phone was almost out of battery, and called Jack Gallagher for assistance. He took me to his apartment; I rested in his room while being very concerned about my fate (I was worried that in some sense “I” or “my identity” was on a path towards death). I had a call with Bryce Hidysmith that alleviated some of my anxieties, and I excitedly talked with Ben Hoffman and Jack Gallagher as they walked me back to my house.

That night, I was concerned that my optimization might be “perverse” in some way, where in my intending to do something part of my brain would cause the opposite to happen. I attempted to focus my body and intentions so as to be able to take actions more predictably. I spent a number of hours lying down, perhaps experiencing hypnagogia, although I’m not sure whether or not I actually slept. That morning, I texted my friends that I had slept. Ben Hoffman came to my house in the morning and informed me that my housemate had informed Ben that I had “not slept” because he heard me walking around at night. (Technically, I could have slept during times he did not hear me walking around). Given my disorganized state, I could not think of a better response than “oops, I lied”. I subsequently collapsed and writhed on the floor until he led me to my bed, which indicates that I had not slept well.

Thus began multiple days of me being very anxious about whether I could sleep, in part because people around me would apply some degree of coercion to me until they thought I was “well”, which required sleeping. Such anxiety made it harder to sleep. I spent large parts of the daytime in bed, which was likely bad for getting to sleep compared with, for example, taking a walk.

Here are some notable events during that week before I entered the psych ward:

1. Zack Davis gave me a math test: could I prove e^(iπ) = −1? I gave a geometric argument: “e^(iπ) means spinning π radians clockwise about the origin in the complex plane starting from 1”, and I drew a corresponding diagram. Zack said this didn’t show I could do math, since I could have remembered it, and asked me to give an algebraic argument. I failed to give one (and I think I would have failed pre-psychosis as well). He told me that I should have used the Taylor series expansion of e^x. I believe this exchange was used to convince other people taking care of me that I was unable to do math, which was unreasonable given the difficulty of the problem and the lack of calibration on an easier problem. This worsened communication in part by causing me to be more afraid that people would justify coercing me (and not trying to understand me) on the basis of my lack of reasoning ability. (Days later, I tested myself by programming “FizzBuzz” and was highly distressed to find that my program was malfunctioning and I couldn’t successfully debug it, with my two eyes seeming to give me different pictures of the computer screen.)

2. I briefly talked with Michael Vassar (for less than an hour); he offered useful philosophical advice about basing my philosophy on the capacity to know instead of on the existence of fundamentally good or bad people, and made a medication suggestion (for my sleep issues) that turned out to intensify the psychosis in a way that he might have been able to predict had he thought more carefully, although I see that it was a reasonable off-the-cuff guess given the anti-anxiety properties of this medication.

3. I felt like I was being “contained” and “covered up”, which included people not being interested in learning about where I was mentally. (Someone taking care of me confirmed years later that, yes, I was being contained, and people were covering up the fact that there was a sick animal in the house.) Ben Hoffman opened the door, which let sunlight into the doorway. I took it as an invitation and stepped outside. The light was wonderful, giving me perhaps the most ecstatic experience I have had in my life, as I sensed light around my mind, and I felt relieved from being covered up. I expounded on the greatness of the sunlight, referencing Sarah’s post on Ra. Ben Hoffman encouraged me to pay more attention to my body, at which point the light felt like it concentrated into a potentially-dangerous sharp vertical spike going through my body. (This may technically be, or have some relation to, a Kundalini awakening, though I haven’t confirmed this; there was a moment around this time that I believe someone around me labeled a “psychotic break”.) I felt like I was performing some sort of light ritual, navigating between revelation and concealment, and subsequently believed I had messed up the ritual terribly and became ashamed. Sometime around then I connected what I saw due to the light with the word “dasein” (from Heidegger), and shortly afterward connected “dasein” to the idea that zero-sum games are normal, such as in sports. I later connected the light to the idea that everyone else is the same person as me (and I heard my friends’ voices in another room in a tone as if they were my own voice).

4. I was very anxious and peed on a couch at some point and, when asked why, replied that I was “trying to make things worse”.

5. I was in my bed, ashamed and still, staring at the ceiling, afraid that I would do something bad. Sarah Constantin sat on my bed and tried to interact with me, including by touching my fingers. I felt very afraid of interacting with her because I thought I was steering in the wrong direction (doing bad things because they are bad) and might hurt Sarah or others. I felt something behind my eyes and tongue turn inward as I froze up more and more, sabotaging my own ability to influence the world, becoming catatonic (a new mind-altering medication that was suggested to me at the time, different from the one Michael suggested, might also have contributed to the catatonia). Sarah noticed that I was breathing highly abnormally and called the ER. While the ambulance took me there I felt like I could only steer in the wrong direction, and feared that if I continued I might become a worse person than Adolf Hitler. Sarah came with me in the ambulance and stayed with me in the hospital room; after I got IV benzos, I unfroze. The hospital subsequently sent me home.

6. One night I decided to open my window, jump out, and walk around town; I thought I was testing the hypothesis that things were very weird outside and the people in my house were separating me from the outside. I felt like I was bad and that perhaps I should walk towards water and drown, though this was not a plan I could have executed on. Ben Hoffman found me and walked me back home. Someone called my parents, who arrived the next day and took me to the ER (I was not asked if I wanted to be psychiatrically institutionalized); I was catatonic in the ER for about two days and was later moved to a psychiatric hospital.

While those who were taking care of me didn’t act optimally, the situation was incredibly confusing for me and for them, and I believe they did better than most other Berkeley rationalists would have done, and that those rationalists would themselves have done better than most members of the American middle class.

## Are Michael Vassar and friends pro-psychosis gnostics?

Scott asserts:

Jessica was (I don’t know if she still is) part of a group centered around a person named Vassar, informally dubbed “the Vassarites”. Their philosophy is complicated, but they basically have a kind of gnostic stance where regular society is infinitely corrupt and conformist and traumatizing and you need to “jailbreak” yourself from it (I’m using a term I found on Ziz’s discussion of her conversations with Vassar; I don’t know if Vassar uses it himself). Jailbreaking involves a lot of tough conversations, breaking down of self, and (at least sometimes) lots of psychedelic drugs.

I have only heard Michael Vassar use the word “jailbreak” when discussing Ziz, but he believes it’s possible to use psychedelics to better see deception and enhance one’s ability to use one’s own mind independently, which I find to be true in my experience. This is a common belief among people who take psychedelics, and among psychotherapeutic research organizations including MAPS and Johns Hopkins, which have published conventional academic studies demonstrating that psychedelic treatment regimens widely reported to induce “ego death” have strong psychiatric benefits. Michael Vassar believes “tough conversations” that challenge people’s defensive nonsense (some of which is identity-based) are necessary for psychological growth, in common with psychotherapists and with some MIRI/​CFAR people such as Anna Salamon.

I had tried psychedelics before talking significantly with Michael, in part due to a statement I heard from a friend (who wasn’t a CFAR employee but who did some teaching at CFAR events) along the lines of “CFAR can’t legally recommend that you try [a specific psychedelic], but...” (I don’t remember what followed the “but”), and in part due to suggestions from other friends.

“Infinitely corrupt and conformist and traumatizing” is hyperbolic (infinite corruption would leave nothing to steal), though Michael Vassar and many of his friends believe large parts of normal society (in particular in the professional-managerial class) are quite corrupt and conformist and traumatizing. I mentioned in a comment on the post one reason why I am not sad that I worked at MIRI instead of Google:

I’ve talked a lot with someone who got pretty high in Google’s management hierarchy, who seems really traumatized (and says she is) and who has a lot of physiological problems, which seem overall worse than mine. I wouldn’t trade places with her, mental health-wise.

I have talked with other people who have worked in corporate management, who have corroborated that corporate management traumatizes people into playing zero-sum games. If Michael and I are getting biased samples here and high-level management at companies like Google is actually a fine place to be in the usual case, then that indicates that MIRI is substantially worse than Google as a place to work. Iceman in the thread reports that his experience as a T-5 (apparently a “Senior” non-management rank) at Google “certainly traumatized” him, though this was less traumatizing than what he gathers from Zoe Curzi’s or my reports, which may themselves be selected for being especially severe due to the fact that they are being written about. Moral Mazes, an ethnographic study of corporate managers written by sociology professor Robert Jackall, is also consistent with my impression.

Scott asserts that Michael Vassar treats borderline psychosis as an achievement:

The combination of drugs and paranoia caused a lot of borderline psychosis, which the Vassarites mostly interpreted as success (“these people have been jailbroken out of the complacent/​conformist world, and are now correctly paranoid and weird”).

A strong form of this is contradicted by Zack Davis’s comment:

As some closer-to-the-source counterevidence against the “treating as an achievement” charge, I quote a 9 October 2017 2:13 p.m. Signal message in which Michael wrote to me:

Up for coming by? I’d like to understand just how similar your situation was to Jessica’s, including the details of her breakdown. We really don’t want this happening so frequently.

(Also, just, whatever you think of Michael’s many faults, very few people are cartoon villains that want their friends to have mental breakdowns.)

A weaker statement is true: Michael Vassar believes that mental states somewhat in the direction of psychosis, such as those experienced by family members of clinical schizophrenics, are likely to be more intellectually productive over time. This is not an especially concerning or absurd belief. Scott Alexander himself cites research showing greater mental modeling and verbal intelligence in relatives of schizophrenics:

In keeping with this theory, studies find that first-degree relatives of autists have higher mechanistic cognition, and first-degree relatives of schizophrenics have higher mentalistic cognition and schizotypy. Autists’ relatives tend to have higher spatial compared to verbal intelligence, versus schizophrenics’ relatives who tend to have higher verbal compared to spatial intelligence. High-functioning schizotypals and high-functioning autists have normal (or high) IQs, no unusual number of fetal or early childhood traumas, and the usual amount of bodily symmetry; low-functioning autists and schizophrenics have low IQs, increased history of fetal and early childhood traumas, and increased bodily asymmetry indicative of mutational load.

(He also mentions John Nash as a particularly interesting case of mathematical intelligence being associated with schizophrenic symptoms, in common with my own comparison of myself to John Nash earlier in this post.)

I myself prefer to be sub-clinically schizotypal (which online self-diagnosis indicates I am) to the alternative of being non-schizotypal, which I understand is not a preference shared by everyone. There is a disagreement between Michael Vassar and Scott Alexander about the tradeoffs involved, but they agree there are both substantial advantages and disadvantages to mental states somewhat in the direction of schizophrenia.

## Is Vassar-induced psychosis a clinically significant phenomenon?

Scott Alexander draws a causal link between Michael Vassar and psychosis:

Since then, [Vassar has] tried to “jailbreak” a lot of people associated with MIRI and CFAR—again, this involves making them paranoid about MIRI/​CFAR and convincing them to take lots of drugs. The combination of drugs and paranoia caused a lot of borderline psychosis, which the Vassarites mostly interpreted as success (“these people have been jailbroken out of the complacent/​conformist world, and are now correctly paranoid and weird”). Occasionally it would also cause full-blown psychosis, which they would discourage people from seeking treatment for, because they thought psychiatrists were especially evil and corrupt and traumatizing and unable to understand that psychosis is just breaking mental shackles.

(to be clear: Michael Vassar and our mutual friends decided to place me in a psychiatric institution after I lost a week of sleep, which is at most a mild form of “discourag[ing] people from seeking treatment”; it is in many cases reasonable to try at-home treatment if it could prevent institutionalization.)

I have given an account in this post of the causality of my psychosis, in which Michael Vassar is relevant, and so are Eliezer Yudkowsky, Nate Soares, Anna Salamon, Sarah Constantin, Ben Hoffman, Zack Davis, Jack Gallagher, Bryce Hidysmith, Scott Alexander, Olivia Schaeffer, Alice Monday, Brian Tomasik, Venkatesh Rao, David Chapman, Carl Jung, M. Scott Peck, Martin Heidegger, Lao Tse, the Buddha, Jesus Christ, John Von Neumann, John Nash, and many others. Many of the contemporary people listed were/​are mutual friends of myself and Michael Vassar, which is mostly explained by myself finding these people especially helpful and interesting to talk to (correlated with myself and them finding Michael Vassar helpful and interesting to talk to), and Michael Vassar connecting us with each other.

Could Michael Vassar have orchestrated all this? That would require him to scheme so well that he determined the behavior of many others while having very little direct contact with me at the time of my psychosis. The hypothesis that he is Xanatos, directing the entire social scene I was part of through hidden stratagems, is incredibly unlikely on priors, and far out of line with how effective I have seen him to be at causing people to cooperate with his intentions.

Other people who have had some amount of interaction with Michael Vassar and who have been psychotic commented in the thread. Devi Borg commented that the main contributor to her psychosis was “very casual drug use that even Michael chided me for”. Zack Davis commented that “Michael had nothing to do with causing” his psychosis.

Eric Bruylant commented that his thoughts related to Michael Vassar were “only one mid sized part of a much larger and weirder story...[his] psychosis was brought on by many factors, particularly extreme physical and mental stressors and exposure to various intense memes”, that “Vassar was central to my delusions, at the time of my arrest I had a notebook in which I had scrawled ‘Vassar is God’ and ‘Vassar is the Devil’ many times”; he only mentioned sparse direct contact with Michael Vassar himself, mentioning a conversation in which “[Michael] said my ‘pattern must be erased from the world’ in response to me defending EA”.

While on the surface Eric Bruylant seems to be the case most influenced by Michael Vassar, any effect would have had to be indirect, given how little direct conversation Eric had with Michael; Eric mentions an intermediary who talked to both him and Michael. Anna Salamon’s hyperbolic statement that Michael is “the devil” may be causally related to Eric’s impressions of Michael, especially given the scrawling of “Vassar is God” and “Vassar is the Devil”. It would be very surprising, showing an extreme degree of mental prowess, for Michael Vassar to be able to cause a psychotic break two hops out in the social graph through his own agency; it is much more likely that the vast majority of relevant agency was due to other people.

I have heard of 2 cases of psychosis in former MIRI employees in 2017-2021 who weren’t significantly talking with Michael or Ziz (I referenced one in my original post and have since then learned of another).

As I pointed out in a reply to Scott Alexander, if such strong mental powers are possible, that lends plausibility to the psychological models people at Leverage Research were acting on, in which people can spread harmful mental objects to each other. Scott’s comment that I reply to admits that attributing such strong psychological powers to Michael Vassar is “very awkward” for liberalism.

Such “liberalism” is hard for me to interpret in light of Scott’s commentary on my pre-psychosis speech:

Jessica is accusing MIRI of being insufficiently supportive to her by not taking her talk about demons and auras seriously when she was borderline psychotic, and comparing this to Leverage, who she thinks did a better job by promoting an environment where people accepted these ideas. I think MIRI was correct to be concerned and (reading between the lines) telling her to seek normal medical treatment, instead of telling her that demons were real and she was right to worry about them, and I think her disagreement with this is coming from a belief that psychosis is potentially a form of useful creative learning. While I don’t want to assert that I am 100% sure this can never be true, I think it’s true rarely enough, and with enough downside risk, that treating it as a psychiatric emergency is warranted.

[EDIT: I originally misinterpreted “it” in the last sentence as referring to “talk about demons and auras”, not “psychosis”, and the rest of this section is based on that incorrect assumption; Scott clarified that he meant the latter.]

I commented that this was effectively a restriction on my ability to speak freely, in contradiction with the liberal right to free speech. Given that a substantial fraction of the general public (e.g. New Age people and Christians, groups that overlap with psychiatrists) discuss “auras” and “demons”, it is inappropriate to treat such discussion as cause for a “psychiatric emergency”, a judgment substantially increasing the risk of involuntary institutionalization; that would be a case of a minority ideological community using the psychiatric system to enforce its local norms. If Scott were arguing that talk of “auras” and “demons” is a psychiatric emergency based on widely-accepted professional standards, he would need to name a specific DSM condition and argue that this talk constitutes symptoms of that condition.

In the context of MIRI, I was in a scientistic math ~~cult~~ ‘high-enthusiasm ideological community’, so seeing outside the ideology of this ~~cult~~ ‘community’ might naturally involve thinking about non-scientistic concepts; enforcing “talking about auras and demons is a psychiatric emergency” would, accordingly, be enforcing ~~cult~~ ‘local’ ideological boundaries using state force vested in professional psychiatrists for the purpose of protecting the public.

While Scott disclaims the threat of involuntary psychiatric institutionalization later in the thread, he did not accordingly update the original comment to clarify which statements he still endorses.

Scott has also attributed beliefs to me that I have never held or claimed to have held. I never asserted that demons are real. I do not think that it would have been helpful for people at MIRI to pretend that they thought demons were real. The nearest thing I can think of having said is that the hypothesis that “demons” were responsible for Eric Bruylant’s psychosis (a hypothesis offered by Eric Bruylant himself) might correspond to some real mental process worth investigating, and my complaint is that I and everyone else were discouraged from openly investigating such things and forming explicit hypotheses about them. It is entirely reasonable to be concerned about things conceptually similar to “demon possession” when someone has just attacked a mental health worker shortly after claiming to be possessed by a demon; discouraging such talk prevents people in situations like the one I was in from protecting their mental health by modeling threats to it.

Likewise, if someone had tried to explain why they disagreed with the specific things I said about auras (which did not include an assertion that they were “real,” only that they were not a noticeably more imprecise concept than “charisma”), that would have been a welcome and helpful response.

Scott Alexander has, at a Slate Star Codex meetup, said that Michael is a “witch” and/​or does powerful “witchcraft”. This is clearly of the same kind as speech about “auras” and “demons”. (The Sequences post on Occam’s Razor, relevantly, mentions “The lady down the street is a witch; she did it” as an example of a non-parsimonious explanation.)

I can’t believe that a standard against woo-adjacent language is being applied symmetrically given this and given that some other central rationalists such as Anna Salamon and multiple other CFAR employees used woo-adjacent language more often than I ever did.

## Conclusion

I hope reading this gives a better idea of the actual causal factors behind my psychosis. While Scott Alexander’s comment contained some relevant information and prompted me to write this post with much more relevant information, the majority of his specific claims were false or irrelevant in context.

While much of what I’ve said about my workplace is negative (given that I am specifically focusing on what was stressing me out), there were, of course, large benefits to my job: I was able to research very interesting philosophical topics with very smart and interesting people, while being paid substantially more than I could get in academia; I was learning a lot even while having confusing conflicts with my coworkers. I think my life has become more interesting as a result of having worked at MIRI, and I have strong reason to believe that working at MIRI was overall good for my career.

I will close by poetically expressing some of what I learned:

If you try to have thoughts,

You’ll be told to think for the common good;

If you try to think for the common good,

You’ll be told to serve a master;

If you try to serve a master,

Their inadequacy will disappoint you;

If their inadequacy disappoints you,

You’ll try to take on the responsibility yourself;

If you try to take on the responsibility yourself,

You’ll fall to the underworld;

If you fall to the underworld,

You’ll need to think to benefit yourself;

If you think to benefit yourself,

You’ll ensure that you are counted as part of “the common good”.

## Postscript

Eliezer’s comment in support of Scott’s criticism was a reply to Aella saying he shared her (negative) sense about my previous post. If an account by Joshin is correct, we have textual evidence about this sense:

As regards Leverage: Aella recently crashed a party I was attending. This, I later learned, was the day that Jessica Taylor’s post about her experiences at CFAR and MIRI came out. When I sat next to her, she was reading that post. What follows is my recollection of our conversation.

Aella started off by expressing visible, audible dismay at the post. “Why is she doing this? This is undermining my frame. I’m trying to do something and she’s fucking it up.”

I asked her: “why do you do this?”

She said: “because it feels good. It feels like mastery. Like doing a good work of art or playing an instrument. It feels satisfying.”

I said: “and do you have any sense of whether what you’re doing is good or not?”

She said: “hahaha, you and Mark Lippmann both have the ‘good’ thing, I don’t really get it.”

I said: “huh, wow. Well, hey, I think your actions are evil; but on the other hand, I don’t believe everything I think.”

She said: “yeah, I don’t really mind being the evil thing. Seems okay to me.”

[EDIT: See Aella’s response; she says she didn’t say the line about undermining frames, and that use of the term “evil” has more context, and that the post overall was mostly wrong. To disambiguate her use of “evil”, I’ll quote the relevant part of her explanatory blog post below.]

I entered profound silence, both internal and external. I lost the urge to evangelize, my inner monologue left me, and my mind was quiet and slow-moving, like water. I inhabited weird states; sometimes I would experience a rapid vibration between the state of ‘total loss of agency’ and ‘total agency over all things’. Sometimes I experienced pain as pleasure, and pleasure as pain, like a new singular sensation for which there were no words at all. Sometimes time came to me viscerally, like an object in front of me I could nearly see except it was in my body, rolling in this fast AND-THIS-AND-THIS motion, and I would be destroyed and created by it, like my being was stretched on either side and brought into existence by the flipping in between. I cried often.

I became sadistic. I’d previously been embracing a sort of masochism – education in the pain, fearlessness of eternal torture or whatever – but as my identity expanded to include that which was educating me, I found myself experiencing sadism. I enjoyed causing pain to myself, and with this I discovered evil. I found within me every murderer, torturer, destroyer, and I was shameless. As I prostrated myself on the floor, each nerve ending of my mind writhing with the pain of mankind, I also delighted in subjecting myself to it, in being it, in causing it. I became unified with it.

The evil was also subsumed by, or part of, love. Or maybe not “love” – I’d lost the concept of love, where the word no longer attached to a particular cluster of sense in my mind. The thing in its place was something like looking, where to understand something fully meant accepting it fully. I loved everything because I Looked at everything. The darkness felt good because I Looked at it. I was complete in my pain only when I experienced the responsibility for inducing that pain.

• Experimental Two-Axis Voting: “Overall” & “Agreement”

The LW team has spent the last few weeks developing alternative voting systems. We’ve enabled two-axis voting on this post. The two dimensions are:

• Overall: what is your overall feeling about the comment? Does it contribute positively to the conversation? Do you want to see more comments like this?

• Agreement: do you agree with the position of this comment?

Separating these out allows you to express more nuanced reactions to comments, such as “I still disagree with what you’re arguing for, but you’ve raised some interesting and helpful points” and “although I agree with what you’re saying, I think this is a low-quality comment”.

Edited to Add: I checked with Jessica first whether she was happy for us to try this experiment with her post.

• A few notes:

• This will be one experiment among several, so bugs are possible. We’re interested in what effect this has on the quality of conversation, what the experience of voting in this system is like, and what the experience of skimming a thread and seeing these scores is like.

• Agreement-votes and related code will not necessarily be kept forever, if we don’t go with that as the overall voting system for LW. Agreement-votes and agreement-scores will be kept at least for as long as any thread using that voting system is active.

• GreaterWrong, and some areas of the site which aren’t especially integrated with the two-axis voting, will show only the overall score, not the agreement scores. Sorting is by overall score and only the overall score affects user karma.

• Feedback: I intuitively expected the first/​left vote to be “agree/​disagree” and the second/​right vote to be “compliance with good standards.” In reality, it’s closer to the reverse of that. Not sure how typical my experience will be.

(In my imagination, a user goes “I like this” followed by ”...but it was sketchy from an epistemic standpoint” or similar.)

• I think if you swapped them, at least at this stage, you would have a bunch of people who accidentally indicated agreement because they thought they were normal voting

• Yeah. Normal voting could have been left as is, with two buttons that indicate those two things. If something had an extreme score via voting, but didn’t score strongly (or in the same direction) via the other two, then voting would be capturing something else.

One of the issues with these things (like ‘agree’) - whatever that refers to (i.e. agree with what?), is that the longer, and more parts, a comment has, the less a single score captures that well, for any dimension.

• > One of the issues with these things (like ‘agree’) - whatever that refers to (i.e. agree with what?), is that the longer, and more parts, a comment has, the less a single score captures that well, for any dimension.

~~This thread~~ The comments on this post in particular are a great example of this. Lots of people are taking pieces of the post and going ‘I disagree with this’ or ‘this is not true’.

• I am confused.

I guess the first vote means “whatever my vote would be under the old system”.

The second vote… I am not sure how to apply it e.g. to your comment. If I click “agree”, what does it mean?

• I agree that Duncan intuitively expected the first/​left vote to be agree/​disagree, and the second/​right vote to be good standards? [yes]

• It also seems to me that the first/​left vote is agree/​disagree, and the second/​right vote is good standards? [no]

I guess I am just going to use the first vote as usual (now if you vote “agree” on this comment, does it mean “yes, I believe this is exactly what Viliam will do” or “yes, I will do the same thing”?), and the second one only in situations that seem unambiguous.

• I’ve been doing a lot of ‘overall’ voting based on “all things considered, am I happy this comment exists?” or “would I like to see more comments like this in the future?” and ‘agreement’ voting on specifically “do I endorse its contents?”

For a non-charged example, I upvoted Duncan’s comment suggesting that the buttons be swapped, because I think that’s a good kind of feedback to give on an experiment, and voted disagree on it because I don’t think swapping would have the effect he thinks it would have.

• Another example: if two people are having a back and forth where they seem to remember different things, I’ll normal-vote for both of them because I’m glad they’re hashing it out, but I won’t agree/disagree with either of them because I don’t have any inside information on what happened.

• Cool experiment. A note: I just clicked “agree” to a comment and noticed that it gave two points, and was somewhat surprised (with a bad valence). Maybe it makes sense, but somehow I expected the agree thing to mean literally “this many people clicked agree”. (Haven’t thought about it, just a reaction.)

• I think that makes sense but on the other hand both vote counts being directly comparable seems good

• Just some immediate feedback on this—There is a big noticeable phenomenon, which is that I agree and disagree with many comments, even though I frequently think the comment is just “OK”, so I am making many more “agreement” votes than “overall” votes.

• Really! I just encountered this feature, and have been more reluctant to agree than to upvote. Admittedly, the topic has mostly concerned conversations which I didn’t hear.

• This post discusses suicides and psychosis of named people. I think it’s an inappropriate place to experiment with a new voting system. I think you could choose a less weighty post for initial experiments.

Also, I don’t get the impression that this experiment was done with jessicata’s explicit agreement, and I’m worried that this post is being singled out for special treatment because of its content.

• We did ask Jessica first whether she would want to participate in the experiment. Also, the reason we want to experiment with additional voting systems is that threads like this one specifically seem to tend to go badly under our current voting system; switching it out is an attempt at making them go better, in a way that I think Jessica also wants.

• The agreement box got a bit excited :)

• Recording my reaction to the new system: I looked back at the comment and saw it got a downvote on agreement and got that small twinge of negative affect that I sometimes get from seeing a downvote on my own comment, then realized it probably just means that person doesn’t have the same bug, and it passed. It would be interesting to see if this instinct changes after some time getting used to this system.

• I try to imagine the two numbers as parts of a complex number, like “37+8i”.

From that perspective, both “37+8i” and “37-8i” feel positive.

• I also see the CSS as a bit wonky, at least on mobile, though not as wonky as that. I see the agreement box as about one pixel higher than the overall box.

• Yay for experimenting!

• I was noticing different users having different patterns for upvotes versus agreement (partly because mine seemed to be skewed toward agreement…) and I wanted to play with it a little more. Here’s a script that extracts the votes from this page. Expand all comments (⌘F) before running.

```javascript
// Extract per-user vote totals from the expanded comment thread.
function author(meta) {
  return meta.children[1].innerText;
}
function votes(meta) {
  // The vote element's text contains the overall score and the
  // agreement score on separate lines.
  return meta.children[3].innerText.split("\n");
}
const metas = document.getElementsByClassName("CommentsItem-meta");
const output = {};
for (let i = 0; i < metas.length; i++) {
  const meta = metas[i];
  output[author(meta)] = output[author(meta)] || {
    comments: 0,
    upvotes: 0,
    agreement: 0,
  };
  output[author(meta)].comments++;
  output[author(meta)].upvotes += Number(votes(meta)[0]);
  output[author(meta)].agreement += Number(votes(meta)[1]);
}
console.log(output);
```
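To turn the accumulated `output` object into the `agreement:upvotes` listing below, a formatting step along these lines would work (a sketch only; `formatRatios` and the sample data here are hypothetical, not part of the original script):

```javascript
// Format per-user totals as "agreement:upvotes<TAB>user" lines,
// matching the listing below. `totals` has the same shape as the
// `output` object built by the extraction script above.
function formatRatios(totals) {
  return Object.entries(totals).map(
    ([user, v]) => `${v.agreement}:${v.upvotes}\t${user}`
  );
}

// Hypothetical sample data, for illustration only:
const sample = {
  exampleUserA: { comments: 3, upvotes: 35, agreement: -5 },
  exampleUserB: { comments: 1, upvotes: 7, agreement: 0 },
};
console.log(formatRatios(sample).join("\n"));
// -5:35	exampleUserA
// 0:7	exampleUserB
```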


Here’s the total agreement:upvotes ratio for this thread, by user:

-13:7    Alastair JL
7:-15    Liam Donovan
-5:35    Martin Randall
0:18     Quintin Pope
0:3      countingtoten
0:4      Pattern
0:6      bn22
1:15     Yoav Ravid
2:27     Benquo
2:26     Scott Alexander
32:211   jessicata
6:35     TekhneMakre
21:95    So8res
8:32     Aella
7:27     Raemon
23:76    ChristianKl
20:56    Ruby
8:22     romeostevensit
19:51    cata
9:24     tailcalled
65:160   Duncan_Sabien
40:93    Eliezer Yudkowsky
7:16     Bjartur Tómas
43:97    T3t
10:22    Viliam
38:64    habryka
85:143   jimrandomh
23:35    Davis_Kingsley
2:3      Joe_Collman
11:14    Veedrac
2:2      [comment deleted]
43:32    adamzerner
76:56    jefftk
10:5     Joe Rocca
4:2      Thomas Kehrenberg

• It might make sense to do such a binary vote on the second axis (agreement) only when the comment author requests it on a comment, and not on all comments by default.

Some reasons:

- A comment could state multiple points, and you might agree with some but not others.
- Sometimes disagreements are about calibration: say the author thinks something is 70% likely but you think it’s 50%. And sometimes neither of you explicitly uses numbers like 50% or 70%; you just use quantifiers in plain English.
- Sometimes you know whether you agree or disagree assuming you’ve understood the author’s position, but you’re not yet confident you have understood it.

All of these call for clarifying questions or replies, rather than a binary vote.

Also binary votes can cause polarisation.

Letting the comment author select when to solicit votes also allows the author to define a statement clearly enough (dealing with all edge cases, etc.) before soliciting.

• Yudkowsky is so awesome!!

• This is a good example of a comment that is not a contribution to the discussion, but which lots of people here agree with, illustrating the power and flexibility of the two-axis voting system. Except, if you meant it as such, it’s a valuable contribution to the discussion, and shouldn’t be downvoted. But if it weren’t downvoted it wouldn’t properly show off the voting system, and would no longer be valuable. You’ve tied us in knots!

• Yep, I wanted to experiment with a central example of a comment that should be in the “downvote/​agree” quadrant, since that seemed like the least likely to occur naturally. It’s nice to see the voting system is working as intended.

So in general I’m noticing a pattern where you make claims about things that happened, but it turns out those things didn’t happen, or there’s no evidence that they happened and no reason one would believe they did a priori, and you’re actually just making an inference and presenting it as the state of reality. These seem to universally be inferences which cast others’ motives or actions in a negative light. They seem to be broadly unjustified by the provided evidence and surrounding context, or to rely on models of reality (both physical and social) which I think are very likely in conflict with the models held by the people those inferences are about.

Sometimes you draw correspondences between the beliefs and/​or behaviors of different people or groups, in what seems like an attempt to justify the belief/​behavior of the first, or to frame the second as a hypocrite for complaining about the first (though you don’t usually say why these comparisons are relevant). These correspondences turn out to be only superficial similarities, lacking any of the mechanistic similarities that would make them useful for comparison, or they actually conceal the fact that the two parties in question have opposite beliefs on a more relevant axis.

For lack of a better framing, it seems like you’re failing to disentangle your social reality from the things that actually happened. I am deeply sympathetic to the issues you experienced, but the way this post (and the previous post) was written makes it extremely difficult to engage with productively, since so many of the relevant claims turn out not to be claims about things that actually occurred, despite their appearance as such.

I went through the post and picked out some examples (stopping only because it was taking too much of my evening, not because I ran out), with these being the most salient:

> 1. Many others have worked to conceal the circumstances of their deaths

As far as I can tell, this is almost entirely unsubstantiated, with the possible exception of Maia, and in that case it would have been Ziz’s circle doing the concealment, not any of the individuals you express specific concerns about.

> 2. My psychotic break in which I imagined myself creating hell was a natural extension of this line of thought.

The way this is written makes it sound like you think that it ought to have been a (relatively) predictable consequence.

> 3. By the law of excluded middle, the only possible alternative hypothesis is that the problems I experienced at MIRI and CFAR were unique or at least unusually severe, significantly worse than companies like Google for employees’ mental well-being.

In theory, the problems you experienced could have come from sources other than your professional environment. That is a heck of a missing middle.

> 4. This view is rapidly becoming mainstream, validated by research performed by MAPS and at Johns Hopkins, and FDA approval for psychedelic psychotherapy is widely anticipated in the field.

This seems to imply that Michael’s view on the subject corresponds in most relevant ways to the views taken by MAPS/​etc. I don’t know what Michael’s views on the subject actually are, but on priors I’m extremely skeptical that the correspondence is sufficient to make this a useful comparison (which, as an appeal to authority, is already on moderately shaky grounds).

> 5. including a report from a friend along the lines of “CFAR can’t legally recommend that you try [a specific psychedelic], but...”

Can you clarify what relationship this friend had with CFAR? This could be concerning if they were a CFAR employee at the time. If they were not a CFAR employee, were they quoting someone who was? If neither, I’m not sure why it’s evidence of CFAR’s views on the subject.

> 7. MIRI leaders were already privately encouraging me to adopt a kind of conflict theory in which many AI organizations were trying to destroy the world

This is not supported by your later descriptions of those interactions.

First, at no point do you describe any encouragement to adopt a conflict-theoretic view. I assume this section is the relevant one: “MIRI leaders including Eliezer Yudkowsky and Nate Soares told me that this was overly naive, that DeepMind would not stop dangerous research even if good reasons for this could be given. Therefore (they said) it was reasonable to develop precursors to AGI in-house to compete with organizations such as DeepMind in terms of developing AGI first. So I was being told to consider people at other AI organizations to be intractably wrong, people who it makes more sense to compete with than to treat as participants in a discourse.” This does not describe encouragement to adopt a conflict-theoretic view. It describes encouragement to adopt some specific beliefs (e.g. about DeepMind’s lack of willingness to integrate information about AI safety into their models and then behave appropriately, and possible ways to mitigate the implied risks), but these are object-level claims, not ontological claims.

Second, this does not describe a stated belief that “AI organizations were trying to destroy the world”. Approximately nobody believes that AI researchers at e.g. DeepMind are actively trying to destroy the world. A more accurate representation of the prevailing belief would be something like “they are doing something that may end up destroying the world, which, from their perspective, would be a totally unintentional and unforeseen consequence of their actions”. This distinction is important, I’m not just nitpicking.

> 8. I was given ridiculous statements and assignments including the claim that MIRI already knew about a working AGI design and that it would not be that hard for me to come up with a working AGI design on short notice just by thinking about it, without being given hints.

I would be pretty surprised if this turned out to be an accurate summary of those interactions. In particular, that:
1) MIRI (Nate) believed, as of 2017, that it would be possible to develop AGI given known techniques & technology in 2017, with effectively no new research or breakthroughs required, just implementation, and
2) You were told that you should be able to come up with such a design yourself on short notice without any help or collaborative effort.

Indeed, the anecdote you later relay about your interaction with Nate does not support either of those claims, though it carries its own confusions (why would he encourage you to think about how to develop a working AGI using existing techniques if it would be dangerous to tell you outright? The fact that there’s an extremely obvious contradiction here makes me think that there was a severe miscommunication on at least one side of this conversation).

> 10. His belief that mental states somewhat in the direction of psychosis, such as those had by family members of schizophrenics, are helpful for some forms of intellectual productivity is also shared by Scott Alexander and many academics.

This seems to be (again) drawing a strong correspondence between Michael’s beliefs and actions taken on the basis of those beliefs, and Scott’s beliefs. Scott’s citation of research “showing greater mental modeling and verbal intelligence in relatives of schizophrenics” does not imply that Scott thinks it is a good idea to attempt to induce sub-clinical schizotypal states in people—in fact I would bet a lot of money that Scott thinks doing so is an extremely bad idea, which is a more relevant basis on which to compare his beliefs with Michael’s.

> 11. Scott asserts that Michael Vassar discourages people from seeking mental health treatment. Some mutual friends tried treating me at home for a week as I was losing sleep and becoming increasingly mentally disorganized before deciding to send me to a psychiatric institution, which was a reasonable decision in retrospect.

Were any of those people Michael Vassar? If not, I’m not sure how it’s intended to rebut Scott’s claim (though in general I agree that Scott’s claim about Michael’s discouragement could stand to be substantiated in some way). If so, retracted, but then why is that not specified here, given how clearly it rebuts one of the arguments?

> 13. This is inappropriately enforcing the norms of a minority ideological community as if they were widely accepted professional standards.

This does not read like a charitable interpretation of Scott’s concerns. To wit, if I was friends with someone who I knew was a materialist atheist rationalist, and then one day they came to me and started talking about auras and demons in a way which made it sound like they believed they existed (the way materialist phenomena exist, rather than as “frames”, “analogies”, or “I don’t have a better word for this purely internal mental phenomenon, and this word, despite all the red flags, has connotations that are sufficiently useful that I’m going to use it as a placeholder anyways”), I would update very sharply on them having had a psychotic break (or similar). The relevant reference class for how worrying it is that someone believes something is not what the general public believes, it’s how sharp a departure it is from that person’s previous beliefs (and in what direction—a new, sudden, literal belief in auras and demons is a very common symptom of a certain cluster of mental illnesses!). Note: I am not making any endorsement about the appropriateness of involuntary commitment on the basis of someone suddenly expressing these beliefs. I’m not well-calibrated on the likely distribution of outcomes from doing so.

Moving on from the summary:

> I notice that I have encountered little discussion, public or private, of the conditions of Maia Pasek’s death. To a naive perspective this lack of interest in a dramatic and mysterious death would seem deeply unnatural and extremely surprising, which makes it strong evidence that people are indeed participating in this cover-up.

I do not agree that this is the naive perspective. People, in general, do not enjoy discussing suicide. I have no specific reason to believe that people in the community enjoy this more than average, or expect to get enough value out of it to outweigh the unpleasantness. Unless there is something specifically surprising about a specific suicide that seems relevant to the community more broadly, my default expectation would be that people largely don’t talk about it (the same way they largely don’t talk about most things not relevant to their interests). As far as I can tell, Maia was not a public figure in a way that would, by itself, be sufficient temptation to override people’s generalized dispreference for gossip on the subject.

Until today I had not seen any explanation of the specific chain of events preceding Maia’s suicide; a reasonable prior would have been “mentally ill person commits suicide, possibly related to experimental brain hacking they were attempting to do to themselves (as partially detailed on their own blog)”. This seems like fairly strong supporting evidence.

I suppose the relevant community interest here is “consider not doing novel neuropsychological research on yourself, especially if you’re already not in a great place”. I agree with that as a useful default heuristic, but it’s one that seems “too obvious to say out loud”. Where do you think a good place for a PSA is?

In general it’s not clear what kind of cover-up you’re imagining. I have not seen any explicit or implicit discouragement of such discussion, except in the (previously mentioned) banal sense that people don’t like discussing suicide and hardly need additional reasons to avoid it.

> While there is a post about Jay’s death on LessWrong, it contains almost no details about Jay’s mental state leading up to their death, and does not link to Jay’s recent blog post. It seems that people other than Jay are also treating the circumstances of Jay’s death as an infohazard.

Jay, like Maia, does not strike me as a public figure. The post you linked is strongly upvoted (+124 at time of writing). It seems to be written as a tribute, which is not the place where I would link to Jay’s most recent blog post if I were trying to analyze causal factors upstream of Jay’s suicide. Again, my prior is that nothing about the lack of discussion needs an explanation in the form of an active conspiracy to suppress such discussion.

> There is a very disturbing possibility (with some evidence for it) here, that people may be picked off one by one (sometimes in ways they cooperate with, e.g. through suicide), with most everyone being too scared to investigate the circumstances.

Please be specific, what evidence? This is an extremely serious claim. For what it’s worth, I don’t agree that people can be picked off in “ways they cooperate with, e.g. through suicide”, unless you mean that Person A would make a concerted effort to convince Person B that Person B ought to commit suicide. But, to the extent that you believe this, the two examples of suicides you listed seem to be causally downstream of “self-hacking with friends(?) in similarly precarious mental states”, not “someone was engaged in a cover-up (of what?) and decided it would be a good idea to try to get these people to commit suicide (why?) despite the incredible risks and unclear benefits”.

> These considerations were widely regarded within MIRI as an important part of AI strategy. I was explicitly expected to think about AI strategy as part of my job. So it isn’t a stretch to say that thinking about extreme AI torture scenarios was part of my job.

S-risk concerns being potentially relevant factors in research does not imply that thinking about specific details of AI torture scenarios would be part of your job. Did someone at MIRI make a claim that imagining S-risk outcomes in graphic detail was necessary or helpful to doing research that factored in S-risks as part of the landscape? Roll to disbelieve.

> Another part of my job was to imagine myself in the role of someone who is going to be creating the AI that could make everything literally the worst it could possibly be, in order to avoid doing that, and prevent others from doing so.

Similarly, roll to disbelieve that someone else at MIRI suggested that you imagine this without significant missing context which would substantially alter the naïve interpretation of that claim.

Skipping the anecdote with Nate & AGI, as I addressed it above, but following that:

> Nate and others who claimed or implied that they had such information did not use it to win bets or make persuasive arguments against people who disagreed with them, but instead used the shared impression or vibe of superior knowledge to invalidate people who disagreed with them.

Putting aside the question of whether Nate or others actually claimed or implied that they had a working model of how to create an AGI with 2017-technology, if they had made such a claim, I am not sure why you would expect them to try to use that model to win bets or make persuasive arguments. I would in fact expect them to never say anything about it to the outside world, because why on earth would you do that given, uh, the entire enterprise of MIRI?

> But I was systematically discouraged from talking with people who doubted that MIRI was for real or publicly revealing evidence that MIRI was not for real, which made it harder for me to seriously entertain that hypothesis.

Can you concretize this? When I read this sentence, an example interaction I can imagine fitting this description would be someone at MIRI advising you in a conversation to avoid talking to Person A, because Person A doubted that MIRI was for real (rather than for other reasons, like “people who spend a lot of time interacting with Person A seem to have a curious habit of undergoing psychotic breaks”).

> In retrospect, I was correct that Nate Soares did not know of a workable AGI design.

I am not sure how either of the excerpts supports the claim that “Nate Soares did not know of a workable AGI design” (which, to be clear, I agree with, but for totally unrelated reasons described earlier). Neither of them makes any explicit or implicit claims about knowledge (or lack thereof) of AGI design.

> In a recent post, Eliezer Yudkowsky explicitly says that voicing “AGI timelines” is “not great for one’s mental health”, a new additional consideration for suppressing information about timelines.

This is not what Eliezer says. Quoting directly: “What feelings I do have, I worry may be unwise to voice; AGI timelines, in my own experience, are not great for one’s mental health, and I worry that other people seem to have weaker immune systems than even my own.”

This is making a claim that AI timelines themselves are poor for one’s mental health (presumably, the consideration of AI timelines), not the voicing of them.

> Researchers were told not to talk to each other about research, on the basis that some people were working on secret projects and would have to say so if they were asked what they were working on. Instead, we were to talk to Nate Soares, who would connect people who were working on similar projects. I mentioned this to a friend later who considered it a standard cult abuse tactic, of making sure one’s victims don’t talk to each other.

The reason cults attempt to limit communication between their victims is to prevent the formation of common knowledge of specific abusive behaviors that the cult is engaging in, and similar information-theoretic concerns. Taking for granted the description of MIRI’s policy (and application of it) on internal communication about research, this is not a valid correspondence; they were not asking you to avoid discussing your interactions with MIRI (or individuals within MIRI) with other MIRI employees, which could indeed be worrying if it were a sufficiently general ask (rather than e.g. MIRI asking someone in HR not to discuss confidential details of various employees with other employees, which would technically fit the description above but is obviously not what we’re talking about).

> It should be noted that, as I was nominally Nate’s employee, it is consistent with standard business practices for him to prevent me from talking with people who might distract me from my work; this goes to show the continuity between “cults” and “normal corporations”.

I can’t parse this in a way which makes it seem remotely like “standard business practices”. I disagree that it is a standard business practice to actively discourage employees from talking to people who might distract them from their work, largely because employees do not generally have a problem with being distracted from their work because they are talking to specific people. I have worked at a number of different companies, each very different from the last in terms of size, domain, organizational culture, etc, and there was not a single occasion where I felt the slightest hint that anyone above me on the org ladder thought I ought to not talk to certain people to avoid distractions, nor did I ever feel like that was part of the organization’s expectations of me.

> MIRI researchers were being very generally denied information (e.g. told not to talk to each other) in a way that makes more sense under a “bad motives” hypothesis than a “good motives” hypothesis. Alternative explanations offered were not persuasive.

Really? This seems to totally disregard explanations unrelated to whether someone has “good motives” or “bad motives”, which are not reasons that I would expect MIRI to have at the top of their list justifying whatever their info-sec policy was.

> By contrast, Michael Vassar thinks that it is common in institutions for people to play zero-sum games in a fractal manner, which makes it unlikely that they could coordinate well enough to cause such large harms.

This is a bit of a sidenote but I’m not sure why this claim is interesting w.r.t. AI alignment, since the problem space, almost by definition, does not require coordination with intent to cause large harms, in order to in fact cause large harms. If the claim is that “institutions are sufficiently dysfunctional that they’ll never be able to build an AGI at all”, that seems like a fully-general argument against institutions ever achieving any goals that require any sort of internal and/​or external coordination (trivially invalidated by looking out the window).

> made a medication suggestion (for my sleep issues) that turned out to intensify the psychosis in a way that he might have been able to predict had he thought more carefully

I want to note that while this does not provide much evidence in the way of the claim that Michael Vassar actively seeks to induce psychotic states in people, it is in fact a claim that Michael Vassar was directly, causally upstream of your psychosis worsening, which is worth considering in light of what this entire post seems to be arguing against.

• > As far as I can tell, this is almost entirely unsubstantiated, with the possible exception of Maia, and in that case it would have been Ziz’s circle doing the concealment, not any of the individuals you express specific concerns about.

I mentioned:

1. I did it too (there are others like me).

2. Ziz labeling it as an infohazard is in compliance with feedback Ziz has received from community leaders.

3. People didn’t draw attention to Fluttershy’s recent blog post, even after I posted about it.

> The way this is written makes it sound like you think that it ought to have been a (relatively) predictable consequence.

Whether or not it’s predictable ahead of time, the extended account I give shows the natural progression of thought.

> In theory, the problems you experienced could have come from sources other than your professional environment. That is a heck of a missing middle.

Even if there were other causes I still experienced these problems at MIRI. Also, most of the post is an argument that the professional environment contributed quite a lot.

> I don’t know what Michael’s views on the subject actually are, but on priors I’m extremely skeptical that the correspondence is sufficient to make this a useful comparison (which, as an appeal to authority, is already on moderately shaky grounds).

Obviously he’s going to disagree with them on specifics, I’m mentioning them as agreeing on the general view Scott attributed to Michael.

> Can you clarify what relationship this friend had with CFAR? This could be concerning if they were a CFAR employee at the time. If they were not a CFAR employee, were they quoting someone who was? If neither, I’m not sure why it’s evidence of CFAR’s views on the subject.

Not an employee. Did some teaching at CFAR events. Implied they were telling me information about a CFAR employee’s opinion. Even if they’re repeating a rumor, that still implies use of that psychedelic is common in the social circle, even if that doesn’t mean CFAR caused it at all.

> This does not describe encouragement to adopt a conflict-theoretic view. It describes encouragement to adopt some specific beliefs (e.g. about DeepMind’s lack of willingness to integrate information about AI safety into their models and then behave appropriately, and possible ways to mitigate the implied risks), but these are object-level claims, not ontological claims.

The distinction I am making is between (a) treating DeepMind as a participant in discourse, who can be convinced by reasoned argument to do better things, and (b) treating DeepMind as a competitor in a race, who can’t be reasoned with, but has to be beaten decisively in technological capacity. It seems natural to me to label (a) as “mistake theory” and (b) as “conflict theory”; you may have a philosophical disagreement here, but this seems like a quibble.

> Indeed, the anecdote you later relay about your interaction with Nate does not support either of those claims, though it carries its own confusions (why would he encourage you to think about how to develop a working AGI using existing techniques if it would be dangerous to tell you outright? The fact that there’s an extremely obvious contradiction here makes me think that there was a severe miscommunication on at least one side of this conversation).

The account I give does support this. His assignment to me would not make sense as an argument for his proposition, that “the pieces to make AGI are already out there and someone just has to put them together”, unless he expected that in completing the assignment I would gain evidence for that proposition. For that to happen, it would have to be an AGI design that is workable (in the sense that with more compute and some fine-tuning, it would actually work); non-workable AGI designs are created all the time in the AI field, and provide no significant evidence about the proposition he was asserting.

> Scott’s citation of research “showing greater mental modeling and verbal intelligence in relatives of schizophrenics” does not imply that Scott thinks it is a good idea to attempt to induce sub-clinical schizotypal states in people

Yes, I mentioned disagreement about tradeoffs later in the post.

> Were any of those people Michael Vassar?

I believe he was in the chat thread that made the decision, though I’m not sure.

> To wit, if I was friends with someone who I knew was a materialist atheist rationalist, and then one day they came to me and started talking about auras and demons in a way which made it sound like they believed they existed (the way materialist phenomena exist, rather than as “frames”, “analogies”, or “I don’t have a better word for this purely internal mental phenomenon, and this word, despite all the red flags, has connotations that are sufficiently useful that I’m going to use it as a placeholder anyways”), I would update very sharply on them having had a psychotic break (or similar).

Later in the thread I give details about what claims I was actually making. Those claims are broadly consistent with materialist ontology.

> Unless there is something specifically surprising about a specific suicide that seems relevant to the community more broadly, my default expectation would be that people largely don’t talk about it (the same way they largely don’t talk about most things not relevant to their interests).

It is massively surprising that the suicide was specifically narratized as being caused by inter-hemisphere conflict and precommitments/​extortion. Precommitments/​extortion are central LessWrong decision theory topics.

> In general it’s not clear what kind of cover-up you’re imagining. I have not seen any explicit or implicit discouragement of such discussion, except in the (previously mentioned) banal sense that people don’t like discussing suicide and hardly need additional reasons to avoid it.

I mentioned the specific example that I told a friend about it and then told them not to tell other people. I would guess that Ziz told others who also did this.

> Again, my prior is that nothing about the lack of discussion needs an explanation in the form of an active conspiracy to suppress such discussion.

Passive distributed conspiracy, directing attention away from critical details, is sufficient to hide important parts of the world from view.

> Please be specific, what evidence? This is an extremely serious claim. For what it’s worth, I don’t agree that people can be picked off in “ways they cooperate with, e.g. through suicide”, unless you mean that Person A would make a concerted effort to convince Person B that Person B ought to commit suicide. But, to the extent that you believe this, the two examples of suicides you listed seem to be causally downstream of “self-hacking with friends(?) in similarly precarious mental states”, not “someone was engaged in a cover-up (of what?) and decided it would be a good idea to try to get these people to commit suicide (why?) despite the incredible risks and unclear benefits”.

Central people made efforts to convince Ziz that she was likely to be “net negative” due to her willingness to generally reveal information. Ziz’s own moral philosophy may imply that many “non-good” people are net negative. It is unsurprising if ideology influenced people towards suicide. The “being picked off” could come from mostly-unconscious motives.

The story of people being unwilling to confirm or deny whether they would investigate if a friend disappeared was very striking at the time; I was surprised how far people would go to avoid investigation.

> Did someone at MIRI make a claim that imagining S-risk outcomes in graphic detail was necessary or helpful to doing research that factored in S-risks as part of the landscape?

No. But the magnitude of how bad they are depends on the specific details. I mentioned a blog post that says to imagine being dipped into lava. This was considered a relevant argument about the severity of s-risks.

> Similarly, roll to disbelieve that someone else at MIRI suggested that you imagine this without significant missing context which would substantially alter the naïve interpretation of that claim.

It wasn’t specifically suggested. It was implied by the general research landscape presented in, e.g., Eliezer’s Arbital article.

> Putting aside the question of whether Nate or others actually claimed or implied that they had a working model of how to create an AGI with 2017-technology, if they had made such a claim, I am not sure why you would expect them to try to use that model to win bets or make persuasive arguments.

They shouldn’t expect anyone to believe them if they don’t make persuasive arguments or do something similar. If I don’t agree with them that AI is likely to come soon, why would I do research based on that assumption? Presumably he would prefer me to, if in fact AGI is likely to come soon.

> This is not what Eliezer says. Quoting directly: “What feelings I do have, I worry may be unwise to voice; AGI timelines, in my own experience, are not great for one’s mental health, and I worry that other people seem to have weaker immune systems than even my own.”
>
> This is making a claim that AI timelines themselves are poor for one’s mental health (presumably, the consideration of AI timelines), not the voicing of them.

He’s clearly giving “AI timelines are bad for one’s mental health” as a reason why his feelings about AI timelines may be unwise to voice.

> Taking for granted the description of MIRI’s policy (and application of it) on internal communication about research, this is not a valid correspondence; they were not asking you to avoid discussing your interactions with MIRI (or individuals within MIRI) with other MIRI employees, which could indeed be worrying if it were a sufficiently general ask (rather than e.g. MIRI asking someone in HR not to discuss confidential details of various employees with other employees, which would technically fit the description above but is obviously not what we’re talking about).

Part of our research was decision theory, so talking about decision theory would be one way to talk about the desirability of the security policies. Also, I mentioned Nate and Anna discouraging Michael Vassar from talking to researchers, which fits the pattern of preventing discussion of interactions with MIRI, as he was a notable critic of MIRI at that time.

> I disagree that it is a standard business practice to actively discourage employees from talking to people who might distract them from their work, largely because employees do not generally have a problem with being distracted from their work because they are talking to specific people.

I’ve corrected this due to jefftk making the same point.

> Really? This seems to totally disregard explanations unrelated to whether someone has “good motives” or “bad motives”, which are not reasons that I would expect MIRI to have at the top of their list justifying whatever their info-sec policy was.

I specifically said that other explanations offered were not convincing. Maybe they would convince you, but they did not convince me.

> I want to note that while this does not provide much evidence in the way of the claim that Michael Vassar actively seeks to induce psychotic states in people, it is in fact a claim that Michael Vassar was directly, causally upstream of your psychosis worsening, which is worth considering in light of what this entire post seems to be arguing against.

I did not mean this as an argument against the claim “Michael Vassar seeks to induce psychotic states in people”. I meant his text to Zack as an argument against this claim. It is not perfect evidence, but Scott did not present significant positive evidence for the claim either.

This comment was annoying to engage with because it seemed like you were trying to learn as little new information as possible from the post while finding as many reasons as possible to dismiss the information, despite the post containing lots of information that I would expect people who haven’t been MIRI employees not to know. I think it’s good practice to respond to specific criticisms but this feels like a drag.

• > It is massively surprising that the suicide was specifically narratized as being caused by inter-hemisphere conflict and precommitments/extortion. Precommitments/extortion are central LessWrong decision theory topics.

It was narratized that way by Ziz; many people have chosen to be skeptical of claims Ziz makes, and there was no way to get an independent source.

• Also, conflict between hemispheres seems to be an important topic on Ziz’s blog (example).

> Precommitments/extortion are central LessWrong decision theory topics.

Yes, but I have never seen them in the context of “one hemisphere extorting the other by precommitting to suicide” on LessWrong. That sounds to me uniquely Zizian.

• I appreciate that you took the time to respond to my post in detail. I explained at the top why I had a difficult time engaging productively with your post (i.e. learning from it). I did learn some specific things, such as the claimed sequence of events prior to Maia’s suicide, and Nate’s recent retraction of his earlier public statement on OpenAI. Those are things which are either unambiguously claims about reality, or have legible evidence supporting them.

I mentioned:

1. I did it too.

2. Ziz labeling it as an infohazard is in compliance with feedback Ziz has received from community leaders.

3. People didn’t draw attention to Fluttershy’s recent blog post, even after I posted about it.

None of these carry the same implication that the community, centrally, was engaging in the claimed concealment. This phrasing deflects agency away from the person performing the action, and on to community leaders: “Ziz labeling it as an infohazard is in compliance with feedback Ziz has received from community leaders.” “Ziz labeled something that might have contributed to Maia’s suicide an infohazard, possibly as a result of feedback she got from someone else in the community well before she shared that information with Maia” implies something very different from “Many others have worked to conceal the circumstances of their deaths”, which in context makes it sound like an active conspiracy engaged in by central figures in the community. People not drawing attention to Fluttershy’s post, even after you posted about it, is not active concealment.

> In theory, the problems you experienced could have come from sources other than your professional environment. That is a heck of a missing middle.

Most of the post is an argument that the professional environment contributed quite a lot.

Your original claim included the phrase “the only possible alternative hypothesis”, so this seems totally non-responsive to my problem with it.

> I don’t know what Michael’s views on the subject actually are, but on priors I’m extremely skeptical that the correspondence is sufficient to make this a useful comparison (which, as an appeal to authority, is already on moderately shaky grounds).

Obviously he’s going to disagree with them on specifics, I’m mentioning them as agreeing on the general view Scott attributed to Michael.

Again, this seems non-responsive to what I’m saying is the issue, which is that the “general view” is more or less useless for evaluating how much in “agreement” they really are, as opposed to the specific details.

Not an employee. Did some teaching at CFAR events. Implied they were telling me information about a CFAR employee’s opinion. Even if they’re repeating a rumor, that still implies use of that psychedelic is common in the social circle, even if that doesn’t mean CFAR caused it at all.

That’s good to know, thanks. I think it would make your point here much stronger and more legible if those specific details were included in the original claim.

The distinction I am making is about (a) treating DeepMind as a participant in discourse, who can be convinced by reasoned argument to do better things, or (b) treating DeepMind as a competitor in a race, who can’t be reasoned with, but has to be beaten decisively in technological capacity. It seems natural to me to label (a) as “mistake theory” and (b) as “conflict theory”, and you have a philosophical disagreement here, but this seems like a quibble.

I agree that, as presented, adopting those object-level beliefs would seem to more naturally lend itself to a conflict theory view (vs. mistake theory), but my quibble is with the way you phrased your own inference (about which view made the most sense to adopt, given the presented facts) as a claim that MIRI told you to adopt that view.

The account I give does support this. His assignment to me would not make sense as an argument for his proposition, that “the pieces to make AGI are already out there and someone just has to put them together”, unless he expected that in completing the assignment I would gain evidence for that proposition. For that to happen, it would have to be an AGI design that is workable (in the sense that with more compute and some fine-tuning, it would actually work); non-workable AGI designs are created all the time in the AI field, and provide no significant evidence about the proposition he was asserting.

Here is your summary: “I was given ridiculous statements and assignments including the claim that MIRI already knew about a working AGI design and that it would not be that hard for me to come up with a working AGI design on short notice just by thinking about it, without being given hints.”

Here is the anecdote: “I was told, by Nate Soares, that the pieces to make AGI are likely already out there and someone just has to put them together. He did not tell me anything about how to make such an AGI, on the basis that this would be dangerous. Instead, he encouraged me to figure it out for myself, saying it was within my abilities to do so. Now, I am not exactly bad at thinking about AGI; I had, before working at MIRI, gotten a Master’s degree at Stanford studying machine learning, and I had previously helped write a paper about combining probabilistic programming with machine learning. But figuring out how to create an AGI was and is so far beyond my abilities that this was a completely ridiculous expectation.”

Compare: “MIRI already knew about a working AGI design and that it would not be that hard for me to come up with a working AGI design on short notice just by thinking about it” with “the pieces to make AGI are likely already out there and someone just has to put them together… he encouraged me to figure it out for myself, saying it was within my abilities to do so”. The first is making several much stronger claims than the second (w.r.t. certainty & specific knowledge of a working design, and their belief in how difficult it would be for you to generate one yourself under what timeframe), in a way that makes MIRI/​Nate seem much more unreasonable. If there are specific details omitted from the anecdote that support those stronger claims, I think it would make sense to include them; else we have something resembling a motte & bailey (specific, strong claim, supported by much weaker anecdote which in many possible worlds describes a totally reasonable interaction).

> Scott’s citation of research “showing greater mental modeling and verbal intelligence in relatives of schizophrenics” does not imply that Scott thinks it is a good idea to attempt to induce sub-clinical schizotypal states in people

Yes, I mentioned disagreement about tradeoffs later in the post.

If you agree that Scott & Michael have critical disagreements about the trade-offs in a way that’s relevant to the question at hand—more so than the surface-level agreement Scott’s writing demonstrates—why is this included at all? The implication that one might reasonably be expected to take from “His belief… is also shared by Scott Alexander and many academics.” is, in fact, more false than not.

I believe he was in the chat thread that made the decision, though I’m not sure.

Thanks for clarifying.

Later in the thread I give details about what claims I was actually making. Those claims are broadly consistent with materialist ontology.

Your original claim: “This increases the chance that someone like me could be psychiatrically incarcerated for talking about things that a substantial percentage of the general public (e.g. New Age people and Christians) talk about, and which could be explained in terms that don’t use magical concepts. This is inappropriately enforcing the norms of a minority ideological community as if they were widely accepted professional standards.”

I’m less concerned with the actual claims you were making at the time than with the fact that there is an extremely reasonable explanation for being concerned (in the general case, not in whatever your specific situation was) if a materialist starts talking seriously about auras and demons. That explanation is not that Scott was “enforcing the norms of a minority ideological community”.

It is massively surprising that the suicide was specifically narratized as being caused by inter-hemisphere conflict and precommitments/​extortion. Precommitments/​extortion are central LessWrong decision theory topics.

Why is this surprising? I agree that it would be surprising if it had been narrativized that way by detached outside observers, given that they’d have had no particular reason to construct that sort of narrative. But given the intense focus that those in question, as well as those close to them, had on those subjects, I’m not sure why you’d be surprised that they ascribed the suicide to the novel self-experimentation they were doing immediately prior to it, which led the person who committed suicide to publicly declare ahead of time that this would be why they committed suicide, if they did. That seems like a totally reasonable narrative from their perspective. But, again, none of this information was widely known. I’ve read most of Ziz’s blog and even I didn’t know about the specific sequence of events you described. I’m still not surprised, though, because I place very little credence on the idea that Maia Pasek correctly reasoned her way into killing herself, starting from anything resembling a set of values I’d endorse.

I mentioned the specific example that I told a friend about it and then told them not to tell other people. I would guess that Ziz told others who also did this.

Ok, granted, you and (maybe) Ziz do seem to do the things you’re claiming the community at large does, as central rather than non-central behaviors. This does not justify the implication behind “which makes it strong evidence that people are indeed participating in this cover-up”.

Passive distributed conspiracy, directing attention away from critical details, is sufficient to hide important parts of the world from view.

I don’t even know what this is claiming. Who is directing attention away from what details? How deliberate is it? How do you distinguish this world from one where “people don’t talk about suicide because they didn’t know the person and generally don’t find much value in attempting to psychoanalyze people who committed suicide”?

Central people made efforts to convince Ziz that she was likely to be “net negative” due to her willingness to generally reveal information. Ziz’s own moral philosophy may imply that many “non-good” people are net negative. It is unsurprising if ideology influenced people towards suicide. The “being picked off” could come from mostly-unconscious motives.

I’m not following the chain of logic here. Ziz claims that Anna told her she thought Ziz was likely to be net negative (in the context of AI safety research), after Ziz directly asked her if she thought that. Are you claiming that Anna was sufficiently familiar with the details of Ziz’s ontology (which, afaik, she hadn’t even developed in any detail at that point?) to predict that it might tempt Ziz to commit suicide? Because I’m not seeing how else you get from “Anna answered a question that Ziz asked” to “people may be picked off one by one”.

The story of people being unwilling to confirm or deny whether they would investigate if a friend disappeared was very striking at the time; I was surprised how far people would go to avoid investigation.

I agree that the interaction, as described, sounds quite unusual, mostly due to the extended length where they refused to provide a concrete “yes or no” answer. I would be surprised if many of my friends would sincerely promise to investigate if I disappeared (though I can think of at least one who would), but I would be even more surprised if they refused to tell me whether or not they’d be willing to do so, in such a protracted manner.

No. But the magnitude of how bad they are depends on the specific details. I mentioned a blog post that says to imagine being dipped into lava. This was considered a relevant argument about the severity of s-risks.

I predict that if we anonymously polled MIRI researchers (or AI alignment researchers more broadly), very few of them would endorse “thinking about extreme AI torture scenarios [is] part of my job”, if it carries the implication that they also need to think about those scenarios in explicit detail rather than “many 0s, now put a minus sign in front of them”.

It wasn’t specifically suggested. It was implied by the general research landscape presented e.g. Eliezer’s Arbital article.

So it sounds like you agree that to the extent that it was part of your job “to imagine myself in the role of someone who is going to be creating the AI that could make everything literally the worst it could possibly be”, that was an inference you drew (maybe reasonably!) from the environment, rather than someone at MIRI telling you explicitly that it was part of your job (or strongly implying the same)? Again, to be clear, my issues are with the framing that makes it seem like these are things that other people did or said, rather than with whether or not these were locally valid inferences to be drawing.

They shouldn’t expect anyone to believe them if they don’t make persuasive arguments or do something similar. If I don’t agree with them that AI is likely to come soon, why would I do research based on that assumption? Presumably he would prefer me to, if in fact AGI is likely to come soon.

The original claim you were ridiculing was not that “AI is likely to come soon”.

He’s clearly giving “AI timelines are bad for one’s mental health” as a reason why his feelings about AI timelines may be unwise to voice.

Agreed, but what you said is this: “In a recent post, Eliezer Yudkowsky explicitly says that voicing “AGI timelines” is “not great for one’s mental health”, a new additional consideration for suppressing information about timelines.”

I even agree that you could reasonably interpret what he actually said (as opposed to what you said he said) as a reason to avoid discussing AI timelines, but, crucially, there is no reason to believe he intended that statement to be read that way, and it doesn’t support your claim that you were “constantly encouraged to think very carefully about the negative consequences of publishing anything about AI”. He is providing a reason why he himself does not make much of a habit of discussing AI timelines, and that reason is that he is worried about the mental health of others, not that he thinks discussing it pessimizes for timeline outcomes.

Part of our research was decision theory, so talking about decision theory would be one way to talk about the desirability of the security policies. Also, I mentioned Nate and Anna discouraging Michael Vassar from talking to researchers, which fits the pattern of preventing discussion of interactions with MIRI, as he was a notable critic of MIRI at that time.

This doesn’t seem responsive to my objection, which is the comparison to cult abuse tactics. Asking Michael Vassar to not talk to researchers (during work hours?) is very much not the same thing as asking researchers not to talk to each other. I agree that there are situations where asking Michael Vassar not to talk to researchers would have been inappropriate, but without details on the situation I can’t just nod my head and say, “yep, classic cult tactic”, and in any case this is not what you originally provided as evidence(?) of cult-like behavior.

I’ve corrected this due to jefftk making the same point.

Thanks.

I specifically said that other explanations offered were not convincing. Maybe they would convince you, but they did not convince me.

Ok, so this is just a highly specific claim about the internal motivations of other agents that doesn’t have any supporting evidence—not even what unconvincing arguments they offered.

I did not mean this as an argument against the claim “Michael Vassar seeks to induce psychotic states in people”. I meant his text to Zack as an argument against this claim. It is not perfect evidence, but Scott did not present significant positive evidence for the claim either.

Yes, I agree. My understanding of this post is that it’s substantially devoted to rebutting the argument that Michael Vassar was a meaningful contributor to your mental health problems. I think the fact that Michael Vassar directly interacted with you during the relevant timeframe in a way which you yourself think made things worse is notable, in the sense that for most plausible sets of priors, you should probably be updating upwards on the hypothesis that “spending time around Michael Vassar is more likely to lead to psychosis than spending time around most other people”, irrespective of his state of knowledge & motivations at the time.

• Not going to respond to all of these; a lot seem like nitpicks.

Your original claim included the phrase “the only possible alternative hypothesis”, so this seems totally non-responsive to my problem with it.

My other point was that the problems were still experienced “at MIRI” even if they were caused by other things in the social environment.

That’s good to know, thanks. I think it would make your point here much stronger and more legible if those specific details were included in the original claim.

Edited.

Compare: “MIRI already knew about a working AGI design and that it would not be that hard for me to come up with a working AGI design on short notice just by thinking about it” with “the pieces to make AGI are likely already out there and someone just has to put them together… he encouraged me to figure it out for myself, saying it was within my abilities to do so”. The first is making several much stronger claims than the second (w.r.t. certainty & specific knowledge of a working design, and their belief in how difficult it would be for you to generate one yourself under what timeframe), in a way that makes MIRI/​Nate seem much more unreasonable.

1. Nate implied he had already completed the assignment he was giving me.

2. The assignment wouldn’t provide evidence about whether the pieces to make AGI are already out there unless it was “workable” in the sense that iterative improvement with more compute and theory-light technique iteration would produce AGI.

If you agree that Scott & Michael have critical disagreements about the trade-offs in a way that’s relevant to the question at hand—more so than the surface-level agreement Scott’s writing demonstrates—why is this included at all?

Edited to make it clear that they disagree. The agreement is relevant to place a bound on the scope of what they actually disagree on.

I’m not following the chain of logic here. Ziz claims that Anna told her she thought Ziz was likely to be net negative (in the context of AI safety research), after Ziz directly asked her if she thought that. Are you claiming that Anna was sufficiently familiar with the details of Ziz’s ontology (which, afaik, she hadn’t even developed in any detail at that point?) to predict that it might tempt Ziz to commit suicide?

She might have guessed based on Ziz’s utilitarian futurism (this wouldn’t require knowing many specific details), or might not have been thinking about that consciously. It’s more likely she was trying to control Ziz (she has admitted to generally controlling people around CFAR by e.g. hoarding info). I think my general point is that people are trying to memetically compete with each other in ways that involve labeling others “net negative” in a way that people can very understandably internalize and which would lead to suicide. It’s more like a competition to drive each other insane than one to directly kill each other. A lot of competition (e.g. the kind that would be predicted by evolutionary theory) is subconscious and doesn’t indicate legal responsibility.

Anyway, I edited to make it clearer that many of the influences in question are subconscious and/​or memetic.

I predict that if we anonymously polled MIRI researchers (or AI alignment researchers more broadly), very few of them would endorse “thinking about extreme AI torture scenarios [is] part of my job”, if it carries the implication that they also need to think about those scenarios in explicit detail rather than “many 0s, now put a minus sign in front of them”.

I predict that they would say that having some philosophical thoughts about negative utilitarianism and related considerations would be part of their job, and that AI torture scenarios are relevant to that, although perhaps not something they would specifically need to think about.

So it sounds like you agree that to the extent that it was part of your job “to imagine myself in the role of someone who is going to be creating the AI that could make everything literally the worst it could possibly be”, that was an inference you drew (maybe reasonably!) from the environment, rather than someone at MIRI telling you explicitly that it was part of your job (or strongly implying the same)?

Edited to make this clearer.

The original claim you were ridiculing was not that “AI is likely to come soon”.

They’re highly related, having a working AGI design is an argument for short timelines.

He is providing a reason why he himself does not make much of a habit of discussing AI timelines, and that reason is that he is worried about the mental health of others, not that he thinks discussing it pessimizes for timeline outcomes.

Sure, I mentioned it as a consideration other than the consideration I already mentioned about making AI come sooner.

I think the fact that Michael Vassar directly interacted with you during the relevant timeframe in a way which you yourself think made things worse is notable, in the sense that for most plausible sets of priors, you should probably be updating upwards on the hypothesis that “spending time around Michael Vassar is more likely lead to psychosis than spending time around most other people”, irrespective of his state of knowledge & motivations at the time.

I agree it’s weak evidence for that proposition. However, the fact that he gave me useful philosophical advice is evidence against that proposition. In total, the public info Scott and I have revealed provides very little directional evidence about this proposition.

• This seems to be (again) drawing a strong correspondence between Michael’s beliefs and actions taken on the basis of those beliefs, and Scott’s beliefs. Scott’s citation of research “showing greater mental modeling and verbal intelligence in relatives of schizophrenics” does not imply that Scott thinks it is a good idea to attempt to induce sub-clinical schizotypal states in people—in fact I would bet a lot of money that Scott thinks doing so is an extremely bad idea, which is a more relevant basis on which to compare his belief’s with Michael’s.

Michael was accused, in the comment thread of the other post, of seeking out people who are on the schizophrenia spectrum. Michael, to the extent that I know, seems to believe that those people have “greater mental modeling and verbal intelligence” and that this makes them worth spending time with.

Neither my own conversations with him nor any evidence anyone provided show him to believe that it’s a good idea to attempt to induce sub-clinical schizotypal states in people.

• There’s a model-fragment that I think is pretty important to understanding what’s happened around Michael Vassar, and Scott Alexander’s criticism.

Helping someone who is having a mental break is hard. It’s difficult for someone to do for a friend. It’s difficult for professionals to do in an institutional setting, and I have tons of anecdotes from friends and acquaintances, both inside and outside the rationality community, of professionals in institutions fucking up in ways that were traumatizing or even abusive. Friends have some natural advantages over institutions: they can provide support in a familiar environment instead of a prison-like environment, and make use of context they have with the person.

When you encounter someone who’s having a mental break or is giving off signs that they’re highly stressed and at risk of a mental break, the incentivized action is to get out of the radius of blame (see Copenhagen Interpretation of Ethics). I think most people do this instinctively. Attempting to help someone through a break is a risky and thankless job; many more people will hear about it if it goes badly than if it goes well. Anyone who does it repeatedly will probably find their name attached to a disaster and a mistake they made that sounds easier to avoid than it really was. Nevertheless, I think people should try to help their friends (and sometimes their acquaintances) in those circumstances, and that when we hear how it went, we should adjust our interpretation accordingly.

I’ve seen Michael get involved in a fair number of analogous situations that didn’t become disasters and that no one heard about, and that significantly affects my interpretation, when I hear that he’s been in the blast-radius of situations that did.

I think Scott Alexander looked at some stories (possibly with some rumor-mill distortions added on), and took a “this should be left to professionals” stance. And I think the “this should be left to professionals” stance looks better to him, as a professional who’s worked only in above-average institutions and who can fix problems when he sees them, than it does to people collecting anecdotes from others who’ve been involuntarily committed.

• Status: writing-while-frustrated. As with the last post, many of Jessica’s claims seem to me to be rooted in truth, but weirdly distorted. (Ever since the end of Jessica’s tenure at MIRI, I have perceived a communication barrier between us that has the weird-distortion nature.)

Meta: I continue to be somewhat hesitant to post stuff like this, on the grounds that it sucks to air dirty laundry about your old employer and then have your old employer drop by and criticize everything you said. I’ve asked Jessica whether she objects to me giving a critical reply, and she said she has no objections, so at least we have that. I remain open to suggestions for better ways to navigate these sorts of situations.

Jessica, I continue to be sad about the tough times you had during the end of your tenure at MIRI, and in the times following. I continue to appreciate your research contributions, and to wish you well.

My own recollections follow. Note that these are limited to cases where Jessica cites me personally, in the interest of time. Note also that I’m not entirely sure I’ve correctly identified the conversations she’s referring to, due to the blurring effects of the perceived distortion, and of time. And it goes almost-without-saying that my own recollections are fallible.

As a MIRI employee I was coerced into a frame where I was extremely powerful and likely to by-default cause immense damage with this power, and therefore potentially responsible for astronomical amounts of harm. I was discouraged from engaging with people who had criticisms of this frame, and had reason to fear for my life if I published some criticisms of it.

My own frame indeed says that the present is the hinge of history, and that humans alive today have extreme ability to affect the course of the future, and that this is especially true of humans working in the AI space, broadly construed. I wouldn’t personally call this “power”—I don’t think anyone in the space has all that much power in the present. I think a bunch of people in the AI-o-sphere have a lowish likelihood of a large amount of future power, and thus high future expected power, which is kinda like power. From my perspective, MIRI researchers have less of this than researchers from current top AI labs, but do have a decent amount. My own model does not predict that MIRI researchers are likely to cause astronomical harm by default. I do not personally adhere to the Copenhagen Interpretation of Ethics, and in the event that humanity destroys itself, I would not be assigning extra blame to alignment researchers on the grounds that they were closer to the action. I’m also not personally very interested in the game of pre-assigning blame, favoring object-level alignment research instead.

Insofar as I was influencing Jessica with my own frame, my best guess is that she misunderstood my frame, as evidenced by these differences between the frame she describes feeling coerced into, and my own picture.

I don’t recall ever discouraging Jessica from engaging with people who had criticisms of my frame. I readily admit that she was talking to folks I had little intellectual respect for, and I vaguely remember some of these people coming up in conversation and me noting that I lacked intellectual respect for them. To the best of my recollection, in all such instances, I added caveats of the form “but, just because I wouldn’t doesn’t mean you shouldn’t”. I readily admit that my openness about my lack of intellectual respect may have been taken as discouragement, especially given my position as her employer. The aforementioned caveats were intended to counteract such forces, at least insofar as I recall.

I was aware at the time that Jessica and I didn’t see eye-to-eye on various issues. I remember at least two occasions where I attempted to explicitly convey that I knew we didn’t see eye-to-eye, that it was OK for her to have views that didn’t match mine, and that I encouraged her to think independently and develop her own views.

Jessica said she felt coerced into a frame she found uncomfortable, and I believe her. My notes here are not intended to cast doubt on the honesty of her reports. My intent in saying all this is merely to express that (1) the frame she reports feeling coerced into, is not one that I recognize, never mind one that I intentionally coerced her into; and (2) I was aware of the pressures and actively tried to counteract them. Clearly, I failed at this. (And I have a decent chunk of probability mass that Jessica would clarify that she’s not accusing me of intentional coercion.) From my own perspective, she was misreading my own frame and feeling pressured into it despite significant efforts on my part to ameliorate the pressure. I happily solicit advice for what to do better next time, but do not consider my comportment to have been a mistake.

talked about hellscapes

I don’t recall ever “talking about hellscapes” per se. I recall mentioning them in passing, rarely. In my recollection, that mainly happened in response to someone else broaching the topic of fates worse than death. (Maybe there were other occasional throwaway references? But I don’t recall them.) My cached reply to others raising the idea of fates worse than death went something like:

“Goal-space is high dimensional, and almost all directions of optimization seem likely to be comparably bad to death from our perspective. To get something that is even vaguely recognizable to human values you have to be hitting a very narrow target in this high-dimensional space. Now, most of that target is plausibly dystopias as opposed to eutopias, because once you’re in the neighborhood, there are a lot of nearby things that are bad rather than good, and value is fragile. As such, it’s reasonable in principle to worry about civilization getting good enough at aiming AIs that they can hit the target but not the bullseye, and so you might worry that that civilization is more likely to create a hellscape than a eutopia. I personally don’t worry about this myself, because it seems to me that the space is so freaking high dimensional and the target so freaking small, that I find it implausible that a civilization successfully able to point an AI in a human-relevant direction, isn’t also able to hit the bullseye. Like, if you’re already hitting a quarter with an arrowhead on the backside of the moon, I expect you can also hit a dime.”

Two reasons I’d defend mentioning hellscapes in such situations: first, to demonstrate that I at least plausibly understood the concern my conversation partner had raised (as a matter of course before making a counterpoint), and second, so as not to undermine our friends working on S-risk reduction (a program I support).

My reason for not hesitating to use terms like “hellscapes” rather than more banal and less evocative terms was (to the best of my current recollection) out of a desire to speak freely and frankly, at least behind closed doors (eg, in the privacy of the MIRI research space). At the time, there was a bunch of social pressure around to stop thought experiments that end with the AI escaping and eating the galaxy, and instead use thought experiments about AIs that are trying to vacuum the house and accidentally break a lamp or whatever, and this used to rub me the wrong way. The motivations as I previously understood them were that, if you talk about star-eating rather than lamp-breaking, then none of the old guard AI researchers are ever going to take your field seriously. I thought (and still think) this is basically a bad reason. However, I have since learned a new reason, which is that mentioning large-scale disasters freely and frankly, might trigger psychotic episodes in people predisposed to them. I find this a much more compelling reason to elide the high-valence examples.

(Also, the more banal term “S-risk” hadn’t propagated yet, IIRC.)

Regardless, I have never thought in detail about fates worse than death, never mind discussed fates worse than death in any depth. I have no reason to, and I recommend against it. Me occasionally mentioning something in passing, and Jessica glossing it as “Nate talked about it” (with an implication of depth and regularity), is a fine example of the “weird distortion” I perceive in Jessica’s accounts.

> I was told, by Nate Soares, that the pieces to make AGI are likely already out there and someone just has to put them together.

I contest this. According to my best recollection of the conversation that I think Jessica is referring to, she was arguing that AGI will not arrive in our own lifetimes, and seemed unresponsive to my attempts to argue that a confident claim of long timelines requires positive knowledge, at which point I exasperatedly remarked that for all we knew, the allegedly missing AGI insights had already been not only had, but published in the literature, and all that remains is someone figuring out how to assemble them. (cf no one knows what science doesn’t know.) I do not assign particularly high credence to this claim myself, and (IIRC) I was using it rhetorically to test for acknowledgement of the idea that confident long timelines require positive knowledge that we seem to lack.

(This seems like another central example of my throwaway lines becoming weirdly distorted and heavily highlighted in Jessica’s recounting.)

> He did not tell me anything about how to make such an AGI, on the basis that this would be dangerous.

Here Jessica seems to be implying that, not only did I positively claim that the pieces of AGI were already out there in the literature, but also that I had personally identified them? I deny that, and I’m not sure what claim I made that Jessica misunderstood in that way. Given the surrounding context where Jessica made this claim, my guess is that it was in the same conversation as the exasperated remark described above, and that the conversation past that point became so desynched that Jessica’s recounting is no longer recognizable to me.

To be clear, I have claimed that AI alignment work is sometimes intertwined with AI capabilities work, and I have claimed that capabilities insights shouldn’t be publicized (as a strong default) on account of the negative externalities. Perhaps I said something along those lines that got distorted into Jessica’s claim?

> Instead, he encouraged me to figure it out for myself, saying it was within my abilities to do so.

I occasionally recommend that our researchers periodically (every 6mo or so) open a text file and see if they can write pseudocode for an FAI (ignoring computational constraints, at least for now), to help focus their attention on exactly where they’re confused and ground out their alignment research in things that are literally blocking them from actually writing a flippin’ FAI. I don’t recall ever telling Jessica that I thought she could figure out how to build an AGI herself. I do recall telling her I expected she could benefit from the exercise of attempting to write the pseudocode for an FAI.

If memory serves, this is an exercise I’d been advocating for a couple years before the time period that Jessica’s discussing (and IIRC, I’ve seen Jessica advocate it, or more general variants like “what could you do with a hypercomputer on a thumb drive”, as an exercise to potential hires). One guess as to what’s going on is that I tried to advocate the exercise of pseudocoding an FAI as I had many times before, but used some shorthand for it that I thought would be transparent, in some new conversational context (eg, shortly after MIRI switched to non-disclosure by default), and while Jessica was in some new mental state, and Jessica misunderstood me as advocating figuring out how to build an AGI all on her own while insinuating that I thought she could?

> [From the comments, in answer to the query “How did you conclude from Nate Soares saying that the tools to create AGI likely already exist that he wanted people to believe he knew how to construct one?”] Because he asked me to figure it out in a way that implied he already had a solution; the assignment wouldn’t make sense if it were to locate a non-workable AGI design (as many AI researchers have done throughout the history of the field); that wouldn’t at all prove that the pieces to make AGI are already out there. Also, there wouldn’t be much reason to think that his sharing a non-workable AGI design with me would be dangerous.

In light of this, my guess is that Jessica flatly misread my implications here.

To be very explicit: Jessica, I never believed you capable of creating a workable AGI design (using, say, your 2017 mind, unaugmented, in any reasonable amount of time). I also don’t assign particularly high credence to the claim that the insights are already out in the literature waiting to be found (or that they were in 2017). Furthermore, I never intentionally implied that I have myself succeeded at the “pseudocode an FAI” exercise so hard as to have an AGI design. Sorry for the miscommunication.

> Researchers were told not to talk to each other about research, on the basis that some people were working on secret projects and would have to say so if they were asked what they were working on.

This suggests a picture of MIRI’s nondisclosure-by-default policies that’s much more top-down than reality, similar to a correction I made on a post by Evan Hubinger a few years ago.

The sequence of events as I recall them was: Various researchers wanted to do some closed research. There was much discussion about how much information was private: Research results? Yes, if the project lead wants privacy. Research directions? Yes, if the project lead wants privacy. What about the participant list for each project? Can each project determine their own secrecy bounds individually, or is revealing who’s working with you defecting against (possibly-hypothetical) projects that don’t want to disclose who they’re working with? etc. etc. I recall at least one convo with a bunch of researchers where, in efforts to get everyone to stop circling privacy questions like moths to a flame and get back to the object level research, I said something to the effect of “come to me if you’re having trouble”.

I separately recall Jessica coming to me afterwards and asking a bunch more questions about who she can ask about what. I recall trying to convey something like “just work on what you want to work on, with whatever privacy level you want, and if someone working on something closed wants you working with them they’ll let you know (perhaps through me, if they want to), and you can bang out details with them as need be”.

The fact that people shouldn’t have to reveal whether they are in fact working on closed research if they don’t want to sounds like the sort of thing that came up in one or both of those conversations, and my guess is that that’s what Jessica’s referring to here. From my perspective, that wasn’t a particularly central point, and the point I recall attempting to drive home was more like “let’s just work on the object-level research and not get all wound up around privacy (especially because all that we’ve changed are the defaults, and you’re still completely welcome to publicize your own research, with my full support, as much as you’d like)”.

> Nate Soares also wrote a post discouraging people from talking about the ways they believe others to be acting in bad faith.

According to me, I was not trying to say “you shouldn’t talk about ways you believe others to be acting in bad faith”. I was trying to say “I think y’all are usually mistaken when you’re accusing certain types of other people of acting in bad faith”, plus “accusing people of acting in bad faith [in confrontational and adversarial ways, instead of gently clarifying and confirming first] runs a risk of being self-fulfilling, and also burns a commons, and I’m annoyed by the burned commons”. I think those people are wrong and having negative externalities, not that they’re bad for reporting what they believe.

Note that the sort of talking the post explicitly pushes back against is arguments of the form “person X is gaining [status|power|prestige] through their actions, therefore they are untrustworthy and have bad intentions”, which I believe to be invalid. Had I predicted Jessica’s particular misread in advance, I would have explicitly noted that I’m completely ok with arguments of the form “given observations X and Y, I have non-trivial probability on the hypothesis that you’re acting in bad faith, which I know is a serious allegation. Are you acting in bad faith? If not, how do you explain observations X and Y?”.

In other words, the thing I object to is not the flat statement of credence on the hypothesis “thus-and-such is acting in bad faith”, it’s the part where the author socially rallies people to war on flimsy pretenses.

In other other words, I both believe that human wonkiness makes many people particularly bad at calculating P(bad-faith|the-evidence) in particular, and recommend being extra charitable and cautious when it feels like that probability is spiking. Separately but relatedly, in the moment that you move from stating your own credence that someone is acting in bad faith, to socially accusing someone of acting in bad faith, my preferred norms require a high degree of explicit justification.

And, to be explicit: I think that most of the people who are acting in bad faith will either say “yes I’m acting in bad faith” when you ask them, or will sneer at you or laugh at you or make fun of you instead, which is just as good. I think a handful of other people are harmful to have around regardless of their intentions, and my guess is that most community decisions about harmful people should revolve around harm rather than intent.

(Indeed, I suspect the community needs to lower, rather than raise, the costs of shunning someone who’s doing a lot of harm. But I reiterate that I think such decisions should center on the harm, not the intent, and as such I continue to support the norm that combative/adversarial accusations of ill intent require a high degree of justification.)

> Nate Soares expressed discontent that Michael Vassar was talking with “his” employees, distracting them from work.

I don’t actually know what conversation this is referring to. I recall a separate instance, not involving Jessica, of a non-researcher spending lots of time in the office hanging out and talking with one of our researchers, and me pulling the researcher aside and asking whether they reflectively endorsed having those conversations or whether they kept getting dragged into them and then found themselves unable to politely leave. (In that case, the researcher said they reflectively endorsed them, and thereafter I left them alone.)

There might have been a time when Michael Arc (né Vassar) was spending a lot of time talking to Jessica and one other employee, and I said something about how I don’t have much intellectual respect for Michael? I doubt I said this unsolicited, but I definitely would have said it if anyone asked, and I at least vaguely remember something like that happening once or twice. It’s also possible that towards the end of Jessica’s tenure we were trying to have scheduled meetings to see if we could bridge the communications barrier, and it came up naturally in the conversation? But I’m not sure, as (unlike most of the other claims) I don’t concretely recognize this reference.

> It should be noted that, as I was nominally Nate’s employee, it is consistent with standard business practices for him to prevent me from talking with people who might distract me from my work during office hours.

I’m confident I did not prevent anyone from talking to anyone. I occasionally pulled people aside and asked them if they felt trapped in a given conversation when someone was loitering in the office having lots of conversations, so that I could rescue them if need be. I occasionally answered honestly, when asked, what I thought about people’s conversation partners. I leave almost all researchers to their own devices (conversational or otherwise) almost all of the time.

In Jessica’s particular case, she was having a lot of difficulty at the workplace, and so I stepped deeper into the management role than I usually do and we spent more time together seeing whether we could iron out our difficulties or whether we would need to part ways. It’s quite plausible that, during one of those conversations, I noted of my own accord that she was spending lots of her office-time deep in conversation with Michael, and that I didn’t personally expect this to help Jessica get back to producing alignment research that passed my research quality bar. But I am confident that, insofar as I did express my concerns, it was limited to an expression of skepticism. I… might have asked Michael to stop frequenting the offices quite so much? But I doubt it, and I have no recollection of such a thing.

I am confident I didn’t ever tell anyone not to talk to someone else; that feels way out-of-line to me. I may well have said things along the lines of “I predict that that conversation will prove fruitless”, which Jessica interpreted as a guess-culture style command? I tend to guard against that interpretation by adding hedges of the form “but I’m not you” or whatever, but perhaps I neglected to, or perhaps it fell on deaf ears?

Or perhaps Jessica’s just saying something along the lines of “I feared that if I kept talking to Michael all day, I’d be fired, and Nate expressing that he didn’t expect those conversations to be productive was tantamount to him saying that if I continued he’d fire me, which was tantamount to him saying that I can’t talk to Michael”? In which case, my prediction is indeed that if she hadn’t left MIRI of her own accord, and her research performance didn’t rebound, at some point I would have fired her on the grounds of poor performance. And in worlds where Jessica kept talking to Michael all the time, I would have guessed that a rebound was somewhat less likely, because I didn’t expect him to provide useful meta-level or object-level insights that lead to downstream alignment progress. But I’m an empiricist, and I would have happily tested my “talking to Michael doesn’t result in Nate-legible research output” hypothesis, after noting my skepticism in advance.

(Also, for the record, “Nate-legible research output” does not mean “research that is useful according to Nate’s own models”. Plenty of MIRI researchers disagree with me and my frames about all sorts of stuff, and I’m happy to have them at MIRI regardless, given that they’ve demonstrated the ability to seriously engage with the problem. I’m looking for something more like a cohesive vision that the researcher themself believes in, not research that necessarily strikes me personally as directly useful.)

> MIRI certainly had a substantially conflict-theoretic view of the broad situation, even if not the local situation. I brought up the possibility of convincing DeepMind people to care about AI alignment. MIRI leaders including Eliezer Yudkowsky and Nate Soares told me that this was overly naive, that DeepMind would not stop dangerous research even if good reasons for this could be given.

I contest this. I endorse talking with leadership at leading AI labs, and have done so in the past, and expect to continue doing so in the future.

It’s true that I don’t expect any of the leading labs to slow down or stop soon enough, and it’s true that I think converging beliefs takes a huge time investment. On the mainline I predict that the required investment won’t in fact be paid in the relevant cases. But, as I told Jessica at the time (IIRC), I expect folks at leading AGI labs to be much more sensitive to solutions to the alignment problem, despite the fact that I don’t think you can talk them into giving up public capabilities research in practice. (This might be what she misunderstood as me saying we’d have better luck “competing”? I don’t recall saying any such thing, but I do recall saying that we’d have better luck solving alignment first and persuading second.)

(And for the record, while I think these big labs are making a mistake, it’s a very easy mistake to make: knowing that you’re in the bad Nash equilibrium doesn’t let you teleport to a better one, and it’s at least understandable that each individual capabilities lab thinks that they’re better than the next guy, or that they can’t get the actual top researchers if they implement privacy protocols right out the gate. It’s an important mistake, but not a weird one that requires positing unusual levels of bad faith.)

In case it’s not clear from the above, I don’t have much sympathy for conflict theory in this, and I definitely don’t think in broadly us-vs-them terms about the AGI landscape. And (as I think I said at the time) I endorse learning how to rapidly converge with people. I recommend figuring out how to more rapidly converge with friends before burning the commons of time-spent-converging-with-busy-people-who-have-limited-attention-for-you, but I still endorse figuring it out. I don’t expect it to work, and I think solving the dang alignment problem on the object-level is probably a better way to convince people to do things differently, but also I will cheer on the sidelines as people try to figure out how to get better and faster at converging their beliefs.

There’s no law saying that, when someone’s making a mistake, there’s some way to explain it to them such that suddenly it’s fixed. I think existing capabilities orgs are making mistakes (at the very least, in publishing capabilities advances (though credit where credit is due, various labs are doing better at keeping their cutting-edge results private, at least until somebody else replicates or nearly-replicates them, than they used to be (though to be clear I think we have a long way to go before I stop saying that I believe I see a big mistake))), and deny the implicit inference from “you can’t quickly convince someone with words that they’re making a mistake” to “you must be using conflict theory”.

> I was concerned about the linked misleading statement in 2017 and told Nate Soares and others about it, although Nate Soares insisted that it was not a lie, because technically the word “excited” could indicate the magnitude of a feeling rather than the positiveness of it.

That doesn’t seem to me like a good characterization of my views.

My recollection is that, in my conversation about this topic with Jessica, I was trying to convey something more like “Yeah, I’m pretty worried that they’re going to screw lots of things up. And the overt plan to give AGI to everyone is dumb. But also there are a bunch of sane people trying to redirect OpenAI in a saner direction, and I don’t want to immediately sic our entire community on OpenAI and thereby ruin their chances. This whole thing looks real high-variance, and at the very least this is “exciting” in the sense that watching an adventure movie is exciting, even in the parts where the plot is probably about to take a downturn. That said, there’s definitely a sense in which I’m saying things with more positive connotations than I actually feel—like, I do feel some real positive hope here, but I’m writing lopsidedly from those hopes. This is because the blog post is an official MIRI statement about a new AI org on the block, and my sense of politics says that if a new org appears on the block and you think they’re doing some things wrong, then the politic thing to do initially is talk about their good attributes out loud, while trying to help redirect them in private.”

For the record, I think I was not completely crazy to have some hope about OpenAI at the time. As things played out, they wound up pretty friendly to folks from our community, and their new charter is much saner than their original plan. That doesn’t undo the damage of adding a new capabilities shop at that particular moment in that particular way; but there were people trying behind the scenes, that did in real life manage to do something, and so having some advance hope before they played their hands out was a plausible mistake to make, before seeing the actual underwhelming history unfold.

All that said, I do now consider this a mistake, both in terms of my “don’t rock the boat” communications strategy and in terms of how well I thought things might realistically go at OpenAI if things went well there. I have since updated, and appreciate Jessica for being early in pointing out that mistake. I specifically think I was mistaken in making public MIRI blog posts with anything less than full candor.

> While someone bullshitting on the public Internet doesn’t automatically imply they lie to their coworkers in-person, I did not and still don’t know where Nate is drawing the line here.

As I said to Jessica at the time (IIRC), one reason I felt (at the time) that the blog post was fine, is that it was an official MIRI-organization announcement. When speaking as an organization, I was (at the time) significantly more Polite and significantly more Politically Correct and significantly less Dour (and less opinionated and more uncertain, etc).

Furthermore, I (wrongly) expected that my post would not be misleading, because I (wrongly) expected my statements made as MIRI-the-org to be transparently statements made as MIRI-the-org, and for such statements to be transparently Polite and Politically Correct, and thus not very informative one way or another. (In case it wasn’t clear, I now think this was a mistake.)

That said, as I told Jessica at the time (IIRC), you can always just ask me whether I’m speaking as MIRI-the-organization or whether I’m speaking as Nate. Similarly, when I’m speaking as Nate-the-person, you can always just ask me about my honesty protocols.

I have since updated against the idea that I should ever speak as MIRI-the-organization, and towards speaking uniformly with full candor as Nate-the-person. I’m not sure I’ll follow this perfectly (I’d at least slip back into politic-speak if I found myself cornered by a journalist), but again, you can always just ask.

• This is quite a small note, but it’s representative of a lot of things that tripped me up in the OP, and might be relevant to the weird distortion:

> Jessica said she felt coerced into a frame she found uncomfortable

I note that Jessica said she *was* coerced.

I suspect that Nate-dialect tracks meaningful distinctions between whether one feels coerced, whether one has evidence of coercion, whether one has a model of coercive forces which outputs predictions that closely resemble actual events, whether one expects that a poll of one’s peers would return a majority consensus that [what happened] is well-described by the label [coercion], etc.

By default, I would have assumed that Jessica-dialect tracks such distinctions as well, since such distinctions are fairly common in both the rationalsphere and (even more so) in places like MIRI.

But it’s possible that Jessica was not, with the phrase “I was coerced,” attempting to convey the strong thing that would be meant in Nate-dialect by those words, and was indeed attempting to convey the thing you (automatically? reflexively?) seem to have translated it to: “I felt coerced; I had an internal experience matching that of being coerced [which is an assertion we generally have a social agreement to take as indisputable, separate from questions of whether or not those feelings were caused by something more-or-less objectively identifiable as coercion].”

I suspect a lot of what you describe as weird distortion has its roots in tiny distinctions like this made by one party but not by the other/taken for granted by one party but not by the other. That particular example leapt out to me as conspicuous, but I posit many others.

• Thanks for reading closely enough to have detailed responses and trying to correct the record according to your memory. I appreciate that you’re explicitly not trying to disincentivize saying negative things about one’s former employer (a family member of mine was worried about my writing this post on the basis that it would “burn bridges”).

A couple general points:

1. These events happened years ago and no one’s memory is perfect (although our culture has propaganda saying memories are less reliable than they in fact are). E.g. I misstated a fact about Maia’s death, that Maia had been on Ziz’s boat, by filling in that detail from the other details and impressions I had.

2. I can’t know what someone “really means”; I can only know what they say and what the most reasonable apparent interpretations are. I could have asked more clarifying questions at the time, but that felt expensive due to the stressful dynamics the post describes.

In terms of more specific points:

> (And I have a decent chunk of probability mass that Jessica would clarify that she’s not accusing me of intentional coercion.) From my own perspective, she was misreading my own frame and feeling pressured into it despite significant efforts on my part to ameliorate the pressure. I happily solicit advice for what to do better next time, but do not consider my comport to have been a mistake.

I’m not accusing you of intentional coercion, I think this sort of problem could result as a side effect of e.g. mental processes trying to play along with coalitions while not adequately modeling effects on others. Some of the reasons I’m saying I was coerced are (a) Anna discouraging researchers from talking with Michael, (b) the remote possibility of assassination, (c) the sort of economic coercion that would be expected on priors at most corporations (even if MIRI is different). I think my threat model was pretty wrong at the time which made me more afraid than I actually had to be (due to conservatism); this is in an important sense irrational (and I’ve tried pretty hard to get better at modeling threats realistically since then), although in a way that would be expected to be common in normal college graduates. Given that I was criticizing MIRI’s ideology more than other researchers, my guess is that I was relatively un-coerced by the frame, although it’s in principle possible that I simply disagreed more.

> I don’t recall ever “talking about hellscapes” per se. I recall mentioning them in passing, rarely. In my recollection, that mainly happened in response to someone else broaching the topic of fates worse than death. (Maybe there were other occasional throwaway references? But I don’t recall them.)

I’m not semantically distinguishing “mentioning” from “talking about”. I don’t recall asking about fates worse than death when you mentioned them and drew a corresponding graph (showing ~0 utility for low levels of alignment, negative utility for high but not excellent levels of alignment, and positive utility for excellent levels of alignment).

> According to my best recollection of the conversation that I think Jessica is referring to, she was arguing that AGI will not arrive in our own lifetimes, and seemed unresponsive to my attempts to argue that a confident claim of long timelines requires positive knowledge, at which point I exasperatedly remarked that for all we knew, the allegedly missing AGI insights had already been not only had, but published in the literature, and all that remains is someone figuring out how to assemble them.

Edited to make it clear you weren’t trying to assign high probability to this proposition. What you said seems more reasonable given this, although, given that you were also talking about AI coming in the next 20 years, I hope you can see why I thought this reflected your belief.

> Here Jessica seems to be implying that, not only did I positively claim that the pieces of AGI were already out there in the literature, but also that I had personally identified them? I deny that, and I’m not sure what claim I made that Jessica misunderstood in that way.

Edited to make it clear you didn’t mean this. The reason I drew this as a Gricean implicature is that figuring out how to make an AGI wouldn’t provide evidence that the pieces to make AGI are already out there, unless such an AGI design would work if scaled up / iteratively improved in ways that don’t require advanced theory / etc.

> The sequence of events as I recall them was: Various researchers wanted to do some closed research. There was much discussion about how much information was private: Research results? Yes, if the project lead wants privacy. Research directions? Yes, if the project lead wants privacy. What about the participant list for each project? Can each project determine their own secrecy bounds individually, or is revealing who’s working with you defecting against (possibly-hypothetical) projects that don’t want to disclose who they’re working with? etc. etc. I recall at least one convo with a bunch of researchers where, in efforts to get everyone to stop circling privacy questions like moths to a flame and get back to the object level research, I said something to the effect of “come to me if you’re having trouble”.

Even if the motive came from other researchers I specifically remember hearing about the policy at a meeting in a top-down fashion. I thought the “don’t ask each other about research” policy was bad enough that I complained about it and it might have been changed. It seems that not everyone remembers this policy (although Eliezer in a recent conversation didn’t disagree about this being the policy at some point), but I must have been interpreting something this way because I remember contesting it.

> According to me, I was not trying to say “you shouldn’t talk about ways you believe others to be acting in bad faith”. I was trying to say “I think y’all are usually mistaken when you’re accusing certain types of other people of acting in bad faith”, plus “accusing people of acting in bad faith [in confrontational and adversarial ways, instead of gently clarifying and confirming first] runs a risk of being self-fulfilling, and also burns a commons, and I’m annoyed by the burned commons”. I think those people are wrong and having negative externalities, not that they’re bad for reporting what they believe.

I hope you can see why I interpreted the post as making a pragmatic argument, not simply an epistemic argument, against saying others are acting in bad faith:

When criticism turns to attacking the intentions of others, I perceive that to be burning the commons. Communities often have to deal with actors that in fact have ill intentions, and in that case it’s often worth the damage to prevent an even greater exploitation by malicious actors. But damage is damage in either case, and I suspect that young communities are prone to destroying this particular commons based on false premises.

In the context of 2017, I also had a conversation with Anna Salamon where she said our main disagreement was about whether bad faith should be talked about (which implies our main disagreement wasn’t about how common bad faith was).

I don’t actually know what conversation this is referring to. I recall a separate instance, not involving Jessica, of a non-researcher spending lots of time in the office hanging out and talking with one of our researchers, and me pulling the researcher aside and asking whether they reflectively endorsed having those conversations or whether they kept getting dragged into them and then found themselves unable to politely leave. (In that case, the researcher said they reflectively endorsed them, and thereafter I left them alone.)

Edited to say you don’t recall this. I didn’t hear this from you; I heard it secondhand, perhaps from Michael Vassar, so I don’t at this point have strong reason to think you said this.

There’s no law saying that, when someone’s making a mistake, there’s some way to explain it to them such that suddenly it’s fixed. I think existing capabilities orgs are making mistakes (at the very least, in publishing capabilities advances (though credit where credit is due, various labs are doing better at keeping their cutting-edge results private, at least until somebody else replicates or nearly-replicates them, than they used to be (though to be clear I think we have a long way to go before I stop saying that I believe I see a big mistake))), and deny the implicit inference from “you can’t quickly convince someone with words that they’re making a mistake” to “you must be using conflict theory”.

I agree that “speed at which you can convince someone” is relevant in a mistake theory. Edited to make this clear.

But, as I told Jessica at the time (IIRC), I expect folks at leading AGI labs to be much more sensitive to solutions to the alignment problem, despite the fact that I don’t think you can talk them into giving up public capabilities research in practice. (This might be what she misunderstood as me saying we’d have better luck “competing”? I don’t recall saying any such thing, but I do recall saying that we’d have better luck solving alignment first and persuading second.)

If I recall correctly you were at the time including some AGI capabilities research as part of alignment research (which makes a significant amount of theoretical sense, given that FAI has to pursue convergent instrumental goals). In this case developing an alignment solution before DeepMind develops AGI would be a form of competition. DeepMind people might be more interested in the alignment solution if it comes along with a capabilities boost (I’m not sure whether this consideration was discussed in the specific conversation I’m referring to, but it might have been considered in another conversation, which doesn’t mean it was in any way planned on).

That said, as I told Jessica at the time (IIRC), you can always just ask me whether I’m speaking as MIRI-the-organization or whether I’m speaking as Nate. Similarly, when I’m speaking as Nate-the-person, you can always just ask me about my honesty protocols.

Ok, this helps me disambiguate your honesty policy. If “employees may say things on the MIRI blog that would be very misleading under the assumption that this blog was not the output of MIRI playing politics and being PC and polite” is consistent with MIRI’s policies, it’s good for that to be generally known. In the case of the OpenAI blog post, the post is polite because it gives a misleadingly positive impression.

• (a) Anna discouraging researchers from talking with Michael

...

...I specifically remember hearing about the policy at a meeting in a top-down fashion...it seems that not everyone remembers this policy...I must have been interpreting something this way because I remember contesting it.

...

...I also had a conversation with Anna Salamon where she said our main disagreement was about whether bad faith should be talked about...

Just a note on my own mental state, reading the above:

Given the rather large number of misinterpretations and misrememberings and confusions-of-meaning in this and the previous post, along with Jessica quite badly mischaracterizing what I said twice in a row in a comment thread above, my status on any Jessica-summary (as opposed to directly quoted words) is “that’s probably not what the other person meant, nor what others listening to that person would have interpreted that person to mean.”

By “probably” I literally mean probably, i.e. a strictly greater than 50% chance of misinterpretation, in part because the set of things-Jessica-is-choosing-to-summarize is skewed toward those she found unusually surprising or objectionable.

If I were in Jessica’s shoes, I would by this point be replacing statements like “I had a conversation with Anna Salamon where she said X” with “I had a conversation with Anna Salamon where she said things which I interpreted to mean X” as a matter of general policy, so as not to be misleading-in-expectation to readers.

• Extracting and signal boosting this part from the final blog post linked by Winterford:

One time when I was being sexually assaulted after having explicitly said no, a person with significant martial arts training pinned me to the floor. … name is Storm.

I had not heard this accusation before, and do not know whether it was ever investigated. I don’t think I’ve met Storm, but I’m pretty sure I could match this nickname to the legal name of someone in the East Bay by asking around. Being named as a rapist in the last blog post of someone who later committed suicide is very incriminating, and if this hasn’t been followed up it seems important to do so.

• Disclaimer: I currently work for MIRI in a non-technical capacity, mostly surrounding low-level ops and communications (e.g. I spent much of the COVID times disinfecting mail for MIRI employees). I did not overlap with Jessica and am not speaking on behalf of MIRI.

I’m having a very hard time with the first few thousand words here, for epistemic reasons. It’s fuzzy and vague in ways that leave me feeling confused and sleight-of-handed and motte-bailey’d and 1984’d. I only have the spoons to work through the top 13-point summary at the moment.

I acknowledge here, and will re-acknowledge at the end of this comment, that there is an obvious problem with addressing only a summary; it is quite possible that much of what I have to say about the summary is resolved within the larger text.

But as Jessica notes, many people will only read the summary and it was written with those people in mind. This makes it something of a standalone document, and in my culture would mean that it’s held to a somewhat higher standard of care; there’s a difference between points that are just loosely meant to gesture at longer sections, and points which are known to be [the whole story] for a substantial chunk of the audience. Additionally, it sets the tone for the ten-thousand-or-so words to follow. I sort of fear the impact of the following ten thousand words on people who were just fine with the first three thousand; people whose epistemic immune systems did not boot up right at the start.

“Claim 0”

As a MIRI employee I was coerced into a frame where I was extremely powerful and likely to by-default cause immense damage with this power, and therefore potentially responsible for astronomical amounts of harm. I was discouraged from engaging with people who had criticisms of this frame, and had reason to fear for my life if I published some criticisms of it. Because of this and other important contributing factors, I took this frame more seriously than I ought to have and eventually developed psychotic delusions of, among other things, starting World War 3 and creating hell. Later, I discovered that others in similar situations killed themselves and that there were distributed attempts to cover up the causes of their deaths.

The passive voice throws me. “I was coerced” reads as a pretty strong statement of fact about the universe; “this simply is.” I would have liked to hear, even in a brief summary paragraph, something more like “I was coerced by A, B, and C, via methods that included X, Y, and Z, into a frame where etc,” because I do not yet know whether I should trust Jessica’s assessment of what constitutes coercion.

Ditto “I had reason to fear for my life.” From whom? To what extent? What constitutes [sufficient] reason? Is this a report of fear of being actually murdered? Is this a report of fear of one’s own risk of suicide? To some extent this might be clarified by the followup in Claim 1, but it’s not clear whether Claim 1 covers the whole of that phrase or whether it’s just part of it. In all, the whole intro feels … clickbaity? Maximum-attention-grabbing while being minimum-substantive?

“Claim 1”

Multiple people in the communities I am describing have died of suicide in the past few years. Many others have worked to conceal the circumstances of their deaths due to infohazard concerns. I am concerned that in my case as well, people will not really investigate the circumstances that made my death more likely, and will discourage others from investigating, but will continue to make strong moral judgments about the situation anyway.

The phrase “many others have worked to conceal the circumstances of their deaths due to infohazard concerns” unambiguously implies something like a conspiracy, or at least a deliberate deceptiveness on the part of five-plus (?) people. This is quite a strong claim. It seems to me that it could in fact be true. In worlds where it is true, I think that (even in the summary) it should be straightforward about who it is accusing, and of what. “Many others” should be replaced with a list of names, or at least a magnitude; is it four? Fourteen? Forty? “Worked to conceal” should tell me whether there were lies told, or whether evidence was destroyed, or whether people were bribed or threatened, or whether people simply played things close to the vest.

(In general, all the motte-and-baileys in this should have been crafted to be much less motte-and-bailey-ish, in my opinion. It seems genuinely irresponsible to leave them as vague, as fill-in-the-gaps-with-your-own-preconceptions, and as virtually-all-interpretations-are-nonzero-defensible as they are. Claims like these should claim things, and they should claim them clearly so that they can later be unambiguously judged true or false.)

“Claim 2”

My official job responsibilities as a researcher at MIRI included thinking seriously about hypothetical scenarios, including the possibility that someone might cause a future artificial intelligence to torture astronomical numbers of people. While we considered such a scenario unlikely, it was considered bad enough if it happened to be relevant to our decision-making framework. My psychotic break in which I imagined myself creating hell was a natural extension of this line of thought.

I don’t know what it means to be “a natural extension of this line of thought.” I don’t think it is the case that >5% of people who are aware of the concept of an AI hellscape become concerned that their actions might directly result in the creation of hell. Some dots need connecting. This is not a criticism I would ordinarily level, were it not for the things I’ve already said above; at some point in the first ten sentences, I smelled something like an attempt to ensnare my mind and my shields went up and now I’m noticing all the things, whereas if I were in a less defensive mood I might not mention this one at all.

“Claim 3”

Scott asserts that Michael Vassar thinks “regular society is infinitely corrupt and conformist and traumatizing”. This is hyperbolic (infinite corruption would leave nothing to steal) but Michael and I do believe that the problems I experienced at MIRI and CFAR were not unique or unusually severe for people in the professional-managerial class. By the law of excluded middle, the only possible alternative hypothesis is that the problems I experienced at MIRI and CFAR were unique or at least unusually severe, significantly worse than companies like Google for employees’ mental well-being.

This section encourages thinking of MIRI and CFAR as a single unit, which I argued against at length on the original post. It’s somewhat like describing “my experiences in Berkeley and Oakland” in a single breath. I do not believe that there are very many of Jessica’s experiences that can’t be clearly separated into “descended from experiences at MIRI, and the responsibility of MIRI and its culture” and “descended from experiences at CFAR, and the responsibility of CFAR and its culture.” I think that distinction is pretty important.

Separately, I feel encouraged to adopt a false dichotomy, in which either [some unspecified combination of experiences, some of which involved MIRI and others of which involved CFAR] were fairly mundane and within the range of normal workplace experiences or they were unique or severely bad. I feel like this dichotomy leaves out all sorts of nuance that’s pretty important, such as how things might be good on one day and bad on another, or interactions with one staff member might be positive while interactions with another are harmful, or experience X might have bothered person A quite a lot while having no impact on person B whatsoever. Invoking the law of the excluded middle does the same thing as using the passive voice does up above—it makes things feel floaty and absolute and unchallengeable, rather than grounded in the mundane realm of “X did Y and it resulted in Z.”

I don’t think there exists a version of Claim 3 which is not fundamentally misleading (whereas Claims 1 and 2 feel like they could have been written in a non-misleading fashion and still communicated what they were trying to communicate).

“Claim 4”

Scott asserts that Michael Vassar thinks people need to “jailbreak” themselves using psychedelics and tough conversations. Michael does not often use the word “jailbreak” but he believes that psychedelics and tough conversations can promote psychological growth. This view is rapidly becoming mainstream, validated by research performed by MAPS and at Johns Hopkins, and FDA approval for psychedelic psychotherapy is widely anticipated in the field.

“Because some instances of [the thing referred to as jailbreaking] are good and growing in popularity, criticisms of [the thing referred to as jailbreaking] are invalid or should at least be treated with suspicion”?

That’s what it seems to me that Claim 4 wants me to believe. In Claim 4’s defense, it may just be responding in kind to a similar sort of rounding-off in Scott’s original comment. But if so, it’s a second wrong that doesn’t make a right.

It is entirely possible for Scott to have been correctly critical of a thing, and for psychedelics and tough conversations to finally be coming out from under an unfair cloud of suspicion. The whole challenge is figuring out whether X is [the bad thing it looks like] or [the good thing that bears an unfortunate resemblance to the bad thing]. The text of Claim 4 tries to make me forget this fact. It nudges me toward a bucket error in which I have to treat Michael Arc (né Vassar) and MAPS and Johns Hopkins and the FDA all the same. Either they’re all right, and Scott is wrong, or they’re all wrong, and Scott is right. This does not make it easier for me to see and think clearly around whatever-happened-here.

“Claim 5”

I was taking psychedelics before talking extensively with Michael Vassar. From the evidence available to me, including a report from a friend along the lines of “CFAR can’t legally recommend that you try [a specific psychedelic], but...”, I infer that psychedelic use was common in that social circle whether or not there was an endorsement from CFAR. I don’t regret having tried psychedelics. Devi Borg reports that Michael encouraged her to take fewer, not more, drugs; Zack Davis reports that Michael recommended psychedelics to him but he refused.

I do not see why this claim does not simply say: “I was taking psychedelics before talking extensively with Michael Vassar. I don’t regret having tried psychedelics. Devi Borg reports that Michael encouraged her to take fewer, not more, drugs; Zack Davis reports that Michael recommended psychedelics to him but he refused.”

I think it does not simply say that because it wants me to believe (without having to actually demonstrate) that psychedelic use was common in [not clearly defined “social circle” that I suppose is intended to include CFAR leadership].

If there is a defense of “Michael Vassar wasn’t pushing psychedelics,” it seems to me that it can and should be lodged separately from an accusation that “CFAR and its social circle (?) tacitly encouraged psychedelics, or at least included a lot of psychedelic use.”

“Claim 6”

Scott asserts that Michael made people including me paranoid about MIRI/CFAR and that this contributes to psychosis. Before talking with Michael, I had already had a sense that people around me were acting harmfully towards me and/or the organization’s mission. Michael and others talked with me about these problems, and I found this a relief.

Nothing wrong with this claim. Did not cause me to feel confused or mentally yanked around.

“Claim 7”

If I hadn’t noticed such harmful behavior, I would not have been fit for my nominal job. MIRI leaders were already privately encouraging me to adopt a kind of conflict theory in which many AI organizations were trying to destroy the world on <20-year timescales and could not be reasoned with about the alignment problem, such that aligned AGI projects including MIRI would have to compete with them.

I think this is the same sort of confusion as in Claim 4. All members of a category are not the same. I do not see how “MIRI leadership encouraged me to view other AI orgs through an adversarial lens” (which is a claim I’m much more suspicious of in light of the previous few hundred words, and how they seem to be trying to hypnotize me) necessarily implies “I would have been doing a bad job if I hadn’t been viewing my colleagues with suspicion and scanning their actions for potential threats to me personally and to the org as a whole.”

Obviously these two are compatible; it is absolutely possible for there to be a single mindset which was both useful for thinking about other AI orgs and also appropriate to turn on one’s own experiences within MIRI, and it’s possible for that very mindset to have been a necessary prerequisite for the type of work one was doing at MIRI.

But there are many, many more worlds in which two separate things are going on, and I do not like that Claim 7 tried to handwave me into not noticing this fact, and into thinking that two very-likely-separate things must obviously be linked (so obviously that the link does not need to be described).

“Claim 8”

MIRI’s information security policies and other forms of local information suppression thus contributed to my psychosis. I was given ridiculous statements and assignments including the claim that MIRI already knew about a working AGI design and that it would not be that hard for me to come up with a working AGI design on short notice just by thinking about it, without being given hints. The information required to judge the necessity of the information security practices was itself hidden by these practices. While psychotic, I was extremely distressed about there being a universal cover-up of things-in-general.

I do not see the “thus.”

On the first layer, it is not clear to me that the infosec policies and information suppression are being accurately described, since I am not at all sure that Jessica and I would use the word “ridiculous” to describe similar things. In my culture, I would have wanted this to say “ridiculous-to-me” rather than the more authoritative, statement-of-fact-sounding “ridiculous [period].”

On the second layer, it is not clear to me that, even if accurately described, this can be said to “contribute to psychosis” by any mechanism other than “they happened around someone who was psychotic, and thus became a part of the psychosis.” I grant that, if one is becoming psychotic, and one is in a professionally-paranoid setting, this will not help. But the claim leaves me with a tinge of “this is MIRI’s fault” and I don’t like having to swallow that tinge without knowing where it came from and whether I can trust the process that created it.

(Here I pause to reiterate two things: first, I am self-awaredly only diving into the summary, and the text likely contains much of the detail I’m seeking. But these summaries are really strong, and really want me to draw certain conclusions, and they don’t “own” their persuasive/assertive nature. They keep leaning toward “and that’s the way it was” in a way that is not demanded by the constraint of being-a-summary. They could be much less epistemically weaponized and still be short and digestible.

Second, I’m pointing at everything that bothers me in a way that might seem churlish or unreasonable, but the reason I’m doing that is because I started to feel like I was being manipulated, and as a result I switched into “don’t let yourself get manipulated” mode and that caused a lot of little things to leap out. If I were not feeling like I was being manipulated, many of these things would not reach the level of being concerning, and might not even be consciously noticeable. But once I notice a cluster of ten little manipulations, each of which was just below the radar, I raise the sensitivity of the radar.)

“Claim 9”

Scott asserts that the psychosis cluster was a “Vassar-related phenomenon”. There were many memetic and personal influences on my psychosis, a small minority of which were due to Michael Vassar (my present highly-uncertain guess is that, to the extent that assigning causality to individuals makes sense at all, Nate Soares and Eliezer Yudkowsky each individually contributed more to my psychosis than did Michael Vassar, but that structural factors were important in such a way that attributing causality to specific individuals is to some degree nonsensical). Other people (Zack Davis and Devi Borg) who have been psychotic and were talking with Michael significantly commented to say that Michael Vassar was not the main cause. One person (Eric Bruylant) cited his fixation on Michael Vassar as a precipitating factor, but clarified that he had spoken very little with Michael and most of his exposure to Michael was mediated by others who likely introduced their own ideas and agendas.

This is excellent in my culture. This is what I wish the other claims were like.

“Claim 10”

Scott asserts that Michael Vassar treats borderline psychosis as success. A text message from Michael Vassar to Zack Davis confirms that he did not treat my clinical psychosis as a success. His belief that mental states somewhat in the direction of psychosis, such as those had by family members of schizophrenics, are helpful for some forms of intellectual productivity is also shared by Scott Alexander and many academics.

A text message from Michael Arc (né Vassar) to Zack Davis provides marginal evidence that he does not explicitly claim to treat one instance of clinical psychosis as a success.

I believe that Scott’s accusation is serious, and bears the burden of proof; I don’t think that it’s Jessica’s or Michael’s job to prove the accusation false. But nevertheless, a single text message whose context is not described isn’t proof of anything. I cannot update on this, and I don’t like that this claim takes me-updating-on-this for granted, and simply asserts confirmation from what is at best weak evidence.

The sentence beginning with “His belief” is untrustworthy. What is the level of psychosis of “family members of schizophrenics”? Is that a wide range? What spots within the range are being pointed at? Would Michael and Scott agree that they are pointing at the same states, when they both assert that they are “helpful”? Would they mean similar things by “helpful”? Would they be pointing at the same “forms of intellectual productivity”? What “many academics”?

It’s not that I expect a brief summary to answer all these questions. It’s more that, if you can’t say something more clear and less confusing (/outright misleading) than something like this, then I think you should not include any such sentence at all.

A far better sentence (in my culture) would be something like “To the best of my own ability to understand the positions of both Michael and Scott, they have similar beliefs about which points on the sliding scale between [normal] and [psychotic] are useful, and for what reasons, and furthermore I think that their shared view is reasonably typical of many academics.”

“Claim 11”

Scott asserts that Michael Vassar discourages people from seeking mental health treatment. Some mutual friends tried treating me at home for a week as I was losing sleep and becoming increasingly mentally disorganized before deciding to send me to a psychiatric institution, which was a reasonable decision in retrospect.

These two sentences bear no relation. I think I am intended to be hypnotized into thinking that the second sentence provides circumstantial evidence against the first.

“Claim 12”

Scott asserts that most local psychosis cases were “involved with the Vassarites or Zizians”. At least two former MIRI employees who were not significantly talking with Vassar or Ziz experienced psychosis in the past few years. Also, most or all of the people involved were talking significantly with others such as Anna Salamon (and read and highly regarded Eliezer Yudkowsky’s extensive writing about how to structure one’s mind, and read Scott Alexander’s fiction writing about hell). There are about equally plausible mechanisms by which each of these were likely to contribute to psychosis, so this doesn’t single out Michael Vassar or Ziz.

Again I have the sense that Jessica may be responding to sloppiness with sloppiness, which at least provides the justification of “I didn’t break the peace treaty first, here.” But: it is not the case that all psychoses should be lumped together. It is not the case that two people not-in-contact with Vassar or Ziz falsifies the claim that “most” of the cases were involved with the Vassarites or Zizians. It is not the case that Eliezer and Scott’s writings should be treated as being similar to extensive and intense in-person interactions (though the mention of Anna Salamon does seem like relevant and useful info (though again I would much prefer more of the specifically who and specifically what)).

But nothing after the first sentence actually contradicts the first sentence, and I really really really do not like how it tries to make me think that it did.

“Claim 13”

Scott Alexander asserts that MIRI should have discouraged me from talking about “auras” and “demons” and that such talk should be treated as a “psychiatric emergency”. This increases the chance that someone like me could be psychiatrically incarcerated for talking about things that a substantial percentage of the general public (e.g. New Age people and Christians) talk about, and which could be explained in terms that don’t use magical concepts. This is inappropriately enforcing the norms of a minority ideological community as if they were widely accepted professional standards.

Such talk, as a sudden departure from one’s previous norms of speech and thought, seems to me quite likely to be a psychiatric emergency, in actual fact.

I do not believe that our psychiatric systems are anywhere near perfect; I know of at least two cases of involuntary incarceration that seem to me to have been outrageously unjustified and some fear here seems reasonable. But nevertheless, it does not seem likely to me that someone who can easily explain “What? Oh, sorry—that’s shorthand. The actual non-crazy concept that the shorthand is for is not hard to explain, let me give you the five sentence version—” is at substantial risk for being thrown into a mental institution.

I’m not sure why this section wants me to be really very confident that [the proposed intervention] would not have helped prevent a slide into psychosis which Jessica seems to be largely laying at MIRI’s feet.

I spent something like an hour on this. I don’t know if it is helpful to anyone besides myself, but it is helpful to me. I feel much better equipped to navigate the remainder of this piece without “falling under its spell,” so to speak. I know to be on the lookout for:

• Summaries which may or may not be apt descriptions of the more detailed thing they are trying to summarize (e.g. assertions that something was “ridiculous” or “extensive”)

• Language which is more authoritative, universal, or emphatic than it should be

• Attempts to link things via mere juxtaposition, without spelling out the connection between them

• Umbrella statements that admit of a wide range of interpretations and possibly equivocate between them

• etc.

The absence of those things, and similar, makes it easier to see and think clearly, at least for me.

Their presence makes it harder to see and think clearly, at least for me.

Where it’s important to see and think clearly, I think it’s extra important to care about that.

Acknowledging one last time: I only dealt with the top-level summary; this is somewhat uncharitable and incomplete and a better version of me would have been able to make it through the whole thing, first. Responding to that top-level summary was the best I could manage with the resources I had at my disposal.

EDIT: I have as of this edit now read half of the larger text, and no, these issues are largely not resolved and are in many places exacerbated. Reading this without “shields up” would cause a person to become quite seriously misled/confused, such as (for a single representative example) when Jessica begins with:

• a specific individual conveying to her a rumor that members of the precursor org to MIRI (prior to its split into MIRI and CFAR) had years earlier (seriously? jokingly?) discussed assassinating AGI researchers

… and a paragraph later casually refers to these discussions as if their existence is an absolute fact, saying “The obvious alternative hypothesis is that MIRI is not for real, and therefore hypothetical discussions about assassinations were just dramatic posturing.” And furthermore seems to want the reader to just … nod along? … with this being the sort of thing that would reasonably cause a person to fear for their own life, or at least contribute substantially to the development of such a fear/meaningfully justify such a fear.

I have strong downvoted this post.

• I want to endorse this as a clear and concise elucidation of the concerns I laid out in my comment, which are primarily with the mismatch between what the text seems to want me to believe, vs. what conclusions are actually valid given the available information.

• The passive voice throws me. “I was coerced” reads as a pretty strong statement of fact about the universe; “this simply is.” I would have liked to hear, even in a brief summary paragraph, something more like “I was coerced by A, B, and C, via methods that included X, Y, and Z, into a frame where etc.”

It seems to me that you can’t expect a summary to make the claims as detailed as possible. You don’t criticize scientific papers because their abstracts don’t fully prove the claims they make, either; that’s what the full article is for.

• It seems to me that you can’t expect a summary to make the claims as detailed as possible.

Just noting that this was explicitly addressed in a few places in my comment, and I believe I correctly compensated for it/​took this truth into account. “Make the claims in the summary as detailed as possible” is not what I was recommending.

• If you didn’t read the post and are complaining that the short summary didn’t contain the details that the full post contained, then… I don’t know how to respond. It’s equivalent to complaining that the intro paragraph of an essay doesn’t prove each sentence it states.

With respect to the criticism of the post body:

a specific individual conveying to her a rumor that members of the precursor org to MIRI (prior to its split into MIRI and CFAR) had (seriously or unseriously) discussed assassinating AGI researchers … and a paragraph later casually refers to these discussions as if their existence is an absolute fact, saying “The obvious alternative hypothesis is that MIRI is not for real, and therefore hypothetical discussions about assassinations were just dramatic posturing.”

Yes, “this person was wrong/​lying so the rumor was wrong” is an alternative, but I assigned low probability to it (in part due to a subsequent conversation with a MIRI person about this rumor), so it wasn’t the most obvious alternative.

• If you didn’t read the post and are complaining that the short summary didn’t contain the details that the full post contained, then

That is very explicitly a strawman of what I am objecting to. As in: that interpretation is explicitly ruled out, multiple times within my comment, including right up at the very top, and so you reaching for it lands with me as deliberately disingenuous.

What I am objecting to is lots and lots and lots of statements that are crafted to confuse/​mislead (if not straightforwardly deceive).

• Okay, I can respond to the specific intro paragraph talking about this.

But as Jessica notes, many people will only read the summary and it was written with those people in mind. This makes it something of a standalone document, and in my culture would mean that it’s held to a somewhat higher standard of care; there’s a difference between points that are just loosely meant to gesture at longer sections, and points which are known to be [the whole story] for a substantial chunk of the audience. Additionally, it sets the tone for the ten-thousand-or-so words to follow. I sort of fear the impact of the following ten thousand words on people who were just fine with the first three thousand; people whose epistemic immune systems did not boot up right at the start.

I don’t expect people who only read the summary to automatically believe what I’m saying with high confidence. I expect them to believe they have an idea of what I am saying. Once they have this idea, they can decide to investigate or not investigate why I believe these things. If they don’t, they can’t know whether these things are true.

Maybe it messes with people’s immune systems by being misleading… but how could you tell the summary is misleading without reading most of the post? Seems like a circular argument.

• It’s not a circular argument. The summary is misleading in its very structure/​nature, as I have detailed above at great length. It’s misleading independent of the rest of the post.

Upon going further and reading the rest of the post, I confirmed that the problems evinced by the summary, which I stated up-front might have been addressed within the longer piece (so as not to mislead or confuse any readers of my comment), in fact only get worse.

This is not a piece which visibly tries to, or succeeds at, helping people see and think more clearly. It does the exact opposite, in service of ???

I would be tempted to label this a psy-op, if I thought its confusing and manipulative nature was intentional rather than just something you didn’t actively try not to do.

• Here’s an example (Claim 0):

The passive voice throws me. “I was coerced” reads as a pretty strong statement of fact about the universe; “this simply is.” I would have liked to hear, even in a brief summary paragraph, something more like “I was coerced by A, B, and C, via methods that included X, Y, and Z, into a frame where etc.”

The rest of the paragraph says some parts of how I was coerced, e.g. I was discouraged from engaging with critics of the frame and from publishing my own criticisms.

Ditto “I had reason to fear for my life.” From whom? To what extent? What constitutes [sufficient] reason? Is this a report of fear of being actually murdered? Is this a report of fear of one’s own risk of suicide? To some extent this might be clarified by the followup in Claim 1, but it’s not clear whether Claim 1 covers the whole of that phrase or whether it’s just part of it.

If you keep reading you see that I heard about the possibility of assassination. The suicides are also worrying, although the causality on those is unclear.

Maybe this isn’t a particularly strong argument you gave for the summary being misleading. If so I’d want to know which you think are particularly strong so I don’t have to refute a bunch of weak arguments.

• “I’d want to know which arguments you think are particularly strong so I don’t have to refute a bunch of weak ones” is my feeling, here, too.

Would’ve been nice if you’d just stated your claims instead of burying them in 13000 words of meandering, often misleading, not-at-all-upfront-about-epistemic-status insinuation. I’m frustrated because your previous post received exactly this kind of criticism, and that criticism was highly upvoted, and you do not seem to have felt it was worth adjusting your style.

EDIT: A relevant term here is “gish gallop.”

What I am able to gather from the OP is that you believe lots of bad rumors when you hear them, use that already-negative lens to adversarially interpret all subsequent information, get real anxious about it, and … think everyone should know this?

• This is a double bind. If I state the claims in the summary I’m being misleading by not providing details or evidence for them close to the claims themselves. If I don’t then I’m doing a “gish gallop” by embedding claims in the body of the post. The post as a whole has lots of numbered lists that make most of the claims I’m making pretty clear.

• It’s not a double bind, and my foremost hypothesis is now that you are deliberately strawmanning, so as to avoid addressing my real point.

Not only did I highlight two separate entries in your list of thirteen that do the thing properly, I also provided some example partial rewrites of other entries, some of which made them shorter rather than longer.

The point is not that you need to include more and more detail, and it’s disingenuous to pretend that’s what I’m saying. It’s that you need to be less deceptive and misleading. Say what you think you know, clearly and unambiguously, and say why you think you know it, directly and explicitly, instead of flooding the channel with passive voice and confident summaries that obscure the thick layer of interpretation atop the actual observable facts.

• [After writing this comment, I realized that maybe I’m just missing what’s happening altogether, since maybe I read the post in a fairly strongly “sandboxed” way, so I’m failing to empathize with the mental yanks. That said, maybe it has some value.]

FWIW, my sense (not particularly well-founded?) isn’t that jessicata is deliberately strawmanning here, but isn’t getting your point or doesn’t agree.

You write above:

It’s more that, if you can’t say something more clear and less confusing (/​outright misleading) than something like this, then I think you should not include any such sentence at all.

This is sort of mixing multiple things together: there’s the clarity/​confusingness, and then there’s the slant/​misleadingness. These are related, in that one can mislead more easily when one is being unclear/​ambiguous.

You write:

Say what you think you know, clearly and unambiguously, and say why you think you know it, directly and explicitly, instead of flooding the channel with passive voice and confident summaries that obscure the thick layer of interpretation atop the actual observable facts.

Some of your original criticisms read, to me, more like asking a bunch of questions about details (which is a reasonable thing to do; some questions are answered in the post, some not), and then saying that the summary claims are bad for not including those details and instead summarizing them at a higher level. (Other of your criticisms are more directly about misleadingness IMO, e.g. the false dichotomy thing.)

E.g. you write:

The phrase “many others have worked to conceal the circumstances of their deaths due to infohazard concerns” unambiguously implies something like a conspiracy, or at least a deliberate deceptiveness on the part of five-plus (?) people. This is quite a strong claim. It seems to me that it could in fact be true. In worlds where it is true, I think that (even in the summary) it should be straightforward about who it is accusing, and of what. “Many others” should be replaced with a list of names, or at least a magnitude; is it four? Fourteen? Forty? “Worked to conceal” should tell me whether there were lies told, or whether evidence was destroyed, or whether people were bribed or threatened, or whether people simply played things close to the vest.

The phrase “many others have worked to conceal the circumstances of their deaths due to infohazard concerns” seems to me (and I initially read it as) making a straightforward factual claim: many people took actions that prevented the circumstances of their deaths from being known; and they did so because they were worried about something about infohazards. That sentence definitely doesn’t unambiguously imply a conspiracy, since it doesn’t say anything about conspiracy (though the alleged fact it expresses is of course relevant to conspiracy hypotheses). You’re then saying that there was insufficient detail to support this supposed implication. To me it just looks like a reasonable way of compressing a longer list of facts.

I don’t think I’m getting where you’re coming from here. From this:

But the claim leaves me with a tinge of “this is MIRI’s fault” and I don’t like having to swallow that tinge without knowing where it came from and whether I can trust the process that created it.

it sounds like you’re interpreting jessicata’s statements as being aimed at getting you to make a judgement, and also it was sort of working, or it would/​could have been sort of working if you weren’t being vigilant. Like, something about reading that part of the post, led to you having a sense that “this is MIRI’s fault”, and then you noticed that on reflection the sense seemed incorrect. Is that right?

• I definitely have a strong sense reading this post that “those environmental conditions would not cause any problems to me” and I am trying to understand whether this is true or not, and if so, what properties of a person make them susceptible to the problems.

Do you have any perception about that? I wonder things like:

• Would you circa 2010 have been able to guess that you were susceptible to this level of suffering, if put in this kind of environment?

• What proportion of, say, random intellectually curious graduate students do you think would suffer this way if put into this environment?

• Do you have some sense of what psychological attributes made you susceptible, or advice to others about how to be less susceptible?

I have a lot of respect for what I know of you and your work and I’m sorry this happened.

(clarification edit: I have some sympathy for why it could be good to have an intellectual environment like this, so if my comment seems to be implying a perspective of “would it be possible to have it without people suffering”, that’s why.)

• Would you circa 2010 have been able to guess that you were susceptible to this level of suffering, if put in this kind of environment?

I would have had difficulty imagining “this kind of environment”. I would not have guessed that an outcome like this was likely on an outside view, I thought of myself as fairly mentally resilient.

What proportion of, say, random intellectually curious graduate students do you think would suffer this way if put into this environment?

30%? It’s hard to guess, and hard to say what the average severity would be. Some people take their jobs less seriously than others (although MIRI/​CFAR encouraged people to take their jobs really seriously, what with the “actually trying” and “heroic responsibility” and all). I think even those who didn’t experience overtly visible mental health problems would still have problems like declining intellectual productivity over time, and mild symptoms, e.g. of depression.

Do you have some sense of what psychological attributes made you susceptible, or advice to others about how to be less susceptible?

Not sure, my family had a history of bipolar, and I had the kind of scrupulosity issues that were common in EAs, which probably contributed.

What advice would I give? If you’re deeply embedded in a social scene, try finding more than one source of info outside the scene, e.g. friends or strangers, to avoid getting stuck in a frame. If you’re trying to modify your mind, be patient with yourself, don’t try to force large changes; rather, allow yourself to attend to problems with your current approach and let “automatic” processes cause most of the actual behavior change. If you have sleep issues, don’t stress out about sleeping all day, try other activities like taking a walk; if they persist, talking to someone outside your friend group (e.g. a therapist) might be a good idea.

• I would very much like to encourage people to not slip into the “MIRI/​CFAR as one homogenous social entity” frame, as detailed in a reply to the earlier post.

I think it’s genuinely misleading and confusion-inducing, and that the kind of evaluation and understanding that Jessica is hoping for (as far as I can tell) will benefit from less indiscriminate lumping-together-of-things rather than more.

Even within each org—someone could spend 100 hours in conversation with one of Julia Galef, Anna Salamon, Val Smith, Pete Michaud, Kenzie Ashkie, or Dan Keys, and accurately describe any of those as “100 hours of close interaction with a central member of CFAR,” and yet those would be W I L D L Y different experiences, well worth disambiguating.

If someone spent 100 hours of close interaction with Julia or Dan or Kenzie, I would expect them to have zero negative effects and to have had a great time.

If someone spent 100 hours of close interaction with Anna or Val or Pete, I would want to make absolutely sure they had lots of resources available to them, just in case (those three being much more head-melty and having a much wider spread of impacts on people).

It’s not only that saying something like “100 hours with CFAR” would provide no useful update without further information, it’s that it makes it seem like there should be an update, while obfuscating the missing crucial facts.

And speaking as someone who’s spent the past six years deeply embedded first in CFAR and then subsequently in MIRI: Jessica’s experience is very unlike mine. Which in no way implies that her account is false! The point I’m trying to make is, “MIRI/​CFAR” is not (at all) a homogenous thing.

• If someone spent 100 hours of close interaction with Julia or Dan or Kenzie, I would expect them to have zero negative effects and to have had a great time.

If someone spent 100 hours of close interaction with Anna or Val or Pete, I would want to make absolutely sure they had lots of resources available to them just in case (those three being much more head-melty and having a much wider spread of impacts on people)

As a complete outsider who stumbled upon this post and thread, I find it surprising and concerning that there’s anyone at MIRI/​CFAR with whom spending a few weeks might be dangerous, mental-health-wise.

Would “Anna or Val or Pete” (I don’t know who these people are) object to your statement above? If not, I’d hope they’re concerned about how they are negatively affecting people around them and are working to change that. If they have this effect somewhat consistently, then the onus is probably on them to adjust their behavior.

Perhaps some clarification is needed here—unless the intended and likely readers are insiders who will have more context than me.

(Edited to make top quote include more of the original text—per Duncan’s request)

• The OP cites Anna’s comment where she talked about manipulating people.

• Small nitpicky request: would you be willing to edit into your quotation the part that goes “just in case (those three being much more head-melty and having a much wider spread of impacts on people)”?

Its excision changes the meaning of my sentence, into something untrue. Those words were there on purpose, because without them the sentence is misleadingly alarming.

• Fair point! Done.

It is still concerning to me (of course, having read your original comment), but I can see how it may have misled others who were skimming.

• FWIW, it is concerning to me, too, and was at least a little bit a point of contention between me and each of those three while we were colleagues together at CFAR, and somewhat moreso after I had left. But my intention was not to say “these people are bad” or “these people are casually dangerous.” More “these people are heavy-hitters when it comes to other people’s psychologies, for better and worse.”

• I have a decently strong sense that I would end up suffering from similar mental health issues. I think it has a lot to do with a tendency to Take Ideas Seriously. Or, viewed less charitably, having a memetic immune disorder.

X-risk and s-risk (a term that is new to me) are both really bad things. They’re also quite plausible. Multiplying how bad they are by how likely they are, I think the rational feeling is some form of terror. (In some sense of the term “rational”. Too much terror would of course get in the way of trying to fix it, and of living a happy life.) It reminds me of how in HPMoR, everyone’s patronus was an animal, because death is too much for a human to bear.

• What proportion of, say, random intellectually curious graduate students do you think would suffer this way if put into this environment?

This seems like the sort of thing that we would have solid data on at this point. Seems like it’d be worth it for eg. MIRI to do an anonymous survey. If the results indicate a lot of suffering, it’d probably be worth having some sort of mental health program, if only for the productivity benefits. Or perhaps this is already being done.

• (Meta: I’m talking about a bunch of stuff re: Jessica’s epistemics out loud that I’d normally consider a bit bad form to talk about out loud, but Jessicata seems to prefer having the conversation in public, laying everything out clearly)

Something I’ve been feeling watching the discussion here I want to comment on (this is a bit off the cuff)

1. I share several commenters’ feedback that Jessica’s account is smuggling in assumptions and making wrong inferences about what people were communicating (or trying to communicate)

2. I think the people responding to this post (and the previous one) are also doing so from a position of defensiveness*, and I don’t think their reactions have been uniformly fair.

For example of #2, while I agreed with the thrust of ‘it seems misleading to leave out Vassar’, I thought Scott A’s comment on the previous post made assumptions about the connection between Vassar and Ziz, and presented those assumptions in an overconfident tone. I also thought Logan’s comment here is doing a move of “focus-handling-their-way-towards understanding” in a way that seems totally legitimate as a way to figure out what’s up, but which doing in public ends up creating a cloud of poetic rhetorical heft that’s hard to argue with, which I think Benquo was correct to call out.

So if I imagine being Jessica, who seems to be earnestly** trying to figure out whether she’s in a trustworthy or healthy environment, and I imagine running the query “do people seem to be unfairly characterizing me and distorting things?”, that query would probably return “obviously yes.”

It’s possible for multiple parties in a debate to (correctly) notice that the other one is unfairly characterizing them /​ portraying them, at least a bit. And if you’re already worried that you’re in a social environment where you have to be on guard for people manipulating you or coercing you, it’s possible to end up in an attractor state where everything (sorta legitimately) looks like evidence of this.

...

*I’m not sure “defensiveness” captures the specific thing up with each of the commenters. Some of them seem more like neutral observers who don’t (themselves) feel under attack by the OPs, but who are picking up on the same distortionary effects that the defensive people are picking up on and responding to.

...

**my current belief is that Jessica is doing kind of a combination of “presenting things in a frame that looks pretty optimized as a confusing, hard-to-respond-to-social-attack”, which at first looked disingenuous to me. I’ve since seen her respond to a number of comments in a way that looks concretely like it’s trying to figure stuff out, update on new information, etc, without optimizing for maintaining the veiled social attack. My current belief is that there’s still some kind of unconscious veiled social attack going on that Jessica doesn’t have full conscious access to; it looks too optimized to be an accident. But I don’t know that Jessica from within her current epistemic state should agree with me.

...

I think it’s a fairly unsolved problem what Alices and Bobs are supposed to do in this class of situation. Many things that seem robustly, obviously “fair” to me from everyone’s perspective require more effort than it’s necessarily worth.

I think it’s important for people who feel like they are in a coercive environment to be given some leeway for trying to reason about that out loud. But also I think people who feel coerced are often hypersensitized to things that in fact aren’t a big deal (I think this is true even for ‘legitimately’ coerced people – once you’re paying attention to a pattern that’s threatening you, it makes sense to err on the side of over-identifying it). And I don’t think it’s everyone else’s obligation to go through refuting each thing.

BUT, sometimes someone is correctly picking up on patterns of distortion, and it can take a while to articulate exactly what’s up with them.

Right now I think the attention-allocation mechanisms we normally rely on on LessWrong are a bit messed up with this class of post, since people seem to vote in an over-the-top way where yay/​boo is more salient.

• **my current belief is that Jessica is doing kind of a combination of “presenting things in a frame that looks pretty optimized as a confusing, hard-to-respond-to-social-attack”, which at first looked disingenuous to me. I’ve since seen her respond to a number of comments in a way that looks concretely like it’s trying to figure stuff out, update on new information, etc, without optimizing for maintaining the veiled social attack. My current belief is that there’s still some kind of unconscious veiled social attack going on

Seconded (though I think “pretty optimized” is too strong).

that Jessica doesn’t have full conscious access to, it looks too optimized to be an accident. But I don’t know that Jessica from within her current epistemic state should agree with me.

My wild, not well-founded guess is that Jessica does have some conscious access to this and could fairly easily say more about what’s going on with her (and maybe has and I /​ we forgot?), along the lines of “owning” stuff (and that might help people hear her better by making possible resulting conflicts and anti-epistemology more available to talk about). I wonder if Jessica is in /​ views herself as in a conflict, such that saying that would be, or intuitively seem like, a self-harming or anti-epistemic thing to do. As a hypothetical example, saying “I’m hurt and angry and I don’t want to misdirect my anger and I want to figure things out and get the facts straight and I do want to direct my anger to correct targets” without further comment would to some real or imagined listeners sort of sound like “yeah don’t take anything I’m saying seriously, I’m just slinging mud because I’m mad, go ahead and believe the story that minimizes fault rather than what’s most consistent with the facts”. In other words, “owning one’s experience” is bucketed with failing to stand one’s ground /​ hold a perspective? (Not sure in which person/​s the supposed bucketing is happening.)

• For some context:

• I got a lot of the material for this by trying to explain what I experienced to a “normal” person who wasn’t part of the scene while feeling free to be emotionally expressive (e.g. by screaming). Afterwards I found a new “voice” to talk about the problems in an annoyed way. I think this was really good for healing trauma and recovering memories.

• I have a political motive to prevent Michael from being singled out as the person who caused my psychosis since he’s my friend. I in fact don’t think he was a primary cause, so this isn’t inherently anti-epistemic, but it likely caused me to write in a more lawyer-y fashion than I otherwise would. (Michael definitely didn’t prompt me to write the first draft of the document, and only wrote a few comments on the post.)

• I’ve been working on this document for 1-2 weeks, doing a rolling release where I add more people to the document. It’s been somewhat stressful getting the memories/​interpretations into written form without making false/​misleading/​indefensible statements along the way, or unnecessarily harming the reputations of people I care about the reputations of.

• Some other people helped me edit this. I included some text from them without checking that it was as rigorous as the rest of the text I wrote. I think they made a different tradeoff than me in terms of saying “strong” statements that were less provable and potentially more accusatory. E.g. the one-paragraph overall summary was something I couldn’t have written myself: even as my co-writer was saying it out loud, I was having trouble tracking the semantics, which I thought was for trauma-related reasons. I included the paragraph partially because it seemed true (even if hard to prove) on reflection, and because I was inhibited from writing a similar paragraph myself.

• In future posts I’ll rewrite more inclusions in my own words, since that way I can filter better for things I think I can actually rhetorically defend if pressed.

• I originally wrote the post without the summary of core claims, a friend/​co-editor pointed out that a lot of people wouldn’t read the whole thing, and it would be easier to follow with a summary, so I added it.

• Raemon is right that I think people are being overly defensive and finding excuses to reject the information. Overall it seems like people are paying much, much more attention to the quality of my rhetoric than to the subject matter the post is about, and that seems like a large obstacle to addressing the problems I’m describing. I wrote the following tweet about it: “Suppose you publish a criticism of X movement/​organization/​etc. The people closest to the center are the most likely to defensively reject the information. People far away are unlikely to understand or care about X. It’s people at middle distance who appreciate it the most.” In fact multiple people somewhat distant from the scene have said they really liked my post; one said he found it helpful for having a more healthy relationship with EA and rationality.

• Overall it seems like people are paying much, much more attention to the quality of my rhetoric than the subject matter the post is about

Just to be clear, I’m paying attention to the quality of your rhetoric because I cannot tell what the subject matter is supposed to be.

Upon being unable to actually distill out a set of clear claims, I fell back onto “okay, well, what sorts of conclusions would I be likely to draw if I just drank this all in trustingly/​unquestioningly/​uncritically?”

Like, “observe the result, and then assume (as a working hypothesis, held lightly) that the result is what was intended.”

And then, once I had that, I went looking to see whether it was justified/​whether the post presented any actual reasons for me to believe what it left sandbox-Duncan believing, and found that the answer was basically “no.”

Which seems like a problem, for something that’s 13000 words long and that multiple people apparently put a lot of effort into. 13000 words on LessWrong should not, in my opinion, have the properties of:

a) not having a discernible thesis

b) leaving a clear impression on the reader

c) that impression, upon revisit/​explicit evaluation, seeming really quite false

I think it’s actually quite good that you felt called to defend your friend Michael, who (after reading) does in fact seem to me to be basically innocent w/​r/​t your episode. I think you could have “cleared Michael of all charges” in a much more direct and simple post that would have been compelling to me and others looking in; it seems like that’s maybe 1⁄5 of what’s going on in the above, and it’s scattered throughout, piecemeal. I’m not sure what you think the other 4⁄5 is doing, or why you wanted it there.

(I mean this straightforwardly—that I am not sure. I don’t mean it in an attacky fashion, like me not being sure implies that there’s no good reason or whatever.)

getting the memories/​interpretations into written form without making false/​misleading/​indefensible statements along the way

I believe and appreciate the fact that you were attending to and cared about this, separate from the fact that I unfortunately do not believe you succeeded.

EDIT: The reason I leave this comment is because I sense a sort of … trend toward treating the critique as if it’s missing the point? And while it might be missing Jessica’s point, I do not think that the critique was about trivial matters while not attending to serious ones. I think the critique was very much centered on Stuff I Think Actually Matters.

FWIW, I consider myself to be precisely one of those “middle distance” people. I wasn’t around you during this time at MIRI, I’m not particularly invested in defending MIRI (except to the degree which I believe MIRI is innocent in any given situation and therefore am actually invested in Defending An Innocent Party, which is different), and I have a very mixed-bag relationship with EA and rationality; solid criticisms of the EA/​LW/​rationality sphere usually find me quite sympathetic. I’m looking in as someone who would have appreciated a good clear post highlighting things worth feeling concern over. I really wished this and its precursor were e.g. tight analogues to the Zoe post, which I similarly appreciated as a middle distancer.

• What, if any, are your (major) political motives regarding MIRI/​CFAR/​similar?

• I really liked MIRI/​CFAR during 2015-2016 (even though I had lots of criticisms), I think I benefited a lot overall, and I think things got bad in 2017 and haven’t recovered. E.g. MIRI has had many fewer good publications since 2017, and for reasons I’ve expressed, I don’t believe their private research is comparably good to their previous public research. (Maybe to some extent I got disillusioned so I’m overestimating how much things changed, I’m not entirely sure how to disentangle.)

As revealed in my posts, I was a “dissident” during 2017 and confusedly/​fearfully trying to learn and share critiques, gather people into a splinter group, etc, so there’s somewhat of a legacy of a past conflict affecting the present, although it’s obviously less intense now, especially after I can write about it.

I’ve noticed people trying to “center” everything around MIRI, justifying their actions in terms of “helping MIRI” etc (one LW mod told me and others in 2018 that LessWrong was primarily a recruiting funnel for MIRI, not a rationality promotion website, and someone else who was in the scene 2016-2017 corroborated that this is a common opinion), and I think this is pretty bad since they have no way of checking how useful MIRI’s work is, and there’s a market for lemons (compare EA arguments against donating to even “reputable” organizations like UNICEF). It resembles idol worship and that’s disappointing.

This is corroborated by some other former MIRI employees, e.g. someone who left sometime in the past 2 years who agreed with someone else’s characterization that MIRI was acting against its original mission.

I think lots of individuals at MIRI are intellectually productive and/​or high-potential but pretty confused about a lot of things. I don’t currently see a more efficient way to communicate with them than by writing things on the Internet.

I have a long-standing disagreement about AI timelines (I wrote a post saying people are grossly distorting things, which I believe and think is important partially due to the content of my recent posts about my experiences; Anna commented that the post was written in a “triggered” mind state which seems pretty likely given the traumatic events I’ve described). I think lots of people are getting freaked out about the world ending soon and this is wrong and bad for their health. It’s like in Wild Wild Country where the leader becomes increasingly isolated and starts making nearer-term doom predictions while the second-in-command becomes the de-facto social leader (this isn’t an exact analogy and I would be inhibited from making it except that I’m specifically being asked about my political motives, I’m not saying I have a good argument for this).

I still think AI risk is a problem in the long term but I have a broader idea of what “AI alignment research” is, e.g. it includes things that would fall under philosophy/​the humanities. I think the problem is really hard and people have to think in inter-disciplinary ways to actually come close to solving it (or to get one of the best achievable partial solutions). I think MIRI is drawing attention to a lot of the difficulties with the problem and that’s good even if I don’t think they can solve it.

Someone I know pointed out that Eliezer’s model might indicate that the AI alignment field has been overall net negative due to it sparking OpenAI and due to MIRI currently having no good plans. If that’s true it seems like a large change in the overall AI safety/​x-risk space would be warranted.

My friends and I have been talking with Anna Salamon (head of CFAR) more over this year, she’s been talking about a lot of the problems that have happened historically and how she intends to do different things going forward, and that seems like a good sign but she isn’t past the threshold of willing+able she would need to be to fix the scene herself.

I’m somewhat worried about criticizing these orgs too hard, for several reasons: I want to maintain relations with people in my previous social network; I don’t actually think they’re especially bad; my org (mediangroup.org) has previously gotten funding from a re-granting organization whose representative told me that my org is more likely to get funding if I write fewer “accusatory” blog posts (although I’m not sure I believe them about this at this point; maybe writing critiques causes people to think I’m more important and fund me more?); and it might spark “retaliation” (which need not be illegal, e.g. maybe people just criticize me a bunch in a way that’s embarrassing, or give me less money). I feel weird criticizing orgs that were as good for my career as they were, even though that doesn’t make that much sense from an ethical perspective.

I very much don’t think the central orgs can accomplish their goals if they can’t learn from criticisms. A lot of the time I’m more comfortable in rat-adjacent/postrat-ish/non-rationalist spaces than in central rationalist spaces, because they are less enamored of the ideology and the central institutions. It’s easier to just attend a party and say lots of weird-but-potentially-revelatory things without getting caught in a bunch of defensiveness related to the history of the scene. One issue with these alternative social settings is that a lot of these people think it’s normal to take ideas less seriously in general, so they think e.g. that I’m only speaking out about problems because I have a high level of “autism”, and that it’s too much to expect people to tell the truth when their rent stream depends on them not acknowledging it. I understand how someone could come to this perspective, but it seems somewhat of a figure-ground inversion that normalizes parasitic behavior.

• I found this a very helpful and useful comment, and resonate with various bits of it (I also think I disagree with a good chunk of it, but a lot of it seems right overall).

• I’m curious which parts resonate most with you (I’d ordinarily not ask this because it would seem rude, but I’m in a revealing-political-motives mood and figure the actual amount of pressure is pretty low).

• I share the sense that something pretty substantial changed with MIRI in ~2017 and that something important got lost when that happened. I share some of the sense that people’s thinking about timelines is confused, though I do think overall pretty short timelines are justified (though mine are on the longer end of what MIRI people tend to think, though much shorter than yours, IIRC). I think you are saying some important things about the funding landscape, and have been pretty sad about the dynamics here as well, though I think the actual situation is pretty messy and some funders are really quite pro-critique, and some others seem to me to be much more optimizing for something like the brand of the EA-coalition.

• I feel like this topic may deserve a top-level post (rather than an N-th level comment here).

EDIT: I specifically meant the “MIRI in ~2017” topic, although I am generally in favor of extracting all other topics from Jessica’s post in a way that would be easier for me to read.

• Thanks, this is great (I mean, it clarifies a lot for me).

• (This is helpful context, thanks.)

• Like the previous post, there’s something weird about the framing here that makes me suspicious of this. It feels like certain perspectives are being “smuggled in”—for example:

Scott asserts that Michael Vassar thinks “regular society is infinitely corrupt and conformist and traumatizing”. This is hyperbolic (infinite corruption would leave nothing to steal) but Michael and I do believe that the problems I experienced at MIRI and CFAR were not unique or unusually severe for people in the professional-managerial class. By the law of excluded middle, the only possible alternative hypothesis is that the problems I experienced at MIRI and CFAR were unique or at least unusually severe, significantly worse than companies like Google for employees’ mental well-being.

This looks like a logical claim at first glance—of course the only options are “the problems weren’t unique or severe” or “the problems were unique and severe”—but posing the matter this way conflates problems that you had as an individual (“the problems I experienced”) with problems with the broader organization (“significantly worse… for employees’ well-being”), which I do not think have been adequately established to exist.

I think this is improper because it jumps from problems you had as an individual to problems that applied to the well-being of employees as a whole without having proved that this was the case. In other words, it feels like this argument is trying to smuggle in the premise that the problems you experienced were also problems for a broader group of employees as a whole, which I think has not properly been established.

Another perspective—and one which your framing seems to exclude—would be that your experience was unusually severe, but that the unique or unusual element had to do with personal characteristics of yours, particular conflicts or interactions you-in-particular had with others in the organization, or similar.

Similarly, you seem to partially conflate the actions of Ziz, who I consider an outright enemy of the community, with actions of “mainstream” community leaders. This does not strike me as a very honest way to engage.

• Looking over this again and thinking for a few minutes, I see why (a) the claim isn’t technically false, and (b) it’s nonetheless confusing.

Why (a): Let’s just take a fragment of the claim: “the problems I experienced at MIRI and CFAR were not unique or unusually severe for people in the professional-managerial class. By the law of excluded middle, the only possible alternative hypothesis is that the problems I experienced at MIRI and CFAR were unique or at least unusually severe”.

This is straightforwardly true: either a > b, or a ≤ b. Where a is “how severe the problems I experienced at MIRI and CFAR were” and b is “how severe the problems for people in the professional-managerial class generally are”.

Why (b): in context it’s followed by a claim about regular society being infinitely corrupt etc.; that would require b to be above some absolute threshold, t. So it looks like I’m asserting the disjunction (a > b) ∨ (b > t), which isn’t tautological. So there’s a misleading Gricean implicature.
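For concreteness, the difference between the bare excluded-middle disjunction and the stronger implicated one can be checked mechanically. This is a toy sketch with arbitrary sample values; a, b, and t are my own stand-ins for “severity of the problems I experienced”, “typical severity for the professional-managerial class”, and “an absolute corruption threshold”:

```python
from itertools import product

# Arbitrary hypothetical severities on a 0-1 scale.
samples = [0.0, 0.3, 0.5, 0.7, 1.0]

# Excluded middle: (a > b) or (a <= b) holds for every pair -- a tautology.
excluded_middle = all(
    (a > b) or (a <= b) for a, b in product(samples, repeat=2)
)

# The implicated claim: (a > b) or (b > t). This can fail, e.g. when
# a <= b and b <= t, so it is a substantive assertion, not a tautology.
implicated_claim = all(
    (a > b) or (b > t) for a, b, t in product(samples, repeat=3)
)

print(excluded_middle, implicated_claim)  # → True False
```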

I’ll edit to make this clearer.

Similarly, you seem to partially conflate the actions of Ziz, who I consider an outright enemy of the community, with actions of “mainstream” community leaders. This does not strike me as a very honest way to engage.

In the previous post I said Ziz formed a “splinter group”, in this post I said Ziz was “marginal” and has a “negative reputation among central Berkeley rationalists”.

• Thanks for this.

I’ve been trying to research and write something kind of like this giving more information for a while, but got distracted by other things. I’m still going to try to finish it soon.

While I disagree with Jessica’s interpretations of a lot of things, I generally agree with her facts (about the Vassar stuff which I have been researching; I know nothing about the climate at MIRI). I think this post gives most of the relevant information mine would give. I agree with (my model of) Jessica that proximity to Michael’s ideas (and psychedelics) was not the single unique cause of her problems but may have contributed.

The main thing I’d fight if I felt fighty right now is the claim that by not listening to talk about demons and auras MIRI (or by extension me, who endorsed MIRI’s decision) is impinging on her free speech. I don’t think she should face legal sanction for talking about these things, but I also don’t think other people were under any obligation to take it seriously, including if she was using these terms metaphorically but they disagree with her metaphors or think she wasn’t quite being metaphorical enough.

• The main thing I’d fight if I felt fighty right now is the claim that by not listening to talk about demons and auras MIRI (or by extension me, who endorsed MIRI’s decision) is impinging on her free speech.

You wrote that talking about auras and demons the way Jessica did while at MIRI should be considered a psychiatric emergency. When done by a practicing psychiatrist this is an impingement on Jessica’s free speech. You wrote this in response to a post that contained the following and only the following mentions of demons or auras:

1. During this time, I was intensely scrupulous; I believed that I was intrinsically evil, had destroyed significant parts of the world with my demonic powers, and was in a hell of my own creation. [after Jessica had left MIRI]

2. I heard that the paranoid person in question was concerned about a demon inside him, implanted by another person, trying to escape. [description of what someone else said]

3. The weirdest part of the events recounted is the concern about possibly-demonic mental subprocesses being implanted by other people. [description of Zoe’s post]

4. As weird as the situation got, with people being afraid of demonic subprocesses being implanted by other people, there were also psychotic breaks involving demonic subprocess narratives around MIRI and CFAR. [description of what other people said, and possibly an allusion to the facts described in the first quote, after she had left MIRI]

5. While in Leverage the possibility of subtle psychological influence between people was discussed relatively openly, around MIRI/​CFAR it was discussed covertly, with people being told they were crazy for believing it might be possible. (I noted at the time that there might be a sense in which different people have “auras” in a way that is not less inherently rigorous than the way in which different people have “charisma”, and I feared this type of comment would cause people to say I was crazy.)

Only the last one is a description of a thing Jessica herself said while working at MIRI. Like Jessica when she worked at MIRI, I too believe that people experiencing psychotic breaks sometimes talk about demons. Like Jessica when she worked at MIRI, I too believe that auras are not obviously less real than charisma. Am I experiencing a psychiatric emergency?

• You wrote that talking about auras and demons the way Jessica did while at MIRI should be considered a psychiatric emergency. When done by a practicing psychiatrist this is an impingement on Jessica’s free speech.

I don’t think I said any talk of auras should be a psychiatric emergency, otherwise we’d have to commit half of Berkeley. I said that “in the context of her being borderline psychotic” ie including this symptom, they should have “[told] her to seek normal medical treatment”. Suggesting that someone seek normal medical treatment is pretty different from saying this is a psychiatric emergency, and hardly an “impingement” on free speech. I’m kind of playing this on easy mode here, because in hindsight we know Jessica ended up needing treatment; I feel like this makes it pretty hard to make it sound sinister when I suggest this.

You wrote this in response to a post that contained the following and only the following mentions of demons or auras:

“During this time, I was intensely scrupulous; I believed that I was intrinsically evil, had destroyed significant parts of the world with my demonic powers, and was in a hell of my own creation...” [followed by several more things along these lines]

Yes? That actually sounds pretty bad to me. If I ever go around saying that I have destroyed significant parts of the world with my demonic powers, you have my permission to ask me if maybe I should seek psychiatric treatment. If you say “Oh yes, Scott, that’s a completely normal and correct thing to think, I am validating you and hope you go deeper into that”, then once I get better I’ll accuse you of being a bad friend. Jessica’s doing the opposite and accusing MIRI of being a bad workplace for not validating and reinforcing her in this!

I think what we all later learned about Leverage confirms all this. Leverage did the thing Jessica wanted MIRI to do: told everyone ex cathedra that demons were real and they were right to be afraid of them, and so they got an epidemic of mass hysteria that sounds straight out of a medieval nunnery. People were getting all sorts of weird psychosomatic symptoms, and one of the commenters said their group house exploded when one member accused another member of being possessed by demons, refused to talk or communicate with them in case the demons spread, and the “possessed” member had to move out. People felt traumatized, relationships were destroyed, it sounded awful.

MIRI is under no obligation to ~~validate and signal-boost~~ tolerate individual employees’ belief in demons, including some sort of metaphorical demons. In fact, I think they’re under a mild obligation not to, as part of their ~leader-ish role in a rationalist community. They’re under an obligation to model good epistemics for the rest of us and avoid more Leverage-type mass hysterias.

One of my heroes is this guy:

https://​​www.youtube.com/​​watch?v=Bmo1a-bimAM

Surinder Sharma, an Indian mystic, claimed to be able to kill people with a voodoo curse. He was pretty convincing and lots of people were legitimately scared. Sanal Edamaruku, president of the Indian Rationalist Organization, challenged Sharma to kill him. Since this is the 21st century and capitalism is amazing, they decided to do the whole death curse on live TV. Sharma sprinkled water and chanted magic words around Edamaruku. According to Wikipedia, “the challenge ended after several hours, with Edamaruku surviving unharmed”.

If Leverage had a few more Sanal Edamarukus, a lot of people would have avoided a pretty weird time.

I think the best response MIRI could have had to all this would have been for Nate Soares to challenge Geoff Anders to infect him with a demon on live TV, then walk out unharmed and laugh. I think the second-best was the one they actually did.

EDIT: I think I misunderstood parts of this, see below comments.

• I said that “in the context of her being borderline psychotic” ie including this symptom, they should have “[told] her to seek normal medical treatment”. Suggesting that someone seek normal medical treatment is pretty different from saying this is a psychiatric emergency, and hardly an “impingement” on free speech.

It seems like you’re trying to walk back your previous claim, which did use the “psychiatric emergency” term:

Jessica is accusing MIRI of being insufficiently supportive to her by not taking her talk about demons and auras seriously when she was borderline psychotic, and comparing this to Leverage, who she thinks did a better job by promoting an environment where people accepted these ideas. I think MIRI was correct to be concerned and (reading between the lines) telling her to seek normal medical treatment, instead of telling her that demons were real and she was right to worry about them, and I think her disagreement with this is coming from a belief that psychosis is potentially a form of useful creative learning. While I don’t want to assert that I am 100% sure this can never be true, I think it’s true rarely enough, and with enough downside risk, that treating it as a psychiatric emergency is warranted.

Reading again, maybe by “it” in the last sentence you meant “psychosis” not “talking about auras and demons”? Even if that’s what you meant I hope you can see why I interpreted it the way I did?

(Note, I do not think I would have been diagnosed with psychosis if I had talked to a psychiatrist during the time I was still at MIRI, although it’s hard to be certain and it’s hard to prove anyway.)

Yes? That actually sounds pretty bad to me. If I ever go around saying that I have destroyed significant parts of the world with my demonic powers, you have my permission to ask me if maybe I should seek psychiatric treatment.

This is while I was already in the middle of a psychotic break and in a hospital. Obviously we would agree that I needed psychiatric treatment at this point.

MIRI is under no obligation to validate and signal-boost individual employees’ belief in demons, including some sort of metaphorical demons.

“Validating and signal boosting” is not at all what I would want! I would want rational discussion and evaluation. The example you give at the end of challenging Geoff Anders on TV would be an example of rational evaluation.

(I definitely don’t think Leverage handled this optimally, and I think the sort of test you describe would have been good for them to do more of; I’m pointing to their lower rate of psychiatric incarceration as a point in favor of what they did, relatively speaking.)

What would a rational discussion of the claim Ben and I agree on (“auras are not obviously less real than charisma”) look like? One thing to do would be to see how much inter-rater agreement there is among aura-readers and charisma-readers, respectively, to see whether there is any perceivable feature being described at all. Another would be to see how predictive each rating is of other measurable phenomena (e.g. maybe “aura theory” predicts that people with “small auras” will allow themselves to be talked over by people with “big auras” more of the time; maybe “charisma theory” predicts people smile more when a “charismatic” person talks). Testing this might be hard but it doesn’t seem impossible.
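As a sketch of what the inter-rater agreement check could look like: everything below is hypothetical (three invented “aura” raters scoring the same five people), and mean pairwise Pearson correlation is just one possible agreement statistic. Near zero would suggest there is no shared perceivable feature being described; values near one would suggest the raters are tracking something in common.

```python
from itertools import combinations
from statistics import mean, pstdev

def pearson(xs, ys):
    # Pearson correlation between two raters' scores of the same people.
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))

def mean_pairwise_agreement(ratings):
    # Average correlation over all pairs of raters.
    return mean(pearson(a, b) for a, b in combinations(ratings, 2))

# Hypothetical data: three "aura" raters each scoring the same five people.
aura_ratings = [
    [3, 1, 4, 2, 5],
    [3, 2, 4, 1, 5],
    [2, 1, 5, 3, 4],
]
print(round(mean_pairwise_agreement(aura_ratings), 2))  # → 0.77
```

The same statistic computed on “charisma” ratings would then give the comparison the comment describes.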

(P.S. It seems like the AI box experiment (itself similar to the more standard Milgram Experiment) is a test of mind-control ability, which in some cases comes out positive, like the Milgram Experiment; this goes to show that, depending on the setup of the Anders/Soares demon test, it might not have a completely obvious result.)

• Sorry, yes, I meant the psychosis was the emergency. Non-psychotic discussion of auras/demons isn’t.

I’m kind of unclear what we’re debating now.

I interpret us as both agreeing that there are people talking about auras and demons who are not having psychiatric emergencies (eg random hippies, Catholic exorcists), and they should not be bothered, except insofar as you feel like having rational arguments about it.

I interpret us as both agreeing that you were having a psychotic episode, that you were going further /​ sounded less coherent than the hippies and Catholics, and that some hypothetical good diagnostician /​ good friend should have noticed that and suggested you seek help.

Am I right that we agree on those two points? Can you clarify what you think our crux is?

• Verbal coherence level seems like a weird place to locate the disagreement—Jessica maintained approximate verbal coherence (though with increasing difficulty) through most of her episode. I’d say even in October 2017, she was more verbally coherent than e.g. the average hippie or Catholic, because she was trying at all.

The most striking feature was actually her ability to take care of herself rapidly degrading, as evidenced by e.g. getting lost almost immediately after leaving her home, wandering for several miles, then calling me for help and having difficulty figuring out where she was—IIRC, took a few minutes to find cross streets. When I found her she was shuffling around in a daze, her skin looked like she’d been scratching it much more than usual, clothes were awkwardly hung on her body, etc. This was on either the second or third day, and things got almost monotonically worse as the days progressed.

The obvious cause for concern was “rapid descent in presentation from normal adult to homeless junkie”. Before that happened, it was not at all obvious this was an emergency. Who hasn’t been kept up all night by anxiety after a particularly stressful day in a stressful year?

I think the focus on verbal coherence is politically convenient for both of you. It makes this case into an interesting battleground for competing ideologies, where they can both try to create blame for a bad thing.

Scott wants to do this because AFAICT his agenda is to marginalize discussion of concepts from woo /​ psychedelia /​ etc, and would like to claim that Jess’ interest in those was a clear emergency. Jess wants to do this because she would like to claim that the ideas at MIRI directly drove her crazy.

I worked there too, and left at the same time for approximately the same reasons. We talked about it extensively at the time. It’s not plausible that it was even in-frame that considering details of S-risks in the vein of Unsong’s Broadcast would possibly be helpful for alignment research. Basilisk-baiting like that would generally have been frowned upon, but mostly just wouldn’t have come up.

The obvious sources of madness here were

1. The extreme burden of responsibility for the far future (combined with the position that MIRI was uniquely essential to this), and encouragement to take this responsibility seriously, is obviously stressful.

2. The local political environment at the time was a mess—splinters were forming, paranoia was widespread. A bunch of people we respected and worked with had decided the world was going to end, very soon, uncomfortably soon, and they were making it extremely difficult for us to check their work. This uncertainty was, uh, stressful.

3. Psychedelics very obviously induce states closer-than-usual to psychosis. This is what’s great about them—they let you dip a toe into the psychotic world and be back the next day, so you can take some of the insights with you. Also, this makes them a risk for inducing psychotic episodes. It’s not a coincidence that every episode I remember Jess having in 2017 and 2018 was a direct result of a trip-gone-long.

4. Latent tendency towards psychosis

Critically, I don’t think any of these factors would have been sufficient on their own. The direct content of MIRI’s research, and the woo stuff, both seem like total red herrings in comparison to any of these 4 issues.

• I want to specifically highlight “A bunch of people we respected and worked with had decided the world was going to end, very soon, uncomfortably soon, and they were making it extremely difficult for us to check their work.” I noticed this second-hand at the time, but didn’t see any paths toward making things better. I think it had really harmful effects on the community, and is worth thinking a lot about before something similar happens again.

• Thanks for giving your own model and description of the situation!

Regarding latent tendency, I don’t have a family history of psychosis (but I do of bipolar), although that doesn’t rule out latent tendency. It’s unclear what “latent tendency” means exactly, it’s kind of pretending that the real world is a 3-node Bayesian network (self tendency towards X, environment tendency towards inducing X, whether X actually happens) rather than a giant web of causality, but maybe there’s some way to specify it more precisely.

I think the 4 factors you listed are the vast majority, so I partially agree with your “red herring” claim.

The “woo” language was causal, I think, mostly because I feared that others would apply coercion to me if I used it too much (even if I had a more detailed model that I could explain upon request), and there was a bad feedback loop around thinking that I was crazy and/or that other people would think I was crazy, and other people playing into this.

I think I originally wrote about basilisk type things in the post because I was very clearly freaking out about abstract evil at the time of psychosis (basically a generalization of utility function sign flips), and I thought Scott’s original comment would have led people to think I was thinking about evil mainly because of Michael, when actually I was thinking about evil for a variety of reasons. I was originally going to say “maybe all this modeling of adversarial/​evil scenarios at my workplace contributed, but I’m not sure” but an early reader said “actually wait, based on what you’ve said what you experienced later was a natural continuation of the previous stuff, you’re very much understating things” and suggested (an early version of) the last paragraph of the basilisk section, and that seemed likely enough to include.

It’s pretty clear that thinking about basilisk-type scenarios in the abstract was part of MIRI’s agenda (e.g. the Arbital article). Here’s a comment by Rob Bensinger saying it’s probably bad to try to make an AI that does a lot of interesting stuff and has a good time doing it, because that objective is too related to consciousness and that might create a lot of suffering. (That statement references the “s-risk” concept, and if someone doesn’t know what that is and tries to find out, they could easily end up at a Brian Tomasik article recommending thinking about what it’s like to be dropped in lava.)

The thing is it seems pretty hard to evaluate an abstract claim like Rob’s without thinking about details. I get that there are arguments against thinking about the details (e.g. it might drive you crazy or make you more extortable) but natural ways of thinking about the abstract question (e.g. imagination /​ pattern completion /​ concretization /​ etc) would involve thinking about details even if people at MIRI would in fact dis-endorse thinking about the details. It would require a lot of compartmentalization to think about this question in the abstract without thinking about the details, and some people are more disposed to do that than others, and I expect compartmentalization of that sort to cause worse FAI research, e.g. because it might lead to treating “human values” as a LISP token.

[EDIT: Just realized Buck Shlegeris (someone who recently left MIRI) recently wrote a post called “Worst-case thinking in AI alignment”… seems concordant with the point I’m making.]

• hmm… this could have come down to spending time in different parts of MIRI? I mostly worked on the “world’s last decent logic department” stuff—maybe the more “global strategic” aspects of MIRI work, at least the parts behind closed doors I wasn’t allowed through, were more toxic? Still feels kinda unlikely but I’m missing info there so it’s just a hunch.

• My guess is that it has more to do with willingness to compartmentalize than with which part of MIRI one was in per se. Compartmentalization is negatively correlated with “taking on responsibility” for more of the problem. I’m sure you can see why it would be appealing to avoid giving in to extortion in real life, not just on whiteboards, and attempting that with a skewed model of the situation can lead to outlandish behavior like Ziz resisting arrest as hard as possible.

• I think this is a persistent difference between us but isn’t especially relevant to the difference in outcomes here.

I’d more guess that the reason you had psychoses and I didn’t had to do with you having anxieties about being irredeemably bad that I basically didn’t at the time. Seems like this would be correlated with your feeling like you grew up in a Shin Sekai Yori world?

• I clearly had more scrupulosity issues than you and that contributed a lot. Relevantly, the original Roko’s Basilisk post is putting AI sci-fi detail on a fear I am pretty sure a lot of EAs feel/​felt in their heart, that something nonspecifically bad will happen to them because they are able to help a lot of people (due to being pivotal on the future), and know this, and don’t do nearly as much as they could. If you’re already having these sorts of fears then the abstract math of extortion and so on can look really threatening.

• When I got back into town and talked with Jessica, she was talking about how it might be wrong to take actions that might possibly harm others, i.e. pretty much any actions, since she might not learn fast enough for this to come out net positive. Seems likely to me that the content of Jessica’s anxious perseveration was partly causally upstream of the anxious perseveration itself.

I agree that a decline in bodily organization was the main legitimate reason for concern. It seems obviously legitimate for Jessica (and me) to point out that Scott is proposing a standard that cannot feasibly be applied uniformly, since it’s not already common knowledge that Scott isn’t making sense here, and his prior comments on this subject have been heavily upvoted. The main alternative would be to mostly stop engaging on LessWrong, which I have done.

I don’t fully understand what “latent tendency towards psychosis” means functionally or what predictions it makes, so it doesn’t seem like an adequate explanation. I do know that there’s correlation within families, but I have a family history of schizophrenia and Jessica doesn’t, so if that’s what you mean by latent tendency it doesn’t seem to obviously have an odds ratio in the correct direction within our local cluster.

• By latent tendency I don’t mean family history, though it’s obviously correlated. I claim that there’s this fact of the matter about Jess’ personality, biology, etc, which is that it’s easier for her to have a psychotic episode than for most people. This seems not plausibly controversial.

I’m not claiming a gears-level model here. When you see that someone has a pattern of <problem> that others in very similar situations did not have, you should assume some of the causality is located in the person, even if you don’t know how.

• Listing “I don’t know, some other reason we haven’t identified yet” as an “obvious source” can make sense as a null option, but giving it a virtus dormitiva type name is silly.

I think that Jessica has argued with some plausibility that her psychotic break was in part the result of taking aspects of the AI safety discourse more seriously and unironically than the people around her, combined with adversarial pressures and silencing. This seems like a gears-level model that might be more likely in people with a cognitive disposition correlated with psychosis.

• I interpret us as both agreeing that there are people talking about auras who are not having psychiatric emergencies (eg random hippies), and they should not be bothered.

Agreed.

I interpret us as both agreeing that you were having a psychotic episode, that you were going further / sounding less coherent than the hippies, and that some hypothetical good diagnostician / good friend should have noticed that and suggested you seek help.

Agreed during October 2017. Disagreed substantially before then (January-June 2017, when I was at MIRI).

(I edited the post to make it clear how I misinterpreted your comment.)

• Interpreting you as saying that January-June 2017 you were basically doing the same thing as the Leveragers when talking about demons and had no other signs of psychosis, I agree this was not a psychiatric emergency, and I’m sorry if I got confused and suggested it was. I’ve edited my post also.

• One thing to add is I think in the early parts of my psychosis (before the “mind blown by Ra” part) I was as coherent as, or more coherent than, hippies are on regular days, and even after that for some time (before actually being hospitalized) I might have been as coherent as they were on “advanced spiritual practice” days (e.g. the middle of a meditation retreat or a Kundalini awakening). I was still controlled pretty aggressively with the justification that I was being incoherent, and I think that control caused me to become more mentally disorganized and verbally incoherent over time. The math test example is striking: I think less than 0.2% of people could pass it (to Zack’s satisfaction) on a good day, and less than 3% could give an answer as good as the one I gave, yet this was still used to “prove” that I was unable to reason.

• My recollection is that at that time you were articulately expressing what seemed like a level of scrupulosity typical of many Bay Area Rationalists. You were missing enough sleep that I was worried, but you seemed oriented x3. I don’t remember you talking about demons or auras at all, and have no recollection of you confusedly reifying agents who weren’t there.

• [deleted]
• It should be noted that, as I was nominally Nate’s employee, it is consistent with standard business practices for him to prevent me from talking with people who might distract me from my work; this goes to show the continuity between “cults” and “normal corporations”.

This is very much not standard business practice. In thirteen years working as an employee at four different “normal corporations”, I never felt any pressure from my bosses (n=5) about who I talk to outside of work. And I’ve certainly been distracted at times!

Now that I’m a manager, I similarly would never consider this, even if I did think that one of my employees was being seriously distracted. If someone wasn’t getting their work done or was otherwise not performing well, we would certainly talk about that, but who their contacts are is absolutely no business of mine.

• I never told Jessica not to talk to someone (or at the very least, I don’t recall it and highly doubt it). IIRC, in that time period, Jessica and one other researcher were regularly inviting Michael to the offices and talking to him at length during normal business hours. IIRC, the closest I came to “telling Jessica not to talk to someone” was expressing dissatisfaction with this state of affairs. The surrounding context was that Jessica had suffered performance (or at least Nate-legible-performance) degradation in the previous months, and we were meeting more regularly in attempts to see if we could work something out, and (if memory serves) I expressed skepticism about whether lengthy talks with Michael (in the office, during normal business hours) would result in improvement along that axis. Even then, I am fairly confident that I hedged my skepticism with caveats of the form “I don’t think it’s a good idea, but it’s not my decision”.

• Thanks for the correction.

A relevant fact is that at MIRI we didn’t have set office hours (at least not that I remember), and Michael Vassar came to the office sometimes during the day. So one could argue that he was talking to people during work hours. (Still, I think the conversations we were having were positive for being able to think more clearly about AI alignment and related topics.) Also it seems somewhat likely that Nate was discouraging Michael from talking with me in general, not just during weekdays/daytime.

I’ll edit the post to make these things clearer.

• This condition was triggered, Maia announced it, and Maia left the boat and killed themselves; Ziz and friends found Maia’s body later.

Are you saying that Maia was in San Francisco at this time? That’s an interesting claim, given that the people in the European group house where Maia lived in the preceding years understood that Maia died while on vacation in Poland (their home country).

From their view, Maia had started taking hormones to transition from male to female a few months earlier and was then spending time alone. That phase usually comes with pretty unstable psychological states.

Shortly before that, they wrote posts about how humans don’t want happiness: https://web.archive.org/web/20180104225807/http://squirrelinhell.blogspot.com/2017/12/happiness-is-chore.html. There was also a later post about how they saw life as not worth living, given that everything will be wiped out by death sooner or later anyway. I think that post got deleted, and I can’t find an archived copy right now.

I only learned about the circumstances of the death after reading Ziz’s account; previously I believed the account of the roommates from the group house, who seemed to be missing the crucial information about the interaction with Ziz.

Without Ziz sharing information it’s hard to look into the circumstances of their death.

• Oh, that’s pretty good counter-evidence to my claim; I’ll edit accordingly.

• I’m personally quite unsure about how to think about this event. There’s very little for Ziz to gain by making up a story like this. It reflects badly on her to be partly responsible for the suicide.

It’s also possible that, while traveling alone after saying they were going to Poland, they flew out to hang out with Ziz’s crew, but it’s all very strange.

• “yeah, I don’t really mind being the evil thing. Seems okay to me.”

Regarding oneself as amoral seems to necessarily involve incoherence AFAICT. Like claiming you don’t have your own oughts or that there is no structure to your oughts. In this case

Like doing a good work of art or playing an instrument. It feels satisfying.

is regarded as a higher good than the judgment of others. Loudly signaling that you don’t care about the judgment of others seems to be a claim about what control surfaces you will and won’t expose to social feedback. Nevertheless, the common sense prior that you shouldn’t expect alliance with people who ‘don’t care about being evil’ to go well seems appropriate.

To put it less abstractly: believe people when they say they are defecting rather than believe it is nested layers of fun and interesting counter signaling.

• Note: I have no relation to MIRI/CFAR, no familiarity with this situation, and am not a mental health expert, so I can’t speak with any specific authority here.

First, I’d like to offer my sympathy for the suffering you described. I’ve had unpleasant intrusive thoughts before. They were pretty terrible, and I’ve never had them to the degree you’ve experienced. X/S risk research tends to generate a lot of intrusive thoughts and general stress. I think better community norms/support in this area could help a lot. Here is one technique you may find useful:

1. Raise your hand in front of your face with your palm facing towards you.

2. Fix your eyes on the tip of a particular finger.

3. Move your hand from side to side, while still tracking the chosen finger with your eyes (head remains still).

4. Every time your hand changes direction, switch which finger your eyes track. I.e., first track the tip of the thumb, then track the pointer finger, then the middle, then ring, then pinky, then back to thumb.

This technique combines three simultaneous control tasks (moving your hand, tracking the current finger, switching fingers repeatedly) and also saturates your visual field with the constantly moving background. I find it captures my full attention for as long as I perform the technique and is therefore useful for interrupting intrusive/obsessive thoughts.

Another option for dealing with intrusive thoughts is to just start doing squats or similar exercises. This is less distracting from the thoughts, but physically healthier and may serve as negative reinforcement to your brain about avoiding intrusive thoughts in the future.

I was told, by Nate Soares, that the pieces to make AGI are likely already out there and someone just has to put them together. He did not tell me anything about how to make such an AGI, on the basis that this would be dangerous. Instead, he encouraged me to figure it out for myself, saying it was within my abilities to do so.

Did Nate Soares specify that he wanted you to come up with a workable AGI design? If I’d heard my supervisor ask that of me, my interpretation would be something like: “Think about what sort of extra capabilities would make current AI more ‘agent-y’ or general, and how such capabilities might be implemented.”

For example, I might think that adding some sort of multi-episode memory to GPT-3 would make it more agent-y, and think about ways to integrate memory with transformers. Another idea in this ballpark is to train an ML system on human brain activity data as a form of knowledge distillation from humans to AI (as described by Gwern here).

(Neither of these is among my most “dangerous” ideas about improving AI capabilities. I’m comfortable sharing the first because ~everyone already knows improved memory should be useful for general systems. I’m sharing the second because I think it will make many alignment approaches easier, especially value learning, interpretability, alignment by default and HCH).

I’d assume only ~20% at best of the approaches I thought of would actually be useful for capabilities, even if they were well implemented by a team of people smarter than me. If my supervisor then specified that they expected me to produce an actually workable AGI design, with significantly better performance than current state of the art systems, I’d have been blown away and seriously questioning their sanity/competence.

• Thanks for the suggestion for intrusive thoughts.

If I came up with a non-workable AGI design, that would not be significant evidence for “the pieces to make AGI are already out there and someone just needs to put them together”. Lots of AI people throughout the history of the field have come up with non-workable AGI designs, including me in high school/​college.

• That’s a neat idea with the hand thing.

• It might help your case to write a version of this that removes most of the interpretation you’ve given here, and tries to present just the claims you know to be objective truths. While ‘the plaintiff is failing to personally provide an objective neutral point of view’ seems like a particularly disturbing sort of argument to dismiss something like this on, it is nonetheless the case that this does seem to be the principal defense, and most of those comments are pointing to real issues in your presentation.

Disclaimer, I’m an outsider.

• Yeah, I was similarly thinking that someone (on Jessica’s side of the story) should rewrite the articles. To remove the distracting parts, so that it would be easier to focus on whatever is left.

• It seems to me that there’s been a lot of debate about the causes of the psychosis and the suicides. I haven’t seen the facts on the ground, so I can’t know anything for sure. But as far as I can tell, the Vassarites and the core rationalists generally agree that MIRI&co aren’t particularly bad, with the Vassarites just claiming, well:

Scott asserts that Michael Vassar thinks “regular society is infinitely corrupt and conformist and traumatizing”. This is hyperbolic (infinite corruption would leave nothing to steal) but Michael and I do believe that the problems I experienced at MIRI and CFAR were not unique or unusually severe for people in the professional-managerial class. By the law of excluded middle, the only possible alternative hypothesis is that the problems I experienced at MIRI and CFAR were unique or at least unusually severe, significantly worse than companies like Google for employees’ mental well-being.

And you give some relatively plausible defense of Vassar not being the main cause of the psychosis for the Vassarites. But that raises the question to me, why try to assign an environmental cause at all? It seems more reasonable to just say that the Vassarites are prone to psychosis, regardless of environment. At least I haven’t heard of any clear evidence against this.

• Why not assign an environmental cause in a case where one exists and I have evidence about it? “Vassarites are prone to psychosis” is obviously fundamental attribution error, that’s not how physical causality works. There will be specific environmental causes in “normal” cases of trauma as well.

• Why not assign an environmental cause in a case where one exists and I have evidence about it?

As I understand it, both sides of the issue agree that MIRI isn’t uniquely bad when it comes to frame control and such. MIRI might have some unique themes, e.g. AI torturing people instead of the devil torturing people, or lying about the promise of an approach for AI instead of lying about the promise of an approach for business, but it’s not some unique evil by MIRI. (Please correct me if I’m misunderstanding your accusations here.)

As such, it’s not that MIRI, compared to other environments, caused this. Of course, this does not mean that MIRI didn’t in some more abstract sense cause it, in the sense that one could imagine some MIRI’ which was like MIRI but didn’t have the features you mention as contributors. But the viability of creating such an organization, both cost-wise and success-wise, is unclear, and because the organization doesn’t exist but is instead a counterfactual imagination, it’s not even clear that it would have the effects you hope it would have. So assigning the cause of MIRI not being MIRI’ seems to require a much greater leap of faith.

“Vassarites are prone to psychosis” is obviously fundamental attribution error, that’s not how physical causality works. There will be specific environmental causes in “normal” cases of trauma as well.

Not so obvious to me. There were tons of people in these environments with no psychosis at all, as far as I know? Meanwhile, the fundamental attribution error is about when people attribute something to a person where there is a situational factor that would have caused everyone else to act in the same way.

Of course you could attribute this to subtleties about the social relations, who is connected to who and respected by who. But this doesn’t seem like an obviously correct attribution to me. Maybe if I knew more about the social relations, it would be.

• I think you’re trying to use a regression model where I would use something more like a Bayes net. This makes some sense in that I had direct personal experience that includes lots of nodes in the Bayes net, and you don’t, so you’re going to use a lower-resolution model than me. But people who care about the Bayes net I lived in can update on the information I’m presenting.

There were tons of people in these environments with no psychosis at all, as far as I know?

I think the rate might be higher for former MIRI employees in particular, but I’m not sure how to evaluate; the official base rate is that around 3% of people have or will experience a psychotic break in their lifetime. If there are at least 3 psychotic breaks in former MIRI employees then MIRI would need to have had 100 employees to match the general population rate (perhaps more if the psychotic breaks happened within a few years of each other, and in the general population they’re pretty spread out), although there’s noise here and the official stat could be wrong.
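For concreteness, the base-rate arithmetic above can be sketched as a quick binomial calculation. This is only an illustration of the reasoning, not a claim about actual headcounts: the ~3% lifetime rate, the independence assumption, and the example headcounts (30 and 100) are all assumptions of the sketch.

```python
from math import comb

def prob_at_least_k(n, p, k):
    """P(at least k psychotic breaks among n people, each with lifetime probability p)."""
    return 1 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

p = 0.03  # assumed ~3% lifetime rate of a psychotic break

# The expected number of breaks reaches 3 only once headcount reaches
# 100 (100 * 0.03 = 3), matching the claim above.

# With a smaller (hypothetical) headcount, 3+ breaks would be surprising:
print(round(prob_at_least_k(30, p, 3), 3))   # roughly 0.06
print(round(prob_at_least_k(100, p, 3), 3))  # roughly 0.58
```

So under these assumptions, 3+ breaks among ~30 former employees would be a ~6% tail event, while among ~100 it would be unremarkable; the parenthetical caveat about time-clustering would make the small-headcount case even more surprising.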

Anyway, “MIRI is especially likely to cause psychosis” (something that could be output by the type of regression model you’re considering) is not the main claim I’m making.

Part of what’s strange about attributing things to “Vassarites” in a regression model is that part of how “Vassarites” (including Vassar) became that way is through environmental causes. E.g. I listened to Michael’s ideas more because I was at MIRI and Michael was pointing out features of MIRI and the broader situation that seemed relevant to me given my observations, and other people around didn’t seem comparably informationally helpful. I have no family history of schizophrenia (that I know of), only bipolar disorder.

• I think the rate might be higher for former MIRI employees in particular, but I’m not sure how to evaluate; the official base rate is that around 3% of people have or will experience a psychotic break in their lifetime. If there are at least 3 psychotic breaks in former MIRI employees then MIRI would need to have had 100 employees to match the general population rate, although there’s noise here and the official stat could be wrong.

Isn’t the rate of mental illness in general also higher, e.g. autism or ADHD, which is probably not caused by MIRI? (Both among MIRI employees and among rationalists and rationalist-adjacent people more generally; e.g. I myself happen to be autistic, ADHD, GD, and probably also have one or two personality disorders, and I have a family history of BPD.) Almost all mental illnesses are correlated, so if you select for one mental illness you’d expect to get others along with it.

Anyway, “MIRI is especially likely to cause psychosis” (something that could be output by the type of regression model you’re considering) is not the main claim I’m making.

Part of what’s strange about attributing things to “Vassarites” in a regression model is that part of how “Vassarites” (including Vassar) became that way is through environmental causes. E.g. I listened to Michael’s ideas more because I was at MIRI and Michael was pointing out features of MIRI and the broader situation that seemed relevant to me given my observations, and other people around didn’t seem comparably informationally helpful. I have no family history of schizophrenia (that I know of), only bipolar disorder.

I am very sympathetic to the idea that the Vassarites are not nearly as environmentally causal to the psychosis as they might look. It’s the same principle as above; Vassar selected for psychosis, being critical of MIRI, etc., so you’d expect higher rates even if he had no effect on it.

(I think that’s a major problem with naive regressions, taking something that’s really a consequence and adding it to the regression as if it was a cause.)

I think you’re trying to use a regression model where I would use a Bayes net. This makes some sense in that I had direct personal experience that includes lots of nodes in the Bayes net, and you don’t, so you’re going to use a lower-resolution model than me. But people who care about the Bayes net I lived in can update on the information I’m presenting.

It’s tricky because I try to read the accounts, but they’re all going to be filtered through people’s perception, and they’re all going to assume a lot of background knowledge that I don’t have, due to not having observed it. I could put in a lot of effort to figure out what’s true and false, representative and unrepresentative, but it’s probably not possible for me due to various reasons. I could also just ignore the whole drama. But I’m just confused—if there’s agreement that MIRI isn’t particularly bad about this, then this seems to mostly preclude environmental attribution and suggest personal attribution?

I wouldn’t necessarily say that I use a regression model, as e.g. I’m aware of the problem with just blaming Vassar for causing other’s psychosis. There’s definitely some truth to me being forced to use a lower-resolution model. And that can be terrible. Partly I just have a very strong philosophical leaning towards essentialism, but also partly it just, from afar, seems to be the best explanation.

• I’m just confused—if there’s agreement that MIRI isn’t particularly bad about this, then this seems to mostly preclude environmental attribution and suggest personal attribution?

I’ve read Moral Mazes and worked a few years in the corporate world at Fannie Mae. I’ve also talked a lot with Jessica and others in the MIRI cluster who had psychotic breaks. It seems to me like what happens to middle managers is in some important sense even worse than a psychotic break. Jessica, Zack, and Devi seem to be able to represent their perspectives now, to be able to engage with the hypothesis that some activity is in good faith, to consider symmetry considerations instead of reflexively siding with transgressors.

Ordinary statistical methods—and maybe empiricism more generally—cannot shed light on pervasive, systemic harms, when we lack the capacity to perform controlled experiments on many such systems. In such cases, we instead need rationalist methods, i.e. thinking carefully about mechanisms from first principles. We can also try to generalize efficiently from microcosms of the general phenomenon, e.g. generalizing from how people respond to unusually blatant abuse by individuals or institutions, to make inferences about the effects of pervasive abuse.

But corporate employers are not the only context people live in. My grandfather was an independent insurance broker for much of his career. I would expect someone working for a low-margin business in a competitive industry to sustain much less psychological damage, though I would also expect them to be paid less and maybe have a more strenuous job. I don’t think the guys out on the street a few blocks from my apartment selling fruit for cash face anything like what Jessica faced, and I’d be somewhat surprised if they ended up damaged the way the people in Moral Mazes seem damaged.

• But I’m just confused—if there’s agreement that MIRI isn’t particularly bad about this, then this seems to mostly preclude environmental attribution and suggest personal attribution?

Suppose it’s really common in normal corporations for someone to be given ridiculous assignments by their boss and that this leads to mental illness at a high rate. Each person at a corporation like this would have a specific story of how their boss gave them a really ridiculous assignment and this caused them mental problems. That specific story in each case would be a causal model (if they hadn’t received that assignment or had anything similar to that happen, maybe they wouldn’t have that issue). This is all the case even if most corporations have this sort of thing happen.

• In a sense, everything is caused by everything. If not for certain specifics of the physical constants, the universe as we know it wouldn’t exist. If cosmic rays would strike you in just the right ways, it could probably prevent psychosis. Etc. Further, since causality is not directly observable, even when there isn’t a real causal relationship, it’s possible to come up with a specific story where there is.

This leads to a problem for attributing One True Causal Story; which one to pick? Probably we shouldn’t feel restricted to only having one, as multiple frames may be relevant. But clearly we need some sort of filter.

Probably the easiest way to get a filter is by looking at applications. E.g., there’s the application: which social environment should you join? That is presumably about the relative effects of the different environments on a person. I don’t think this most closely aligns with your point, though.

Probably an application near to you is, how should rationalist social environments be run? (You’re advocating for something more like Leverage, in certain respects?) Here one doesn’t necessarily need to compare across actual social environments; one can consider counterfactual ones too. However, for this a cost/benefit analysis becomes important; how difficult would a change be to implement, and how much would it help with the mental health problems?

This is hard to deduce, and so it becomes tempting to use comparisons across actual social environments as a proxy. E.g. if most people get ridiculous assignments by their boss, then that probably means there is some reason why that’s very hard to avoid. And if most people don’t get severe mental illnesses, then that puts a limit to how bad the ridiculous assignments can be on their own. So it doesn’t obviously pass a cost-benefit test.

Another thing one could look at is how well the critics are doing; are they implementing something better? Here again I’m looking at it from afar, so it’s hard for me to know. But I do have one private semi-direct signal: I am friends with one of the people in your social circles, and I talked to them recently, and they seemed to be doing really badly. Though I don’t know enough to judge too confidently; maybe your group isn’t actually focusing on making something more healthy, and so we wouldn’t expect people to be doing better; maybe the person I’m thinking of isn’t too deeply involved with what you’re doing; maybe it’s just a random downswing and doesn’t mean much in the general picture; maybe it’s just this person; I don’t know.

But if nobody else is implementing something better, then that feels like reasons to be skeptical about the causal assignment here. Though there’s two things to note; first, that this could very well fit with the general Vassarite model—the reason implementing better things might be hard might be that “conflict theory is right” (hard to come up with a better explanation, though people have tried...). And secondly, that neither of the above make it particularly relevant to attribute causality to the individuals involved, since they are inherently about the environment.

So what does make it relevant to attribute causality to the individuals involved? Well, there’s a third purpose: Updating. As an outside observer, I guess that’s the most natural mindset for me to enter. Given that these posts were written and publicized, how should I change my beliefs about the world? I should propagate evidence upwards through the causal links to these events. Why were these accusations raised, by this person, against this organization, on this forum, at this time? Here, everything is up for grabs, whether it’s attribution to a person or an organization; but also, here things tend to be much more about the variance; things that are constant in society do not get updated much by this sort of information.

Anyway, my girlfriend is telling me to go to bed now, so I can’t continue this post. I will probably be back tomorrow.

• (You’re advocating for something more like Leverage, in certain respects?)

Not really? I think even if Leverage turned out better in some ways that doesn’t mean switching to their model would help. I’m primarily not attempting to make policy recommendations here, I’m attempting to output the sort of information a policy-maker could take into account as empirical observations.

This is also why the “think about applications” point doesn’t seem that relevant; lots of people have lots of applications, and they consult different information sources (e.g. encyclopedias, books), each of which isn’t necessarily specialized to their application.

E.g. if most people get ridiculous assignments by their boss, then that probably means there is some reason why that’s very hard to avoid.

That seems like a fully general argument against trying to fix common societal problems? I mean, how do you expect people ever made society better in the past?

In any case, even if it’s hard to avoid, it helps to know that it’s happening and is possibly a bottleneck on intellectual productivity; if it’s a primary constraint then Theory of Constraints suggests focusing a lot of attention on it.

It seems like the general mindset you’re taking here might imply that it’s useless to read biographies, news reports, history, and accounts of how things were invented/discovered, on the basis that whoever writes it has a lot of leeway in how they describe the events, although I’m not sure if I’m interpreting you correctly.

• I’m primarily not attempting to make policy recommendations here, I’m attempting to output the sort of information a policy-maker could take into account as empirical observations.

This is also why the “think about applications” point doesn’t seem that relevant; lots of people have lots of applications, and they consult different information sources (e.g. encyclopedias, books), each of which isn’t necessarily specialized to their application.

This seems to me to be endorsing “updating” as a purpose; evidence flows up the causal links (and down the causal links, but for this purpose the upwards direction is more important). So I will be focusing on that purpose here. The most interesting causal links are then the ones which imply the biggest updates.

Which I suppose is a very subjective thing? It depends heavily not just on the evidence one has about this case, but also on the prior beliefs about psychosis, organizational structure, etc..

In theory, the updates should tend to bring everybody closer to some consensus, but the direction of change may vary wildly from person to person, depending on how they differ from that consensus. Though in practice, I’m already very essentialist, and my update is in an essentialist direction, so that doesn’t seem to cash out.

(… or does it? One thing I’ve been essentialist about is that I’ve been skeptical that “cPTSD” is a real thing caused by trauma, rather than some more complicated genetic thing. But the stories from especially Leverage and also to an extent MIRI have made me update enormously hard in favor of trauma being able to cause those sorts of mental problems—under specific conditions. I guess there’s an element of, on the more ontological/theoretical level, people might converge, but people’s preexisting ontological/theoretical beliefs may cause their assessments of the situation to diverge.)

Not really? I think even if Leverage turned out better in some ways that doesn’t mean switching to their model would help.

My phrasing might have been overly strong, since it implied you would endorse a lot of what Leverage does, including it being cultish. What I meant is that one thing you seem to have endorsed is talking more about “objects” and such.

That seems like a fully general argument against trying to fix common societal problems? I mean, how do you expect people ever made society better in the past?

In any case, even if it’s hard to avoid, it helps to know that it’s happening and is possibly a bottleneck on intellectual productivity; if it’s a primary constraint then Theory of Constraints suggests focusing a lot of attention on it.

I agree that this is a rather general argument, but it’s not supposed to stand on its own. The structure of my argument isn’t “MIRI is normal here so it’s probably hard to change, so the post isn’t actionable”, it’s “It’s dubious things happened exactly as the OP describes, MIRI is normal here so it’s probably hard to change, it’s hard to know whether the changes implied would even work because they’re entirely hypothetical, the social circle raising the critique does not seem to be able to use their theory to fix their own mental health, so the post isn’t actionable”.

(I will send you a PM with the name of the person in your social circle who seemed to currently be doing terribly, so you can say whether you think I am misinterpreting the situation around them.)

None of these would, individually, be a strong argument. Even together they’re not a knockdown argument. But these limitations do make it very difficult for me to make much of it.

It seems like the general mindset you’re taking here might imply that it’s useless to read biographies, news reports, history, and accounts of how things were invented/discovered, on the basis that whoever writes it has a lot of leeway in how they describe the events, although I’m not sure if I’m interpreting you correctly.

Yes. I don’t really read biographies or history, and mostly don’t read the news, for quite similar reasons. When I do, I always try to keep selection biases and interpretation biases strongly in mind.

I have gradually become more and more aware of the problems with this, but I also believe that excessive focus on these sorts of things leads to people overinterpreting everything. There’s probably a balance to be struck.

• MIRI certainly had a substantially conflict-theoretic view of the broad situation, even if not the local situation. I brought up the possibility of convincing DeepMind people to care about AI alignment. MIRI leaders including Eliezer Yudkowsky and Nate Soares told me that this was overly naive, that DeepMind would not stop dangerous research even if good reasons for this could be given. Therefore (they said) it was reasonable to develop precursors to AGI in-house to compete with organizations such as DeepMind in terms of developing AGI first. So I was being told to consider people at other AI organizations to be intractably wrong, people who it makes more sense to compete with than to treat as participants in a discourse.

Anyone from MIRI want to comment on this? This seems weird, especially considering how open Demis/Legg have been to alignment arguments.

• MIRI leaders including Eliezer Yudkowsky and Nate Soares told me that this was overly naive, that DeepMind would not stop dangerous research even if good reasons for this could be given.

I have no memory of saying this to Jessica; this of itself is not strong evidence because my autobiographical memory is bad, but it also doesn’t sound like something I would say. I generally credit Demis Hassabis as being more clueful than many, though unfortunately not on quite the same page. Adjacent things that could possibly have actually been said in reality might include “It’s not clear that Demis has the power to prevent Google’s CEO from turning up the dial on an AGI even if Demis thinks that’s a bad idea” or “Deepmind has recruited a lot of people who would strongly protest reduced publications, given their career incentives and the impression they had when they signed up” or maybe something something Law of Continued Failure: they already have strong reasons not to advance the field, so why would providing them with stronger ones help?

Therefore (they said) it was reasonable to develop precursors to AGI in-house to compete with organizations such as DeepMind in terms of developing AGI first.

I haven’t been shy over the course of my entire career about saying that I’d do this if I could; it’s looking less hopeful in 2020 than in 2010 due to the trajectory of machine learning and timelines generally looking shorter.

So I was being told to consider people at other AI organizations to be intractably wrong, people who it makes more sense to compete with than to treat as participants in a discourse.

Not something I’d have said, and the sort of statement which would make a bunch of readers think “Oh Eliezer said that explicitly” but with a nice little motte of “Oh, I just meant that was the implication somebody could have taken from other things Eliezer said.”

• In case it refreshes your memory, this was in a research retreat, we were in a living room on couches, you and I and Nate were there, Critch and Ryan Carey were probably there, I was saying that convincing DeepMind people to care about alignment was a good plan, people were saying that was overly naive and competition was a better approach. I believe Nate specifically said something about Demis saying that he couldn’t stop DeepMind researchers from publishing dangerous/unaligned AI things even if he tried. Even if Demis can be reasoned with, that doesn’t imply DeepMind as a whole can be reasoned with, since DeepMind also includes and is driven by these researchers who Demis doesn’t think he can reason with.

• Sounds like something that could have happened, sure, I wouldn’t be surprised to hear Critch or Carey confirm that version of things. A retreat with non-MIRI people present, and nuanced general discussion on that topic happening, is a very different event to have actually happened than the impression this post leaves in the mind of the reader.

• Do you have a take on Shane Legg? Or any insight to his safety efforts? In his old blog and the XiXiDu interview, he was pretty solid on alignment, back when it was far harder to say such things publicly. And he made this comment in this post, just before starting DeepMind:

A better approach would be to act as a parent organisation, a kind of AGI VC company, that backs a number of promising teams. Teams that fail to make progress get dropped and new teams with new ideas are picked up. General ideas of AGI safety are also developed in the background until such a time when one of the teams starts to make serious progress. At this time the focus would be to make the emerging AGI design as safe as possible.

• I’m even more positive on Shane Legg than Demis Hassabis, but I don’t have the impression he’s in charge.

• My immediate thought on this was that the conclusion [people at other AI organizations are intractably wrong] doesn’t follow from [DeepMind (the organisation) would not stop dangerous research even if good reasons...]. (edited to bold “organisation” rather than “DeepMind”, for clarity)

A natural way to interpret the latter being that people who came to care sufficiently (and be cautious) about alignment would tend to lose/fail-to-gain influence over DeepMind’s direction (through various incentive-driven dynamics). The fact that it’s possible to change the mind of anyone at an organisation isn’t necessarily sufficient to change the direction of that organisation.
[To be clear, I know nothing DeepMind-specific here—just commenting on the general logic]

• In context I thought it was clear that DeepMind is an example of an “other AI organization”, i.e. other than MIRI.

• Sure, that’s clear of course.
I’m distinguishing between the organisation and “people at” the organisation.
It’s possible for an organisation’s path to be very hard to change due to incentives, regardless of the views of the members of that organisation.

So doubting the possibility of changing an organisation’s path doesn’t necessarily imply doubting the possibility of changing the minds of the people currently leading/​working-at that organisation.

[ETA—I’ll edit to clarify; I now see why it was misleading]

• How did you conclude from Nate Soares saying that the tools to create AGI likely already exist that he wanted people to believe he knew how to construct one?

Why were none of these examples mentioned in the original discussion thread and comment section from which a lot of the quoted sections come?

1. Because he asked me to figure it out in a way that implied he already had a solution; the assignment wouldn’t make sense if it were to locate a non-workable AGI design (as many AI researchers have done throughout the history of the field); that wouldn’t at all prove that the pieces to make AGI are already out there. Also, there wouldn’t be much reason to think that his sharing a non-workable AGI design with me would be dangerous.

2. I believe my previous post was low on detail partially due to traumatic conditioning making these things hard to write about. I got a lot of info and psychological healing from telling a “normal” person (not part of rationalist scene) about what happened while feeling free to be emotionally expressive along the way. I mentioned screaming regarding the “infohazard” concept being used to cover up circumstances of deaths; I also screamed regarding the “create a workable AGI design” point. This probably indicates that some sort of information connection/flow was suppressed.

• If someone told me to come up with an AGI design and that I already knew the parts, then I would strongly suspect that person was trying to make me do a Dantzig (as with George Dantzig, who famously solved open problems he’d mistaken for homework) to find the solution. (Me thinking that would of course make it not really work.)

• I was given ridiculous statements and assignments including the claim that MIRI already knew about a working AGI design and that it would not be that hard for me to come up with a working AGI design on short notice just by thinking about it, without being given hints.

There’s a huge gulf between “AGI idea that sounds like it will work to the researcher who came up with it” and “AGI idea that actually works when tried.” Like, to the point where AGI researchers having ideas that they’re insanely overconfident in, that don’t work when tried, is almost a trope. How much of an epistemic problem this is depends on the response to outside view, I think. If you remove the bluster, “working AGI design” turns into “research direction that sounds promising”. I do think that, if it’s going to be judged by the “promising research direction” standard rather than the “will actually work when tried” standard, then coming up with an AGI design is a pretty reasonable assignment?

• I agree that coming up with a “promising research direction” AI design would have been a reasonable assignment. However, such a research direction if found wouldn’t provide significant evidence for Nate’s claim that “the pieces to make AGI are already out there and someone just has to put them together”, since such research directions have been found throughout the AI field without correspondingly short AI timelines.

• Thank you for posting this. It clearly took a lot of effort, it’s deeply personal, and… places you in a weird spot relative to the “Community”.

This is truth. Truth is sacred. Annddd… God damn that does not seem like a safe/sensible workspace. I hope you are in a better headspace now, and get something of a break/warmth/whatever it happens to be that nourishes YOUR person.

I particularly want to state my appreciation for the orderly manner everything is presented which… isn’t the standard way of approaching such a personal story/account, and presumably took extra effort, but was VERY helpful for me as a reader.

Take care. Keep good people around you. Be well.
