Mental Health and the Alignment Problem: A Compilation of Resources
This is a post about mental health and disposition in relation to the alignment problem. It compiles a number of resources that address how to maintain wellbeing and direction when confronted with existential risk. The target audience is anyone concerned with alignment, but the aim is for it to be particularly helpful for those who are new to these ideas. My sense is that most people who have known about the alignment problem for some time have already found their own strategies for resilience.
Many people in this community have posted similar ideas after Yudkowsky’s “Death With Dignity” generated so much conversation on the subject. This post intends to be more touchy-feely, dealing more directly with emotional landscapes than questions of timelines or probabilities of success.
The resources section would benefit from community additions. Please suggest any resources that you would like to see added to this post and I will update it.
Please note that this document is not intended to replace professional medical or psychological help in any way. Many preexisting mental health conditions can be exacerbated by these conversations. If you are concerned that you may be experiencing a mental health crisis, please consult a professional.
There is no right way to emotionally respond to the reality of approaching superintelligent AI, and our collective responsibility to align it with our values.
There are days when I find it easy to be inspired by the difficulty of the alignment problem, and days where my life feels richer, more meaningful, and more charged for knowing about it.
There are other days when I find the whole affair terrifying and bleak, and days where I feel claustrophobic, sentenced to an impossibly cruel fate.
There are other days when I’m not in touch with the magnitude of the stakes, and where I am capable of acknowledging the alignment problem but have lost contact with it in my body and emotions.
I try not to pass judgment on how my sensitivity or vulnerability shifts day to day. But I recognize that my disposition matters. It has consequences for my motivation and actions, and frames how I perceive and interpret the world. And I recognize that learning about alignment has created a fundamental shift in my disposition; I experience the world in a different way now. The emotions I need to process (duty? zeal? terror? awe?) are, at the very least, more complicated.
I imagine it is similar for many of you, and will be similar for many more as they learn about the alignment problem and the fact that we might not succeed. As that happens, I want to ensure we have the tools and resources to be okay. Here, the valence of “be ok” is your decision. This question could be rephrased “how can I thrive despite the alignment problem,” “how can I cope with the alignment problem,” “how can I overcome my fear of the alignment problem,” etc. Everyone needs to find their own question and their own answer.
At its foundation “being ok” is the decision to continue to live facing reality and the alignment problem directly, with internal stability and rationality intact. And as a high ideal, we’re going for some degree of inviolability, of unconditional wellbeing. The kind of wellbeing that holds onto “okayness” even if the possibility of solving alignment drops to 0.
As we learn to do this, we also gain the ability to be more helpful and compassionate to others. I have at times found myself distressed at someone’s inability to grasp the alignment problem the way I do, or at other times feeling almost a schadenfreude “I told you so,” when someone I previously talked to about alignment says “now I get it, and I’m scared.” We can do more good for the world and each other when we come from a place of compassion, understanding the sensitivity of each new mind that recognizes the alignment problem for what it is. Standing in a place of positive mental health and stability while facing the alignment problem directly can be difficult. It is a gift if we can do that for ourselves, and a gift if we can share it with others.
Fortunately we don’t have to do this alone, and this community is a good place to start. Many community members have found ways to make sense of themselves, their work, and their lives in relation to the alignment problem, and written about it in posts that we can draw inspiration from.
Below, I’ve put together many of the resources that I could find on the subject, summarized them, and broken them into categories for others to easily access. I’ve tried to pull together resources that have something specific to say about mental health under challenging circumstances like alignment.
That said, it was difficult to find posts directly relevant to this conversation as the majority of alignment-related content (and this is a good thing) focuses on the object-level rather than derivative issues like mental health.
If you know of other resources that would be useful to add to this list, or would like to correct any mischaracterization of the resources I’ve listed, please let me know. Also, if you have your own testimony or emotional approach to alignment, I invite you to share it in the comments below.
Text in italics below is quoted from the posts.
These posts explain orientations towards alignment that can help frame many aspects of our lives, from our time allocation and decision-making to how we choose to react to certain emotions that arise.
Ruby: A Quick Guide to Confronting Doom. Start here. This post is exactly about this subject, and is a good preface to reading the opinions below with the appropriate epistemic distance.
It’s easy to hear the proclamations of [highly respected] others and/or everyone else reacting and then reflexively update to “aaahhhh”. I’m not saying “aaahhhh” isn’t the right reaction, but I think for any given person it should come after a deliberate step of processing arguments and evidence to figure out your own anticipations…My guess is that people who are concluding P(Doom) is high will each need to figure out how to live with it for themselves. My caution is just that whatever strategy you figure out should keep you in touch with reality (or your best estimate of it), even if it’s uncomfortable.
Eliezer Yudkowsky: MIRI announces new ‘Death with Dignity’ strategy. Partially in jest, this post advocates that a good orientation for dealing with the alignment problem is to take actions that generate “dignity.” There has been debate in the community over both the aesthetics and content of the post, but there’s a coherent takeaway that I think most can agree on: try to be rational, and failing that, develop a good deontological strategy that protects against irresponsible action. There are a number of other Yudkowsky posts below which add nuance to this framework, and his Coming of Age sequence discusses alignment in Beyond the Reach of God.
So don’t get your heart set on that “not die at all” business. Don’t invest all your emotion in a reward you probably won’t get. Focus on dying with dignity—that is something you can actually obtain, even in this situation… the measuring units of dignity are over humanity’s log odds of survival—the graph on which the logistic success curve is a straight line. A project that doubles humanity’s chance of survival from 0% to 0% is helping humanity die with one additional information-theoretic bit of dignity.
The comments section of this post offers a number of meaningfully different perspectives, including AI_WAIFU’s “Fuck. That. Noise.” disagreement.
Alex Turner: Emotionally Confronting a Probably Doomed World. In this response to Yudkowsky’s post, Turner argues that we should decouple our emotional response from the probability of doom, and escape the idea that we are “living in a tragedy.” Turner’s Swimming Upstream talks about earlier decisions to confront the alignment problem.
We do not live in a story. We can, in fact, just assess the situation, and then do what makes the most sense, what makes us strongest and happiest. The expected future of the universe is—by assumption—sad and horrible, and yet where is the ideal-agency theorem which says I must be downtrodden and glum about it?
Landfish: Don’t die with dignity, instead play your outs. In this response to Yudkowsky’s post, Landfish argues for an MTG-inspired strategy of “playing your outs,” or responding to a low-odds-of-success future by looking ahead for what opportunities and affordances might still be available. Note that there is disagreement about the risk of this strategy and whether it is consequentially useful for alignment; I include it here because it also offers an emotional frame.
I find playing to your outs a lot more motivating. The framing doesn’t shy away from the fact that winning is unlikely. But the action is “playing” rather than “dying”. And the goal is “outs” rather than “dignity”. Again, I think the difference is in connotation and not actually strategy.
Nate Soares: On Caring. Nate’s post describes some of his motivation to be an effective altruist and do as much good as possible. While not specific to alignment, Nate works at MIRI and this view could be understood through that context as an argument for facing the alignment problem “courageously.”
Humanity is playing for unimaginably high stakes. At the very least, there are billions of people suffering today. At the worst, there are quadrillions (or more) potential humans, transhumans, or posthumans whose existence depends upon what we do here and now… Courage isn’t about being fearless, it’s about being able to do the right thing even if you’re afraid. And similarly, addressing the major problems of our time isn’t about feeling a strong compulsion to do so. It’s about doing it anyway, even when internal compulsion utterly fails to capture the scope of the problems we face.
John Wentworth: WE CHOOSE TO ALIGN AI, and The Plan. Together, these posts are a summary of Wentworth’s emotional position with respect to alignment and his specific plan to work on the problem. He offers a perspective that the magnitude of the challenge is reason for inspiration, not despair.
When people first seriously think about alignment, a majority freak out. Existential threats are terrifying. And when people first seriously look at their own capabilities, or the capabilities of the world, to deal with the problem, a majority despair… but for someone who wants the challenge, the emotional response is different. The problem is terrifying? Our current capabilities seem woefully inadequate? Good; this problem is worthy. The part of me which looks at a rickety ladder 30 feet down into a dark tunnel and says “let’s go!” wants this. The part of me which looks at a cliff face with no clear path up and cracks its knuckles wants this. The part of me which looks at a problem with no clear solution and smiles wants this. The response isn’t tears, it’s “let’s fucking do this”.
Alex Flint: Musings on General Systems Alignment. Flint advocates that we envision the world as “fundamentally friendly to [AI alignment] efforts,” a perspective that allows us more optimism, resolve, and potentially more success in speaking with others about the issue.
It is not that our civilization has woken up completely to the dangers of advanced AI. It is that our civilization has not woken up, yet wishes to wake up, and knows that it wishes to wake up, and has found just enough clarity to bestow significant power and resources to us in the hope that we will take up leadership… Our job is to find the resolve to move forward with this difficult task, without getting caught up in the harmful patterns that exist in the world, and without losing track of the subtle way in which everyone is on our side.
Richard Ngo: My Attitude Towards Death. Ngo’s post discusses fear of death, and his optimism for the future. He implies a strategy of “conversing” with his fear and trying to reassure it as a method for better integrating concerns.
What would happen if I talked more to the part [of me] that’s scared of death, to try and figure out where it’s coming from? By default, I expect it’d be uncooperative—it wants to continue being scared of death, to make sure that I act appropriately (e.g. that I stay ambitious). Can I assure it that I’ll still try hard to avoid death if it becomes less scared? One source of assurance is if I’m very excited about a very long life—which I am, because the future could be amazing. Another comes from the altruistic part of me, whose primary focus is increasing the probability that the future will in fact be amazing. Since I believe that we face significant existential risk this century, working to make humanity’s future go well overlaps heavily with working to make my own future go well. I think this broad argument (along with being in communities which reward longtermist altruism) has helped make the part of me that’s scared of death more quiescent.
Holden Karnofsky: Call to Vigilance. In the last post in his The Most Important Century sequence, Karnofsky describes his emotional response to the alignment problem (and other challenges of our time) as one of intense, mixed emotions. He warns against acting recklessly just to “do something,” and advocates that people remain aware and put themselves in positions to take robustly good actions.
When confronting the “most important century” hypothesis, my attitude doesn’t match the familiar ones of “excitement and motion” or “fear and avoidance.” Instead, I feel an odd mix of intensity, urgency, confusion and hesitance. I’m looking at something bigger than I ever expected to confront, feeling underqualified and ignorant about what to do next. This is a hard mood to share and spread, but I’m trying.
Please suggest additional “positions” and I will add them here.
These posts provide guidance for stabilizing positive states, and for transforming negative emotions, such as anxiety, fear, despair, apathy, and depression, that may arise when facing existential risks.
On rising to the challenge: Yudkowsky and others: Challenging the Difficult and Heroic Responsibility. As demonstrated by the “positions” above, many people find that the best way to counter negative emotions around the alignment problem is to work on it directly. These resources advocate for building the internal drive to tackle problems as serious as alignment, and may be especially useful for individuals struggling with “helplessness” by transforming that feeling into action.
“You could call it heroic responsibility, maybe,” Harry Potter said. “Not like the usual sort. It means that whatever happens, no matter what, it’s always your fault. Even if you tell Professor McGonagall, she’s not responsible for what happens, you are. Following the school rules isn’t an excuse, someone else being in charge isn’t an excuse, even trying your best isn’t an excuse. There just aren’t any excuses, you’ve got to get the job done no matter what.” - HPMOR, chapter 75.
If you’re motivated to do something about alignment, there are many pragmatic posts on LW as well as non-LW resources like AI Safety Support, the AGI Safety Fundamentals Course, and 80,000 Hours.
On how to overcome negative emotions: Nate Soares: Replacing Guilt. This foundational sequence has helped many people in the LW community transform feelings of guilt, resistance, sorrow, imposter syndrome, and other negative emotions into inspiration. Despite the title, its scope is much larger than “guilt,” and it is a great starting place for any reader. One post to call out is Detach the Grim-o-meter, which offers the perspective that one can continue to be “curious, playful, and relaxed” even when looking directly at existential risks.
When all is said and done, Nature will not judge us by our actions; we will be measured only by what actually happens. Our goal, in the end, is to ensure that the timeless history of our universe is one that is filled with whatever it is we’re fighting for. For me, at least, this is the underlying driver that takes the place of guilt: Once we have learned our lessons from the past, there is no reason to wrack ourselves with guilt. All we need to do, in any given moment, is look upon the actions available to us, consider, and take whichever one seems most likely to lead to a future full of light.
On accepting sorrow and fear: Yudkowsky’s Feeling Rational and Luke Muehlhauser’s Musk’s Non-missing Mood. In contrast with some of the posts above that encourage decoupling emotions from probabilities of doom, these posts offer the perspective that negative emotions are not only a natural response but also a rational one. For those confronting negative emotions who would rather accept and work with those feelings than transform them, these posts may offer some insight.
When something terrible happens, I do not flee my sadness by searching for fake consolations and false silver linings. I visualize the past and future of humankind, the tens of billions of deaths over our history, the misery and fear, the search for answers, the trembling hands reaching upward out of so much blood, what we could become someday when we make the stars our cities, all that darkness and all that light—I know that I can never truly understand it, and I haven’t the words to say. Despite all my philosophy I am still embarrassed to confess strong emotions, and you’re probably uncomfortable hearing them. But I know, now, that it is rational to feel. - Feeling Rational
On working with imposter syndrome: Seven years ago, Luke Muehlhauser recommended If you’re an “AI safety lurker,” now would be a good time to de-lurk. But imposter syndrome and self-doubt can prevent people from raising their hand. Yudkowsky’s Hero Licensing talks about his own experience questioning the value of his work. Scott Alexander’s Parable of the Talents and Nicole Ross’ Desperation Hamster Wheels are not specific to alignment, but offer some advice on how to work with feelings of inadequacy. That said, self-worth is a deeper subject than just imposter syndrome and likely needs to be addressed outside of the context of productivity entirely, not to mention alignment.
When someone feels sad because they can’t be a great scientist, it is nice to be able to point out all of their intellectual strengths and tell them “Yes you can, if only you put your mind to it!” But this is often not true. At that point you have to say “f@#k it” and tell them to stop tying their self-worth to being a great scientist. And we had better establish that now, before transhumanists succeed in creating superintelligence and we all have to come to terms with our intellectual inferiority. - Parable of the Talents
On being honest about concerns: Katja Grace’s Beyond fire alarms: freeing the groupstuck. This post is primarily a response to Yudkowsky’s “There is No Fire Alarm for AGI,” but offers relevant ideas for how to deal with situations where one is afraid of looking silly for being overly concerned about AI risk.
Practice voicing your somewhat embarrassing concerns, to make it easier for others to follow (and easier for you to do it again in future)… React to others’ concerns that don’t sound right to you with kindness and curiosity instead of laughter. Be especially nice about concerns about risks in particular, to counterbalance the special potential for shame there. [or about people raising points that you think could possibly be embarrassing for them to raise]
On overcoming avoidance: Anna Salamon’s Flinching away from the Truth and Making your explicit reasoning trustworthy. Sometimes people avoid thinking about the alignment problem out of concern that they will end up with mistaken beliefs by going down lines of thinking that lead to seductive but inaccurate conclusions. Anna’s posts may help those who are hesitant to engage fully learn to trust their own reasoning rather than rely on others’ positions.
“I don’t want to think about that! I might be left with mistaken beliefs!” tl;dr: Many of us hesitate to trust explicit reasoning because we haven’t built the skills that make such reasoning trustworthy. Some simple strategies can help.
On facing death: In addition to Ngo’s My Attitude Towards Death, there are a number of LW posts on death that may be useful to this conversation. Some such as Joe Carlsmith’s Thoughts on Being Mortal aren’t about alignment but confront fear of death directly, while some such as Yudkowsky’s The Meaning that Immortality Gives to Life touch on the singularity but are more about avoiding death. Avoidance of death is likely the crux of most fear and sorrow around the alignment problem, so from a purely mental-health related standpoint, it may be meaningful to try to separate emotional response to death from the alignment problem itself. Finding ways to confront death directly may afford a deep inviolability to existential fear.
Sometimes, on the comparatively rare occasions when I experience even-somewhat-intense sickness or pain, I… am brought more directly into the huge number of subjective worlds filled with relentless, inescapable pain. These glimpses often feel like a sudden shaking off of a certain kind of fuzziness; a clarifying of something central to what’s really going on in the world; and it also comes with fear of just how helpless we can become. - Thoughts on Being Mortal
HPMOR advice on facing existential risk: Yudkowsky’s Harry Potter and the Methods of Rationality is not about AI alignment (Harry deals mostly with local-to-planetary-scale rather than cosmological/hyperexistential threats), but the depicted emotions and mental strategies have direct analogues. The story contains deep explorations of the internal experience of facing seemingly impossible odds, the burden of heroic responsibility, difficult tradeoffs and the necessity of sacrifice, and the motivation for rational action and self-improvement. A non-exhaustive list of chapters that might be useful (and would be much more useful in the context of the whole story):
Ch 39: death, motivation for transhumanism
Ch 43-46: fear, death, motivation for transhumanism
Ch 56-58: optimizing against improbable odds, despair
Ch 63: the burden of responsibility, longing for a normal life
Ch 75: heroic responsibility
Ch 79-82: sacrifice
Ch 88: fear of expressing panic, bystander apathy
Ch 89: accepting/rejecting an unacceptable reality
Ch 110: guilt, shame
Ch 111-115: optimizing against improbable odds, despair
Ch 117: guilt, sacrifice
EA resources on general wellbeing and burnout. While not specific to alignment, it would be a mistake not to mention the wealth of information on the EA forum related to mental health such as Miranda Zhang’s Mental Health Resources tailored for EAs (WIP) and Ewelina Tur’s List of Mental Health Resources. The EA forum also has a bunch of specific posts on burnout, like Elizabeth’s Burnout, Tessa’s Aiming for the minimum of self-care is dangerous, and Julia Wise’s Cheerfully.
Tools and Practices
These range from interventionist practices aimed at quickly cutting through negative states to longer-term practices aimed at building up more sustainable wellbeing.
Meditation: Kaj Sotala’s My attempt to explain Looking, insight meditation, and enlightenment in non-mysterious terms. This post explains how meditation practices can help people develop unconditional equanimity, even in the face of existential risk. Sotala’s sequence goes deeper into these ideas, and he also has a number of practical individual posts like Overcoming suffering: emotional acceptance.
But if you cared about things like saving the world, then you will still continue to work on saving the world, and you will be Looking at things which will help you save the world—including ones that increase your rationality. It’s just that if the world ends up ending, it won’t feel like the end of the world. Of course, you will still feel intense grief and disappointment and everything that you’d expect to feel about the world ending. Intense grief and disappointment just won’t be the end of the world. - My attempt to explain Looking
Productivity Sprints: Nate Soares’ The mechanics of my recent productivity and Logan Riggs’ Saving the world in 80 days and epilogue. Given that many of the above posts advocate for getting involved as a way to respond to the magnitude of the alignment problem, it’s useful to have testimonials about how people got started. Nate’s post provides some practical advice, and Logan’s demonstrates what it looks like to put that advice into practice. Alex Turner’s Problem relaxation as a tactic is also helpful for those looking to get into alignment who find the scope of the problem too large.
A decade ago, I decided to save the world. I was fourteen, and the world certainly wasn’t going to save itself. I fumbled around for nine years; it’s surprising how long one can fumble around. I somehow managed to miss the whole idea of existential risk and the whole concept of an intelligence explosion… A year ago, I finally read the LessWrong sequences… On Saturday I was invited to become a MIRI research associate. It’s been an exciting year, to say the least. - The mechanics of my recent productivity
Focusing and Noticing: If you’re not sure what you “feel” about the alignment problem, or emotions you’ve previously felt are out of reach, the techniques of Focusing and Noticing can help. These methods bring awareness to sensations within the body, which increases clarity and affords an opportunity to do something about the feelings. For example, it may help uproot unconscious motivations that may be driving undesirable habits (procrastination, doomscrolling, etc.), or may help with anxiety or self-doubt.
Focusing refers to a family of introspective techniques… whose aim is to access one’s “gut” or “System 1” feelings. Archetypically, sensations within the body are approached with a spirit of gentle curiosity, and possible verbal labels are checked against felt senses. Where successful, this can improve internal understanding and allow split off trauma or conflict between subagents to be processed for improved internal alignment.
Dark Arts: Nate Soares’ The Dark Arts of Rationality. “Dark Arts” is a term for methods which involve deception or believing untrue things, such as intentional compartmentalization, inconsistency, or modifying terminal goals. These are not methods I’d recommend for everyone, but may help some individuals balance their life/productivity with intense feelings related to the alignment problem.
We are fortunate, as humans, to be skilled at compartmentalization: this helps us work around our mental handicaps without sacrificing epistemic rationality. Of course, we’d rather not have the mental handicaps in the first place: but you have to work with what you’re given. We are weird agents without full control of our own minds. We lack direct control over important aspects of ourselves. For that reason, it’s often necessary to take actions that may seem contradictory, crazy, or downright irrational.
Unfortunately, this section is quite bare at the moment. I am interested in using it to promote therapists, coaches, instructors, or other individuals who provide support to those who may be struggling with their reactions to the alignment problem. Please let me know if you or someone you know would be a good fit for this list.
EA Mental Health Navigator has a list of coaches and therapists who have experience working with effective altruists. Lynette Bye is one of them, and has written a post with advice for how to select a therapist. Tee Barnett is another coach with experience discussing existential risk.
A Final Note
One can make a case that robust mental health is instrumental for working on alignment. And that’s certainly true. If you’re reading this, it’s probably because you are in a community that is working on alignment or adjacent to it. And if you’re not, but are interested in finding more ways to help, this post points to resources to get started. One of the benefits of robust mental health practices is that they create the stability needed to continue doing the necessary work.
But this post is written with the intention of increasing wellbeing, not productivity. We work on the alignment problem because we are driven by our deep care to protect the world we know, the one in which people are capable of joy and beauty and love. I believe it’s meaningful for us to abide in wellbeing and flourish for no other reason than that it is what life is about. Wellbeing is instrumental for solving alignment, but more importantly, wellbeing is why we’re trying to solve it.
While there is disagreement about the timeline on which we need to solve this problem, and disagreement about our probability of success, there’s broad acceptance in this community that the problem is real and the stakes dire. Even those who are optimistic about things going well could imagine worlds in which they don’t.