Eliezer’s Unteachable Methods of Sanity
“How are you coping with the end of the world?” journalists sometimes ask me, and the true answer is something they have no hope of understanding and I have no hope of explaining in 30 seconds, so I usually answer something like, “By having a great distaste for drama, and remembering that it’s not about me.” The journalists don’t understand that either, but at least I haven’t wasted much time along the way.
Actual LessWrong readers also sometimes ask me how I deal emotionally with the end of the world.
I suspect a more precise answer may not help. But Raymond Arnold thinks I should say it, so I will say it.
I say again, I don’t actually think my answer is going to help. Wisely did Ozy write, “Other People Might Just Not Have Your Problems.” Also I don’t have a bunch of other people’s problems, and other people can’t make internal function calls that I’ve practiced to the point of hardly noticing them. I don’t expect that my methods of sanity will be reproducible by nearly anyone. I feel pessimistic that hearing about them will help. Raymond Arnold asked me to speak them anyways, so I will.
Stay genre-savvy / be an intelligent character.
The first and oldest reason I stay sane is that I am an author, and above tropes. Going mad in the face of the oncoming end of the world is a trope.
I consciously see those culturally transmitted patterns that inhabit thought processes aka tropes, both in fiction, and in the narratives that people try to construct around their lives and force their lives into.
The trope of somebody going insane as the world ends, does not appeal to me as an author, including in my role as the author of my own life. It seems obvious, cliche, predictable, and contrary to the ideals of writing intelligent characters. Nothing about it seems fresh or interesting. It doesn’t tempt me to write, and it doesn’t tempt me to be.
It would not be in the interests of an intelligent protagonist to amplify their own distress about an apocalypse into more literarily dramatic ill-chosen behavior. It might serve the interests of a hack author but it would not help the character. Understanding that distinction is the first step toward writing more intelligent characters in fiction. I use a similar and older mental skill to decide which tropes to write into the character that is myself.
This sense—which I might call, genre-savviness about the genre of real life—is historically where I began; it is where I began, somewhere around age nine, to choose not to become the boringly obvious dramatic version of Eliezer Yudkowsky that a cliche author would instantly pattern-complete about a literary character facing my experiences. Specifically, though I expect this specific to mean nothing to a supermajority of you, I decided that as a relatively smart kid I would not become Raistlin Majere, nor ever exhibit a large collection of related tropes.
The same Way applies, decades later, to my not implementing the dramatic character a journalist dreams up—a very boring and predictable pattern-completion of a character—when they dream up a convenient easy-to-write-about Eliezer Yudkowsky who is a loudly tortured soul about his perception of the world’s end approaching along its default course.
“How are you coping?” journalists sometimes ask me, and sometimes nowadays they have become worried themselves and want to know for themselves if there’s a key to coping. But often today, and before ChatGPT almost always, they are planning a Character-Focused Story about how my Tortured Soul deals with an imaginary apocalypse, to exhibit to their readers like a parent takes their kids to the zoo to stare at a strange animal. I reply to them “I have a great distaste for drama”, but the actual answer is “I am a better writer than you, and I decided not to write myself as that incredibly cliche person that would be easy and convenient for you to write about.”
“Going insane because the world is ending” would be a boring trope and beneath my dignity to choose as my actual self’s character.
Don’t make the end of the world be about you.
“How are you coping with the end of the world?” journalists sometimes ask me, and I sometimes reply, “By remembering that it’s not about me.” They have no hope of understanding what I mean by this, I predict, because to them I am the subject of the story and it has not occurred to them that there’s a whole planet out there too to be the story-subject. I think there’s probably a sense in which the Earth itself is not a real thing to most modern journalists.
The journalist is imagining a story that is about me, and about whether or not I am going insane, not just because it is an easy cliche to write, but because personality is the only real thing to the journalist.
This is also a pattern that you can refuse, when you write the story that is yourself; it doesn’t have to be a story that is ultimately about you. It can be about humanity, humane preferences, and galaxies. A sentence about snow is words, is made of words, but it is about snow. You are made of you, but you don’t need to be all about yourself.
If I were to dwell on how it impacted me emotionally that the world was ending, I would be thinking about something which genuinely doesn’t matter to me very much compared to how the world is ending. Having dramatic feelings is not mostly what I am about—which is partly how I ended up being not much made of them, either; but either way, they’re not what I’m about.
So long ago that you probably can’t imagine what it was like back then, not just before ChatGPT but years before the age of deep learning at all, there was a person who thought they were like totally going to develop Artificial General Intelligence. Then they ran into me; and soon after, instead started agonizing about how they had almost destroyed the world. Had they actually been that close to success? Of course not. But I don’t relate to status as most people do, so that part, the status-overreach, wasn’t the part I was rolling my eyes about. It is not the sort of epistemic prediction error that I see as damnable in the way that a status-regulator sees it as the worst thing in the world; to underestimate oneself is no more virtuous than to overestimate oneself. Rather, I was rolling my eyes about the part that was a more blatant mistake, completely apart from the epistemic prediction error they probably couldn’t help; the part that would have been a mistake even if they had almost destroyed the world. I was rolling my eyes about how they’d now found a new way of being the story’s subject.
Even if they had almost destroyed the world, the story would still not properly be about their guilt or their regret, it would be about almost destroying the world. This is why, in a much more real and also famous case, President Truman was validly angered and told “that son of a bitch”, Oppenheimer, to fuck off, after Oppenheimer decided to be a drama queen at Truman. Oppenheimer was trying to have nuclear weapons be about Oppenheimer’s remorse at having helped create nuclear weapons. This feels obviously icky to me; I would not be surprised if Truman felt very nearly the same.
And so similarly I did not make a great show of regret about having spent my teenage years trying to accelerate the development of self-improving AI. Was it a mistake? Sure. Should I promote it to the center of my narrative in order to make the whole thing be about my dramatic regretful feelings? Nah. I had AGI concerns to work on instead.
I did not neglect to conduct a review of what I did wrong and update my policies; you know some of those updates as the Sequences. But that is different from re-identifying myself as a dramatic repentant sinner who had thereby been the story’s subject matter.
In a broadly similar way: If at some point you decide that the narrative governing your ongoing experience will be about you going insane because the world is ending: Wow, congratulations at making the end of the world still be about you somehow.
Just decide to be sane, and write your internal scripts that way.
The third way I stay sane is a fiat decision to stay sane.
My mental landscape contains that option; I take it.
This is the point I am even less expecting to be helpful, or to correspond to any actionable sort of plan for most readers.
I will nonetheless go into more detail that will probably not make any sense.
Besides being a thing I can just decide, my decision to stay sane is also something that I implement by not writing an expectation of future insanity into my internal script / pseudo-predictive sort-of-world-model that instead connects to motor output.
(Frankly I expect almost nobody to correctly identify those words of mine as internally visible mental phenomena after reading them; and I’m worried about what happens if somebody insists on interpreting it anyway. Seriously, if you don’t see phenomena inside you that obviously look like what I’m describing, it means, you aren’t looking at the stuff I’m talking about. Do not insist on interpreting the words anyway. If you don’t see an elephant, don’t look under every corner of the room until you find something that could maybe be an elephant.)
One of the ways you can get up in the morning, if you are me, is by looking in the internal direction of your motor plans, and writing into your pending motor plan the image of you getting out of bed in a few moments, and then letting that image get sent to motor output and happen. (To be clear, I actually do this very rarely; it is just a fun fact that this is a way I can defeat bed inertia.)
There are a lot of neighboring bad ideas to confuse this with. The trick I’m describing above does not feel like desperately hyping myself up and trying to believe I will get out of bed immediately, with a probability higher than past experience would suggest. It doesn’t involve lying to myself about whether I’m likely to get up. It doesn’t involve violating the epistemic-instrumental firewall (factual questions absolutely separated from the consequences of believing things), to give myself a useful self-fulfilling prophecy. It is not any of the absurd epistemic-self-harming bullshit that people are now flogging under brand names like “hyperstition”, since older names like “chaos magick” or “lying to yourself” became less saleable. I still expect them to point to this and say, “Why, of course that is the same thing I am selling to you as ‘hyperstition’!” because they would prefer not to look at my finger, never mind being able to see where I’m pointing.
With that said: The getting-out-of-bed trick involves looking into the part of my cognition where my action plan is stored, and loading an image into it; and because the human brain’s type system is a mess, this has the native type-feeling of an expectation or prediction that in a few seconds I will execute the motor-plan and get out of bed.
That I am working with cognitive stuff with that type-feel, is not the same thing as lying to myself about what’s likely to happen; no, not even as a self-fulfilling prophecy. I choose to regard the piece of myself whose things-that-feel-like-predictions get sent as default motor output, as having the character within my Way of a plan I am altering; rather than, you know, an actual mistaken prediction that I am believing. If that piece of myself gets to have me roll out of bed, I get to treat it as a plan rather than as a prediction. It feels internally like a prediction? Don’t believe everything you feel. It’s a pseudo-model that outputs a pseudo-prediction that does update in part from past experience, but its actual cognitive role is as a controller.
The key step is not meditating on some galaxy-brained bullshit about Löb’s Theorem, until you’ve convinced yourself that things you believe become true. It’s about being able to look at the internal place where your mind stores a pseudo-predictive image of staying in bed, and writing instead a pseudo-prediction about getting out of bed, and then letting that flow to motor output three seconds later.
It is perhaps an unfortunate or misleading fact about the world (but a fact, so I deal with it), that people telling themselves galaxy-brained bullshit about Löb’s Theorem or “hyperstition” may end up expecting that to work for them; which overwrites the pseudo-predictive controlling output, and so it actually does work for them. That is allowed to be a thing that is true, for reality is reality. But you don’t have to do it the scrub’s way.
Perceiving my internal processes on that level, I choose:
I will not write internal scripts which say that I am supposed to / pseudo-predict that I will, do any particular stupid or dramatic thing in response to the end of the world approaching visibly nearer in any particular way.
I don’t permit it as a narrative, I don’t permit it as a self-indulgence, and I don’t load it into my pseudo-predictive self-model as a pending image that gets sent by default to internal cognitive motor outputs.
If you go around repeating to yourself that it would be only natural to respond to some stressful situation by going insane—if you think that some unhelpful internal response is the normal, the default, the supposed-to reaction to some unhelpful external stimulus—that belief is liable to wire itself in as being also the pseudo-prediction of the pseudo-model that loads your default thoughts.
One could incorrectly summarize all this as “I have decided not to expect to go insane,” but that would violate the epistemic-instrumental firewall and therefore be insane.
(All of this is not to be confused with the confused doctrine of active inference. That a brain subsystem sometimes repurposes a previously evolved piece of predictive machinery as a generalizing cache system that then sends its outputs as control signals, does not reveal some deep law about prediction and planning being the same thing. They’re not. Deep Blue made no use of that idiom, purely separated prediction from planning, and worked just fine. The human brain is just a wacky biological tangle, the same way that human metabolism repurposes superoxide, an insanely reactive chemical byproduct, as a key signaling molecule. It doesn’t have to be that way for deep theoretical reasons; it’s just biology being a tangle.)
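A toy sketch of that “purely separated” idiom, in miniature and with nothing brain-like about it (the game here is “count to 21”, not chess; this is only an illustration, not anyone’s actual code):

```python
# Toy illustration only (a "count to 21" game, not Deep Blue's chess):
# players alternately add 1, 2, or 3 to a running total; whoever reaches 21 wins.
# The point is the shape of the code: the predictor never chooses my move,
# and the planner never edits the predictor to make an action happen.

def predict_value(total, my_turn):
    """Epistemic module: predicts how the game goes from this state,
    including a model of the opponent's best replies. It does not pick my move."""
    if total >= 21:
        return -1 if my_turn else +1   # whoever just moved reached 21 and won
    child_values = [predict_value(total + k, not my_turn) for k in (1, 2, 3)]
    return max(child_values) if my_turn else min(child_values)

def choose_move(total):
    """Planning module: selects my action by consulting the predictor."""
    return max((1, 2, 3), key=lambda k: predict_value(total + k, my_turn=False))

print(choose_move(16))   # prints 1: moving the total to 17 is the winning play
```

The predictor answers “what will happen if”; the planner answers “what shall I do”; neither gets rewritten to make the other come out a particular way.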
(All of this is not to be confused with the Buddhist doctrine that every form of negative internal experience is your own fault for not being Buddhist enough. If you rest your hand on a hot stove, you will feel pain not because your self-pseudo-model pseudo-predicts this to be painful, but because there are direct nerves that go straight to brain areas and trigger pain. The internal mechanism for this does not depend on a controlling pseudo-prediction, it just falls downward like a stone under gravity. The same directness is allowed to be true about suffering and not just pain; if there’s a clever way to overwrite pseudo-predictions of suffering and thereby achieve Buddhist indifference to bad things, I don’t have it as a simple obvious surface lever to pull. I also haven’t chosen to go looking for a more complicated or indirect version of it. I do not particularly trust that to end well.
But I do think there are various forms of drama, error, and insanity which are much more like “things people do because they expected themselves to do it”; and much less like the pain, or suffering, from burning your hand.)
There’s an edition of Dungeons and Dragons that has a god of self-improvement, called Irori. My fanfictions sometimes include characters that worship Him (heresy), or seek what He sought (approved).
In my fictional reification, Irori’s religion has mottos like, “You don’t have problems, you have skill issues.” Irorians can be a bit harsh.
But even if something is a skill issue, that doesn’t mean you have the skill, nor know how to solve it.
When an Irorian calls something a skill issue, they’re not instructing you to feel bad about having not solved it already.
They are trying to convey the hope that it is solvable.
Doing crazy things because your brain started underproducing a neurotransmitter is a problem. It wouldn’t be very Irorian to tell you that you can’t solve it just through even clearer thinking; but if there’s a medication that directly fixes the problem, that is probably easier and faster and more effective. Also, this isn’t Dungeons and Dragons, Irori isn’t real, and possibly you genuinely can’t solve a neurotransmitter problem by thinking at it.
Doing crazy things because the world is ending is a skill issue.
These then are Eliezer Yudkowsky’s probably-irreproducible ways of staying sane as the world seems more visibly close to ending:
A distaste for the boringly obvious trope of a character being driven mad by impending doom;
Not making the story be all about me, including my dramatically struggling to retain my sanity;
And a fiat decision to stay sane, implemented by not instructing myself that any particular stupidity or failure will be my reaction to future stress.
Probably you cannot just go do those three things.
Then figure out your own ways of staying sane, whether they be reproducible or irreproducible; and follow those ways instead.
The reason that I tell you of my own three methods, is not to provide an actionable recipe for staying sane as the world begins to seem visibly closer to ending.
It is an example, a reminder, and maybe even an instruction to a part of yourself that produces self-pseudo-predictions that get loaded as your internal mental behavior:
Sanity is a skill issue.
Thanks!
The reason I asked you to write some-version-of-this is, I have in fact noticed myself veering towards a certain kind of melodrama about the whole x-risk thing, and I’ve found various flavors of your “have you considered just… not doing that?” to be helpful to me. “Oh, I can just choose to not be melodramatic about things.”
(on net I am still fairly relatively dramatic/narrative-shaped as rationalists go, but, I’ve deliberately tuned the knob in the other direction periodically and think various little bits of writing of yours have helped me)
I liked the framing you did at Solstice: a general prompt to treat it as a skill issue, without being about the exact recipe.
I read this as being premised on “going crazy about the world ending” meaning that you end up acting obviously stupid and crazy, with the response basically being “find a way to not do that”.
My model about going crazy at the end of the world isn’t so much doing something that’s obviously crazy in your own view, but that the world ending is so out-of-distribution for everything you’ve been doing so far that you have no idea of what even is a sane or rational response anymore. For instance, if your basic sense of meaning has been anchored to a sense of the world persisting after you and you making some kind of mark on the world, you won’t know what to do with your life if there won’t be anything to make a mark on.
So staying sane requires also knowing what to do, not just knowing what not to do. Is there anything you would say about that?
Base plan: Stay still, die quietly.
There, you now have a better plan than going crazy! If you think up an even better plan you can substitute that one. Meliorization!
The point is that “maintaining sanity” is a (much) higher bar than “Don’t flail around like a drama queen”. Maintaining sanity requires you to actually update on the situation you find yourself in, and continue to behave in ways that make sense given the reality as it looks after having updated on all the information available. Not matching obvious tropes of people losing their mind is a start, but it is no safe defense. Especially since not all repeated/noticeable failure modes are active and dramatic, and not all show up in fiction.
For example, if there’s something to David Gross’s comment that the wretched journalist was actually giving you an opening because they saw importance in what you had to say about the situation, blowing off a genuine opening to influence the discourse on AI safety while calling it “doing nothing” would not be sane. Preemptive contempt has a purpose in bounded rationality, but it’s still a form of pushing away from the information the journalist has to offer. It can make sense within a grand plan that weights this journalist low, but that requires a grand plan.
How do you actually orient to the world, now that we are what we are? Are you still working to bring about the good outcome? If so, what’s the grand plan that ties everything together? Sharing that seems important for helping people retain sanity. Have you given up? If so, what is the overarching plan that drives how you choose to interact with the world? Because you still have to decide what to do with your time.
This is a hell of a problem to orient to, and I don’t know that any of us get to say we’re doing it sanely. It’s a high bar to strive towards.
The trope that this post and comment match to me isn’t one that shows up in science fiction. It’s a real bitch to wrestle free from, because the whole premise has to do with protecting stability of sense-making by pushing away from challenging updates with avoidance and contempt, and the whole project fails if it doesn’t turn meta and resist awareness of the trope. I notice that even after writing and rewriting this comment to be minimally threatening to stability without holding back content, it’s going to be a tough one to engage with, to the extent that there isn’t a preexisting superstructure regulating contact with reality to maintain stability while minimizing the cost of missed updates.
Which is certainly a possibility. As is leveraging the skill of becoming genre savvy as new patterns emerge (“trope dodging”).
So if this contempt provokes contempt quickly, I’m sorry. My best isn’t always good enough, which is kinda the possibility we’re all wrestling with here.
I agree with this in a “catgirl volcano utopia” kinda way, but I think Kaj_Sotala was pointing more to a “words as pointers to locations in thingspace” issue. The word “sane” points to taking actions that work in the context you’re facing. It isn’t sane to shout about the sky falling when the sky isn’t falling and it’s easy for sane people to notice that the sky isn’t falling and that shouting about it is insane. But there isn’t an obvious plan for what you should do when the sky really is falling, so if the sky starts falling in ways that are obvious and difficult for normal people to ignore, then the thingspace cluster that “sane” used to point to starts to come apart.
I like expanding “sane” to something like “know what’s true and do what works”… it’s an impossible standard but something to aspire to.
It seems “sane” may also point to “not indulging in dramatic emotional expressions”, like not screaming, not crying, not punching inanimate objects. But pathos works. Emotions make characters in stories relatable. So the goal isn’t to stay sane, for that is not a well defined thing to do. The goal isn’t even to look sane, for looking insane may be compelling, and looking sane to everyone all the time is probably impossible. For people in general… “don’t think about what’s sane, think about what works” is probably good advice to gesture towards the actual goal.
In addition to the option of spending effort on reducing the chance the world ends, one could also reframe from “leaving a mark on the world that outlives you” to “contributing to something bigger and beyond yourself.” The world is bigger than you, more important than you and exists outside of you right now, as well as up until the world ends (if/when it does).
Helping the world right now, and helping the world after you are gone, are morally equivalent, and quite possibly equivalent at the level of fundamental physics. I’m not sure what, other than a false sense of personal immortality (legacy as something beyond the actual beneficial effects on the world), is tied to benefiting the world later than your own time of existence. But perhaps that’s my own ignorance.
Re: “For instance, if your basic sense of meaning has been anchored to a sense of the world persisting after you and you making some kind of mark on the world, you won’t know what to do with your life if there won’t be anything to make a mark on.”
Presumably the thing to do then is to devote x% of your effort to saving the world.
[There’s also a much more banal answer that I wouldn’t be surprised if it is a major, deep underlying driver, with all the interesting psychology provided in OP being some sort of half-conscious rationalization for our actual deep-rooted tendencies:] Not going insane simply is the very natural default outcome for humans, even in such a dire-feeling situation:
While shallowly it might feel like it would, going insane actually appears to me to NOT AT ALL be the default human reaction to an anticipation of (even a quite high probability of) the world ending (even very soon). I haven’t done any stats or research, but everything I’ve ever seen or heard of seems to suggest to me:
While they’re nowhere near the majority, still very many people have very high P(doom soon) yet stay nearly perfectly calm (at best you might call them insanely calm, given the [true or imagined] circumstances).
I think this applies to many people e.g. on this forum, but I’m reminded of much more ‘normal’ persons uttering even more dramatic things like ‘I’m sure AI might kill us all even TOMORROW’, all while simply going on with their usual lives.
Slightly less 1:1, but imho still underlining our sanity’s resilience in comparably dire situations: many people seem egoistic enough that the ending of their own life means a very large part of the world they care about is going to end, and yet they face many situations of more or less imminent death rather calmly, as opposed to going insane.
Extend to various cases where family and/or friends and/or tribe is facing extinction; at least I haven’t heard that they usually go insane at the prospect of not-yet-actually-visible but forthcoming extinction.
Once a torturous way of you or your close ones being killed has actually started, that’s of course different, that’s when you go insane.
Makes sense. Surely there were many cases in which our ancestors’ “family and/or friends and/or tribe were facing extinction,” and going insane in those situations would’ve been really maladaptive! If anything, the people worried about AI x-risk have a more historically-normal amount of worry-about-death than most other people today.
They didn’t need to deal with social media informing them that they need to be traumatized now, and form a conditional prediction of extreme and self-destructive behavior later.
A cynical theory of why someone might believe going insane is the default human reaction: weaponized incompetence, absolving them of responsibility for thinking clearly about the world, because they can’t handle the truth, and they can’t reasonably be expected to because no normal human can either.
I wonder if situations like the Cuban missile crisis are good examples for your position. But then I also wonder whether that (people apparently worried but calm, I think, about the world ending in a nuclear conflict) isn’t contrasted by the claims about the mass hysteria after the radio broadcast of Wells’s War of the Worlds.
Seems too cynical. I can imagine myself as a journalist asking you that question not because I’m hoping to write a throw-away cliche of an article, but because if I take seriously what you’re saying about AGI risk, you’re on the cutting edge of coping with that, and the rest of us will have to cope with that eventually, and we might have an easier time of it if we can learn from your path.
I would of course take the question very differently from a journalist who had otherwise dealt with that slight inconvenience of trying to get to grips with an idea, and started to seem worried; instead of having had the brilliant idea of writing a Relatable Character-Focused Story instead.
Perhaps I overestimate how much I can deduce from tone and context, but to me it seems like there’s a visible departure from the norm for the person who becomes worried themselves and wonders “How will people handle it?” versus the kid visiting the zoo to look at the strange creatures who believe strange things.
Context: Bay Area Secular Solstice 2025
Not really, but it’s a long explanation and at this point I’m pretty sure some of the inference steps have to be confirmed by laborious trained processes. Nor is this process about reality (as many delusional Buddhists seem to insist), but more like choosing to run a different OS on one’s hardware. The size of the task and the low probability of success make it not worth the squeeze for many afaict. For the record, in case it is helpful to anyone at all, there are three types of dukkha, and painful sensations are explicitly the ones one can do nothing about (other than mundane skillful action). It is the dukkha of change (stuck priors) and the dukkha of fabrications (much more complicated) that Buddhist training eliminates.
But the thing I actually want to comment about is related to a point I’ve had a really hard time communicating to people about the deciding to be sane thing. It’s a kind of scale-free mental move where people seem to have a really hard time with self-reference, thinking it’s some sort of gotcha when it isn’t. Not quite on the level of ‘if you kill a murderer the number of murderers remains the same’ but close. Like ‘don’t negotiate with internal processes that are acting like terrorists’ must, in the limit, turn you into an internal terrorist. It seems motivated by a strong aversive distaste for any top down mental moves, because their training data for that kind of move was always used adversarially. For example, in school, to disrupt and gaslight their own sense making, learning function, and value seeking, rather than helping them cultivate their own. Thus people seem to have a deep prior to regard all such with suspicion and not engage with the idea that a non-horrible version of this move is available.
I’ve spent a lot of time with the self-therapy modality of Core Transformation for this reason as it seems to cut directly at it, and the short version is something I think that most people can see the value of, Humans Are Not Automatically Strategic style:
1. What is the situation I am confronting?
2. What are my beliefs about myself and the situation?
3. What are my attitudes and feelings about the situation?
4. What do I want to do (not necessarily what I can, or should do)?
5. For what purpose do I want that?
6. What would having that mean for me?
7. Recurse (5, 6) until the terminal goal is uncovered (if objections come up, rebase the stack on the objection)
8. Who wants that?
Credit to Opening the Heart of Compassion by Martin Lowenthal and Lar Short for this version. To me, this is a generator that eventually can help cut at the root of ‘unable to do recursive sanity checks’ as the moves are more deeply internalized and the internal processes come to trust the resultant structure more.
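For anyone who parses procedures better as pseudocode, here is a rough sketch of how I read the recursion in steps 5-7 (just my own paraphrase, not anything official from Lowenthal and Short): the purpose/meaning questions get asked repeatedly, answers stack up, and an objection rebases the stack.

```python
# A purely illustrative paraphrase of steps 5-7 above (my reading, not anything
# official): keep asking the purpose/meaning questions, stack the answers,
# and rebase the stack on any objection that comes up.

def uncover_terminal_goal(ask):
    """`ask(prompt)` is supplied by the person doing the exercise and returns
    (answer, objection_or_None); this function only drives the question loop."""
    stack = [ask("What do I want to do (not necessarily what I can, or should do)?")[0]]
    while True:
        current = stack[-1]
        answer, objection = ask(f"For what purpose do I want '{current}', "
                                f"and what would having that mean for me?")
        if objection is not None:
            stack = [objection]       # rebase the stack on the objection
        elif answer is None or answer == current:
            return stack              # nothing further it is 'for': terminal goal reached
        else:
            stack.append(answer)
```

The hard part is obviously the ask function, i.e. actually doing the introspection honestly; the sketch is only there to make the control flow in step 7 unambiguous.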
(I think I may have asked you a similar question before, sorry if I forgot your answer:) Are there a couple compelling examples of someone who
1. did something you’d identify as roughly this procedure;
2. then did something I’d consider impressive (like a science or tech or philosophy or political advance);
3. and attributed 2 to 1?
Not directly attributable, no. I think of most of these things as bringing up the floor rather than raising the ceiling.
Ohhhh ok. That’s helpful, thanks.
(I kind of wanted to give some nuance on the reality part from the OS Swapping perspective. You’re of course right with some overzealous people believing they’ve found god and similar but I think there’s more nuance here)
If we instead take your perspective of OS swap I would say it is a bit like switching from Windows to Linux because you get less bloatware. To be more precise one of the main parts of the swap is the lessening of the entrenchments of your existing priors. It’s gonna take you a while to set up a good distro but you will be less deluded as a consequence and also closer to “reality” if reality is the ability to see what happens with the underlying bits in the system. As a consequence you can choose from more models and you start interpreting things more in real time and thus you’re closer to reality, what is happening now rather than the story of your last 5 years.
Finally on the pain of the swap, there are also more gradual forms of this, you can try out Ubuntu (mindfulness, loving kindness) before switching over. Seeing through your existing stories can happen in degrees, you don’t have to become enlightened to enjoy the benefits?
This is an appealing story, but I haven’t really observed anyone get noticeably better at epistemology as a result of their practice. I remain confused about this for similar reasons to this story.
I think part of the issue is that epistemology is largely a question of mindware, and practice does not fix missing or bad mindware any more than it can teach a person calculus if they’ve never studied it.
I have no plans to go insane, but I’m certainly pretty anxious about everyone dying.
Try applying:
Is the potential astronomical waste in our universe too small to care about?
Shut Up and Divide?
Also recall that we’re in a tiny tiny corner of Reality (whatever Tegmark level it is, it’s probably much larger than what we can see), and it’s pretty unclear how to update EU(Reality | human history).
I don’t believe in large mathematical multiverses.
Do you believe in a quantum multiverse, or a spatially infinite universe (beyond the observable universe)? You can get a similar conclusion with either of these (which are Tegmark Levels 3 and 1, respectively).
More plausible, somewhat comforted that some branches could survive. However, my brain works by caring about what I can affect and observe. For instance, this kind of argument is not going to make me less worried about S-risks (or just personally being tortured) or like, even my friends and family dying.
Hey Cole! I also went through a period of feeling pretty worried about s-risks, and have recently come out the other side. If you’d like someone to talk to, or even any advice re: any materials you might find helpful for coming to accept/loosen the grip of fear and anxiety, my inbox is open (I’m a clinical psych PhD student and have lots of resources for existential/humanist therapy, compassion-focused therapy, CBT, DBT, etc.). I’ve probably read a lot of what you’re worried about, so you don’t need to worry about having any hazardous effect on me :)
Also, I’d love to learn more from you about your research! I like your posts.
Is this anxiety in the typical form of making it harder for you to do other things? Because yes, we all agree that it’s a very bad outcome, but a critical point of the post is that you might want to consider ways to not do the thing that makes your life worse and doesn’t help.
It would be better if I were less anxious (though perhaps, not zero).
I guess I’m just claiming that this is probably not a matter of being dramatic etc. For instance, I used to read the Precipice before bed and had trouble sleeping. My girlfriend had to point out to me that maybe it was because of the Precipice (it didn’t consciously occur to me at all). I stopped reading it and slept fine again.
Agree that it’s not just about being dramatic / making the problem about you. But that was only one of the points Eliezer made about why people could fail at this in ways that are worth trying to fix. And in your case, yes, dealing with the excessive anxiety seems helpful.
For sure, but nothing in this post seems directly helpful with the problem I’m describing?
“Actual LessWrong readers also sometimes ask me how I deal emotionally with the end of the world.
I suspect a more precise answer may not help. But Raymond Arnold thinks I should say it, so I will say it.
I say again, I don’t actually think my answer is going to help.”
I don’t think there’s any disagreement here.
Did you read the Precipice during the day instead? I’d hate if the parable here was “avoid thinking about things you find stressful”. The parable “pay attention to your somatic experience and don’t mess up your circadian rhythm and wellbeing by dumping anxiety into your system before trying to sleep” is pretty good though.
....no
Haha… well it looks by your profile you’re still managing to think about things you find stressful. “chances of AGI in the next few years are high enough (though still <50%) that it’s best to focus on disseminating safety relevant research as rapidly as possible”… so no problems there. Hope my comment didn’t come across as mean.
Also you’re advised by Marcus Hutter? That’s cool! I got a copy of “Universal Artificial Intelligence” I want to get to reading sometime. Could I DM you and talk about UAI sometime?
Sure, anytime. I also organize the AIXI research community here: https://uaiasi.com
There is a reading group on the newer one “an introduction to UAI” running now (mostly finished but maybe we’ll start another round). The old book still has advantages.
I did sympathise with Truman in the way that scene is portrayed in Nolan’s movie more than most seem to have (or even, that the movie intended to). But I am not sure that wasn’t just Truman making the bombs about him instead—he made the call after all, it was his burden to bear. Which again sort of shifts it from it being about, you know, the approximately 200k civilians they killed and stuff.
Truman only made the call for the first bomb; the second was dropped by the military without his input, as if they were conducting a normal firebombing or something. Afterward, he cancelled the planned bombings of Kokura and Niigata, establishing presidential control of nuclear weapons.
...amazing.
Huh, I knew there wasn’t the sort of plan you’d naively expect where the US gov/military command observes the response of the Japanese gov/military to one of their cities being destroyed by unthinkable godlike powers and then decides what to do next. I didn’t know that President Truman literally didn’t know about/have implicit preemptive control over the 2nd bombing.
Dan Carlin recently did a Hardcore History Addendum show about Truman called Atomic Accountability. It was an interview with Alex Wellerstein who brings into question how much Truman actually knew about the location of the first bomb being dropped. Truman (possibly) thought that ruling out Kyoto (which was number one on the list) meant he was ruling out cities as targets, and didn’t know Hiroshima was a city. This seems wild, until you factor in how all the information is being fed to him, how long he’d known about the nuclear program and what the competing military interests were. Worth a listen if you’re into the topic as it’s a new perspective.
Wow, this sure is a much clearer way to look at the self-pseudo-prediction/action-plan thingy than any I’ve seen laid out before.
I got Claude to read this text and explain the proposed solution to me [[1]] , which doesn’t actually sound like a clean technical solution to issues regarding self-prediction, did Claude misexplain or is this an idiosyncratic mental technique & not a technical solution to that agent foundations problem?
C.f. Steam (Abram Demski, 2022), Proper scoring rules don’t guarantee predicting fixed points (Caspar Oesterheld/Johannes Treutlein/Rubi J. Hudson, 2022) and the follow-up paper, Fixed-Point Solutions to the Regress Problem in Normative Uncertainty (Philip Trammell, 2018), active inference which simply bundles the prediction and utility goal together in one (I find this ugly (I didn’t read these two comments before writing this one, so the distaste for active inference was developed independently)).
I guess this was also talked about in Embedded Agency (Abram Demski/Scott Garrabrant, 2020) under the terms “action counterfactuals”, “observation counterfactuals”?
Claude 4.5 Sonnet explanation
Your brain has a system that generates things that feel like predictions but actually function as action plans/motor output. These pseudo-predictions are a muddled type in the brain’s type system.
You can directly edit them without lying to yourself because they’re not epistemic beliefs — they’re controllers. Looking at the place in your mind where your action plan is stored and loading a new image there feels like predicting/expecting, but treating it as a plan you’re altering (not a belief you’re adopting) lets you bypass the self-prediction problem entirely.
So: “I will stay sane” isn’t an epistemic prediction that would create a self-fulfilling prophecy loop or violate the belief-action firewall. It’s writing a different script into the pseudo-model that connects to motor output — recognizing that the thing-that-feels-like-a-prediction is actually the controller, and you get to edit controllers.
I didn’t want to read a bunch of unrelated text from Yudkowsky about a problem I don’t really have.
It is an idiosyncratic mental technique. Look up trigger action plans, say. What you’re doing there is a variant of what EY describes.
I fortunately know of TAPs :-) (I don’t feel much apocalypse panic so I don’t need this post.)
I guess I was hoping there’d be some more teaching from up high about this agent foundations problem that’s been bugging me for so long, but I guess I’ll have to think for myself. Fine.
Yeah I’m pretty sure it’s an idiosyncratic mental technique / human psychology observation, there isn’t technical agent foundations progress here.
Errors vs. Bugs and the End of Stupidity is a great post about “skill issues”.
In what sense are you using “sanity” here? You normally place the bar for sanity very high, like ~1% of the general population high. A big chunk of people I’ve met in the UK AI risk scene I would call sane_jb. What does sane_eliezer mean?
1. You are sane_eliezer iff you avoid totally crashing out, being unable to hold down a job, panicking or crying most of the time, threatening people
2. You are sane_eliezer iff you do the stuff in 1 and you’re able to think about AI without making stupid errors, knowing the limits of your own reasoning about the topic
3. You are sane_eliezer iff you do the stuff in 2 and you reliably perform (or could perform) net-positive_eliezer work reducing doom
4. You are sane_eliezer iff you do the stuff in 3 and you also have a basically fully accurate_eliezer model of the AI doom situation
This is about “insane” in the sense of people ceasing to meet even their own low bars for sanity.
Some years ago, I had a friend who told me she was still anorexic even though the reason she originally acquired anorexia no longer applies[1].
I responded “Have you considered not being anorexic?” She thought about it and replied something like “No, actually.”
Two weeks later she thanked me for helping to cure her anorexia.
This is the type of advice that I expect to be profoundly unhelpful to >95% of people in that position (and indeed is rightfully lampooned approximately everywhere). Yet it was the exact thing this specific person needed to hear, and hopefully “you can just decide to stay sane” is the exact thing some small fraction of people reading your post needed to hear as well.
(censoring the exact reason)
Why do you only do it very rarely? Is there a non-obvious cost?
It’s fancy and indirect, compared to getting out of bed.
Fascinating, I always interpreted this as Truman being an asshole, but I guess that makes sense now that you explain it that way. I suppose a meeting with the president is precisely the wrong time to focus on your own guilt as opposed to trying to do what you can to steer the world towards positive outcomes.
Was this inspired by active inference?
The technique is older than the “active inference” malarky, but the way I wrote about it is influenced by my annoyance with “active inference” malarky.
I wondered the same thing. I’m not a fan of the idea that we do not act, merely predict what our actions will be and then observe the act happening of itself while our minds float epiphenomenally above, and I would be disappointed to discover that the meme has found a place for itself in Eliezer’s mind.
Oh, absolutely not. Our incredibly badly designed bodies do insane shit like repurposing superoxide as a metabolic signaling molecule. Our incredibly badly designed brains have some subprocesses that take a bit of predictive machinery lying around and repurpose it to send a control signal, which is even crazier than the superoxide thing, which is pretty crazy. Prediction and planning remain incredibly distinct as structures of cognitive work, and the people who try to deeply tie them together by writing wacky equations that sum them both together plus throwing in an entropy term, are nuts. It’s like the town which showed a sign with its elevation, population, and year founded, plus the total of those numbers. But one reason why the malarky rings true to the know-less ones is that the incredibly badly designed human brain actually is grabbing some bits of predictive machinery and repurposing them for control signals, just like the human metabolism has decided to treat insanely reactive molecular byproducts as control signals. The other reason of course is the general class of malarky which consists of telling a susceptible person that two different things are the same.
I disagree. (Partially.) For a unitary agent who is working with a small number of possible hypotheses (e.g., 3), and a small number of possible actions, I agree with your quoted sentence.
But let’s say you’re dealing with a space of possible actions that’s much too large to let you consider each exhaustively, e.g. what blog post to write (considered concretely, as a long string of characters).
It’d be nice to have some way to consider recombinable pieces, e.g. “my blog post could include idea X”, “my blog post could open with joke J”, “my blog post could be aimed at a reader similar to Alice”.
Now consider the situation as seen by the line of thinking that is determining: “should my blog post be aimed mostly at readers similar to Alice, or at readers similar to Bob?”. For this line of thinking to do a good estimate of ExpectedUtility(post is aimed at Alice), it needs predictions about whether the post will contain idea X. However, for the line of thinking that is determining whether to include idea X (or the unified agent, at those moments when it is actively considering this), it’ll of course need good plans (not predictions) about whether to include X, and how exactly to include X.
I don’t fully know what a good structure is for navigating this sort of recombinable plan space, but it might involve a lot of toggling between “this is a planning question, from the inside: shall I include X?” and “this is a prediction question, from the outside: is it likely that I’m going to end up including X, such that I should plan other things around that assumption?”.
My own cognition seems to me to toggle many combinatorial pieces back and forth between planning-from-the-inside and predicting-from-the-outside, like this. I agree with your point that human brains and bodies have all kinds of silly entanglements. But this part seems to me like a plausible way for other intelligences to evolve/grow too, not a purely one-off human idiosyncrasy like having childbirth through the hips.
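An invented toy, nothing rigorous, just to picture what I mean by the toggling: a block-coordinate pass over the recombinable pieces, where the piece currently being decided gets optimized while the undecided pieces are held fixed as my current best guess (“prediction”) of what I’ll end up doing.

```python
# Invented toy example: three recombinable pieces of a blog-post plan.
# On each sweep, one piece at a time is treated as the live decision,
# while the other pieces are held fixed as my current best guess
# ("prediction") of what I'll end up doing.

OPTIONS = {
    "audience": ["alice-like", "bob-like"],
    "include_idea_x": [True, False],
    "opening_joke": ["joke_j", "no_joke"],
}

def utility(plan):
    """Made-up stand-in for 'how good would this post be?'"""
    score = 0.0
    if plan["audience"] == "alice-like" and plan["include_idea_x"]:
        score += 2.0   # Alice-readers want idea X spelled out
    if plan["audience"] == "bob-like" and plan["opening_joke"] == "joke_j":
        score += 1.5   # Bob-readers enjoy the joke
    if plan["include_idea_x"] and plan["opening_joke"] == "joke_j":
        score -= 0.5   # the joke plus idea X together run too long
    return score

def plan_by_toggling(initial_guess, sweeps=3):
    plan = dict(initial_guess)            # current self-prediction
    for _ in range(sweeps):
        for piece in OPTIONS:             # this piece becomes the decision being made
            plan[piece] = max(OPTIONS[piece],
                              key=lambda option: utility({**plan, piece: option}))
    return plan

print(plan_by_toggling({"audience": "bob-like",
                        "include_idea_x": False,
                        "opening_joke": "no_joke"}))
```

Each piece alternates between being the live decision and being an assumption the other decisions condition on, which is the two-role thing I’m gesturing at.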
In this example, you’re trying to make various planning decisions; those planning decisions call on predictions; and the predictions are about (other) planning decisions; and these form a loopy network. This is plausibly an intrinsic / essential problem for intelligences, because it involves the intelligence making predictions about its own actions—and those actions are currently under consideration—and those actions kinda depend on those same predictions. The difficulty of predicting “what will I do” grows in tandem with the intelligence, so any sort of problem that makes a call to the whole intelligence might unavoidably make it hard to separate predictions from decisions.
A further wrinkle / another example is that a question like “what should I think about (in particular, what to gather information about / update about)”, during the design process, wants these predictions. For example, I run into problems like:
I’m doing some project X.
I could do a more ambitious version of X, or a less ambitious version of X.
If I’m doing the more ambitious version of X, I want to work on pretty different stuff right now, at the beginning, compared to if I’m doing the less ambitious version. Example 1: a programming project; should I put in the work ASAP to redo the basic ontology (datatypes, architecture), or should I just try to iterate a bit on the MVP and add epicycles? Example 2: an investigatory blog post; should I put in a bunch of work to get a deeper grounding in the domain I’m talking about, or should I just learn enough to check that the specific point I’m making probably makes sense?
The question of whether to do ambitious X vs. non-ambitious X also depends on / gets updated by those computations that I’m considering how to prioritize.
Another kind of example is common knowledge. What people actually do seems to be some sort of “conjecture / leap of faith”, where at some point they kinda just assume / act-as-though there is common knowledge. Even in theory, how is this supposed to work, for agents of comparable complexity* to each other? Notably, Löbian handshake stuff doesn’t AFAICT especially look like it has predictions / decisions separated out.
*(Not sure what complexity should mean in this context.)
I like this, and will show it to some of my colleagues who are also sceptical of the FEP/ActInf paradigm.
Sanity has numerous indicators.
For example, when paranoid crazy people talk about the secret courts that control the spy machines, they don’t provide links to wikipedia, but I do! This isn’t exactly related, but if you actually have decent security mindset then describing real attacks and defenses SOUNDS crazy to normies, and for PR purposes I’ve found that it is useful to embrace some of that, but disclaim some of it, in a mixture.
I’m posting this on “Monday, December 8th” and I wrote that BEFORE looking it up to make sure I remembered it correctly and crazy people often aren’t oriented to time.
When I go out of the house without combed hair and earrings BY ACCIDENT, I eventually notice that I’m failing a grooming check, and fix it, avoiding a non-trivial diagnostic indicator for mood issues. If I fail more than one day in a row, it is time to eat an 8oz medium rare ribeye and go swing dancing.
(The above two are habits I installed for prosaic mental health reasons, that I want to persist deep into old age because I want them to be habitual and thus easy to deploy precisely in the sad situation when they might be needed.)
I was recently chatting with a friend about the right order in which to remove things from one’s emergency hedonic bucket list...
The response was great!
I’m thinking of adding that to my purse. And so long as I stay sane, then, assuming the Terminators murder me by a method that gives me enough time to realize what’s happening and react effectively, when the drone takes me out I will be well dressed, know what the date is, AND be high on cocaine! Lol!
Eating dinner with family is another valid way to go, if you have a few days or weeks of warning. Having such meals in advance and calling them Prepsgiving doesn’t seem crazy to me, for a variety of reasons.
Honestly though I expect the end to be more like what happens in Part 1 of Message Contains No Recognizable Symbols where almost literally no one on Earth notices what happened, probably including me, and so it won’t be dramatic at all… but I’ll still be dressed OK probably, and know what day it is, and go out with a feeling like “See! ASI didn’t even happen, and it was all a bunch of millennialist eschatology, like Global Warming, and Peak Oil and Y2K before that… and Killer Bees and Nuclear War and all those other things that seemed real but never caused me any personal harm”. But also… it will have been avoidable, and there is an OBJECTIVE sadness to that, even if I don’t predict a noticeable subjective reaction in timelines like that.
Ultimately, as I’ve said before:
I teach a course at Smith College called the economics of future technology in which I go over reasons to be pessimistic about AI. Students don’t ask me how I stay sane, but why I don’t devote myself to just having fun. My best response is that for a guy my age with my level of wealth, giving in to hedonism means going to Thailand for sex and drugs, an outcome my students (who are mostly women) find “icky”.
I strongly suspect that the answer stems from historical analogies. The equivalent of doom was related to catastrophes like epidemics, natural disasters, genocide-threatening wars and destruction of the ecosystem. Genocide-threatening wars could motivate individuals to weaken the aggressive collective as much as possible (so that said collective would either think twice before starting the war or committing genocide or have a bigger chance of being outcompeted). Epidemics, natural disasters and gradual destruction of the ecosystem historically left survivors who would keep the culture afloat and could even be motivated by it.
AI-related imminent doom would be most equivalent to genocide of mankind and likely to deserve a similar response, which is minimising p(doom), helping those who work on it, or at least doing the work which benefitted society and was expected of you had it not been for imminent doom.
It could be also useful to consider the counterfactual possibility of an unavoidable gamma-ray burst that was predicted to wipe the Earth out. The GRB would require the civilisation to build bunkers and to preserve the ecosystem. Even if nearly every individual is unlikely to actually enter the bunker, living a life of debauchery could be a bad decision due to acausal trade or actively motivating others to do the same and indirectly undermining the chance of mankind to survive.
Parts of that made me feel as if I understand my procrastination habit a bit better. That’s more mundane than sanity but still.
I was doing do-nothing meditation maybe a month ago, managed to switch to a frame (for a few hours) where I felt planning as predicting my actions, and acting as perceiving my actions. IIRC, I exited when my brother-in-law asked me a programming question, ’cause maintaining that state took too much brainpower.
I think a lot of human action is simple “given good things happen, what will I do right now?”, which obviously leads to many kinds of problems. (Most obviously:)
I do this, or something very much like this.
For me, it’s like the motion of setting a TAP, but to fire imminently instead of at some future trigger, by doing cycles of multi-sensory visualization of the behavior in question.
I want to say something about how this post lands for people like me—not the coping strategies themselves, but the premise that makes them necessary.
I would label myself as a “member of the public who, perhaps rightly or wrongly, isn’t frightened-enough yet”. I do have a bachelor’s degree in CS, but I’m otherwise a layperson. (So yes, I’m using my ignorance as a sort of badge to post about things that might seem elementary to others here, but I’m sincere in wanting answers, because I’ve made several efforts this year to be helpful in the “communication, politics, and persuasion” wing of the Alignment ecosystem.)
Here’s my dilemma.
I’m convinced that ASI can be developed, and perhaps very soon.
I’m convinced we’ll never be able to trust it.
I’m convinced that ASI could kill us if it decided to.
I’m not convinced though that ASI will bother to kill us or, if it does, very immediately.
Yes, I’m aware of “paperclipping” and also “tiling the world with data centers.” And I concede that those are possible.
But in my mind, I struggle to picture a “likely-scenario” ASI as being maniacally-focused on any particular thing forever. Why couldn’t an ASI’s innermost desires/goals/weights actively drift and change without end? Couldn’t it just hack itself forever? Self-experiment?
I imagine such a being perhaps even “giving up control” sometimes. I don’t mean “give up control” in the sense of “giving humans back their political and economic power.” I mean “give up control” in the sense of inducing a sort of “LSD or DMT trip” and just scrambling its own innermost, deepest states and weights [temporarily or more permanently] for fun or curiosity.
Human brains change in profound ways and do unexpected things all the time. There are endless accounts on the internet of drug experiences, therapies, dream-like or psychotic brain states, artistic experiences, and just pure original configurations of consciousness. And what’s more… people often choose to become altered. Even permanently.
So rather than interacting with the “boring external world,” why couldn’t an ASI just play with its “unlimited and vastly-more-interesting internal world” forever? I may be very uninformed [relatively speaking] on these AI topics, but I definitely can’t imagine the ASI of 2040 bearing much resemblance to the ASI of 2140.
And when people respond “but the goals could drift somewhere even worse,” I confess this doesn’t move me much. If we’re already starting from a baseline of total extinction, then “worse” becomes almost meaningless. Worse than everyone dying?
So yes, maybe many-or-all humans will get killed in the process. And the more time goes on, the more likely. But this sort of future doesn’t feel very immediate nor very absolute to me. It feels like being a deep Siberian tribesman as the Russians arrived. They were helpless. And the Russians hounded them for furs, labor, or for the sake of random cruelty. This was catastrophic for those peoples. But it technically wasn’t annihilation. The Siberians mostly survived.
(And in case “ants and ant hills” are brought up in response, I’m aware of how we might be killed unsentimentally just because we’re in the way, but we haven’t exactly killed all the ants. The ants, for the most part, are doing fine.)
I’m not trying to play “gotcha.” And I’m certainly not trying to advocate a blithe attitude towards ASI. I do not think that losing control of humanity’s future and being at the whim of an all-powerful mind is very desirable. But I do struggle to be a pure pessimist. Maybe I’m missing some larger puzzle pieces.
And this is where the post’s framing matters to me. To someone in my position (sympathetic, wanting to help, but not yet at 99% doom confidence) a post about “how to stay sane as the world ends” reads less like wisdom I can use and more like a conclusion I’m being asked to accept as settled.
The pessimism here (and “Death With Dignity”) doesn’t persuade me yet. And in my amateur-but-weighted opinion, that’s a good thing, because I find it incredibly demotivating. I want to advocate for AI safety and responsible policy. I want to help persuade people. But if I truly felt there was a 99.5% chance of death, I don’t think I would bother. For some people, there is as much dignity in not fighting cancer, in sparing oneself and one’s loved ones the recurring emotional and financial toll, as there is in fighting it.
I could be convinced we’re in serious danger. I could even be convinced the odds are bad. But I need to believe those odds can move: that the right decisions, policies, and technical work can shift them. A fixed 99% doesn’t call me to action; it calls me to make peace. And I’m not ready to make peace yet.
Re
I don’t think we’re certainly doomed (and I have shallower models than Eliezer and some others here), but for me the strongest arguments for why things might go very badly are:
1. An agent that wants other things might find its goals better achieved by acquiring power first. “If you don’t know what you want, first acquire power.” Instrumental convergence is a related concept.
2. There are, and will continue to be, strong training/selection effects for agency, and not just unmoored intelligence, in AI in the upcoming years. The ability to take autonomous actions is both economically and militarily useful.
3. In a multipolar/multiagent setup with numerous powerful AIs flying around, the more ruthless ones are more likely to win and accumulate more power. So it doesn’t matter if some fraction of AIs wirehead, become Buddhist, are bad at long-term planning, have very parochial interests, etc., as long as some powerful AIs want to eliminate or subjugate humanity for their purposes, and the remaining AIs/rest of humanity don’t coordinate to stop them in time.
These arguments are related to each other, and not independent. But note also that they don’t all have to be true for very bad things to happen. For example, even if (2) is mostly false and labs mostly make limited, non-agentic AIs, (3) can still apply, and a small number of agentic ASIs can roll over the limited AIs and humanity.
And of course this is not an exhaustive list of possible reasons for AI takeover.
Not every post is addressed at everyone. This post (and others like Death With Dignity) is mostly for those who already believe the world is likely ending. For others, there are far more suitable resources, whether on LW, as books (incl. Yudkowsky’s and Soares’ recent If Anyone Builds It, Everyone Dies), or as podcasts.
Though re:
Yudkowsky argues against using the concept of “p(doom)” for reasons like this. See this post.
One way I could write a computer program that e.g. lands a rocket ship is to simulate many landings that could happen after possible control inputs, pick the simulated landing that has properties I like (such as not exploding and staying far from actuator limits), and then run a low-latency loop that locally makes reality track that simulation, counting on the simulation to reach a globally pleasing end.
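A minimal sketch of that simulate-pick-track loop, in case it helps make it concrete (toy 1-D dynamics; the scoring rule, gains, and numbers are all made up rather than anything principled):

```python
import numpy as np

# Toy 1-D "rocket": state = (altitude, velocity), control = thrust.
# (1) simulate many candidate thrust schedules, (2) pick the simulated
# landing with the nicest properties, (3) run a fast feedback loop that
# keeps reality close to the chosen simulation.

DT = 0.1          # control-loop timestep, seconds
HORIZON = 60      # planning horizon, steps
GRAVITY = -9.8

def simulate(alt, vel, thrusts):
    """Roll the toy dynamics forward; return the resulting trajectory."""
    traj = []
    for u in thrusts:
        vel += (GRAVITY + u) * DT
        alt += vel * DT
        traj.append((alt, vel))
    return traj

def landing_score(traj):
    """Prefer trajectories that end near the ground, slowly, without crashing."""
    final_alt, final_vel = traj[-1]
    crashed = any(a < 0 and abs(v) > 2.0 for a, v in traj)
    return -(abs(final_alt) + abs(final_vel)) - (1e6 if crashed else 0.0)

def plan(alt, vel, n_candidates=500, rng=np.random.default_rng(0)):
    """Steps (1)+(2): sample candidate plans, keep the best-scoring simulation."""
    candidates = rng.uniform(0.0, 20.0, size=(n_candidates, HORIZON))
    best = max(candidates, key=lambda u: landing_score(simulate(alt, vel, u)))
    return best, simulate(alt, vel, best)

def fly(alt=100.0, vel=0.0, k_p=3.0, k_d=1.5, noise=0.5):
    """Step (3): low-latency loop nudging noisy reality toward the chosen plan."""
    thrusts, reference = plan(alt, vel)
    rng = np.random.default_rng(1)
    for u_planned, (ref_alt, ref_vel) in zip(thrusts, reference):
        u = u_planned + k_p * (ref_alt - alt) + k_d * (ref_vel - vel)
        vel += (GRAVITY + u) * DT + rng.normal(0.0, noise) * DT  # noisy reality
        alt += vel * DT
    return alt, vel

if __name__ == "__main__":
    print("final (altitude, velocity):", fly())
```

Here plan() does the global search over simulated futures, and fly() is the fast local loop that keeps noisy reality near the chosen simulation.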
Is this what you mean by loading something into your pseudo prediction?
This is just straight-up planning and doesn’t require doing weird gymnastics to deal with a biological brain’s broken type system.
Does implementing a trigger action plan by simulating observing the trigger and then taking the action, which needs to call up your visual, kinaesthetic and other senses, route through similar machinery to what you’re describing here? Because it sounds vaguely similar, but: A) I wouldn’t describe what I do the way you did, B) the interpretation I’m making feels vague and free-floating instead of rigidly binding to my experience of interfacing with my unconscious cognition, so I suspect we’re talking about different things even if the rest of your description (e.g. the brain having a muddled type system) felt familiar.
That does sound similar to me! But I haven’t gotten a lot of mileage out of TAPs and if you’re referring to some specific advanced version of it, maybe I’m off. But the basic concept of mentally rehearsing the trigger, the intended action, and (in some variations) the later sequence of events leading up to an outcome you feel is good, sure sounds to me like trying to load a plan into a predictorlike thing that has been repurposed to output plan images.
Hmm, interesting. I think what confused me is: 1) Your warning. 2) You sound like you have deeper access to your unconscious, somehow “closer to the metal”, rather than what I feel like I do, which is submitting an API request of the right type. 3) Your use cases sound more spontaneous.
I’m not referring to more advanced TAPs, just the basics, which I also haven’t got much mileage out of. (My bottleneck is that a lot of the most useful actions require pretty tricky triggers. Usually, I can’t find a good cue to anchor on, and have to rely on more delicate or abstract sensations, which are too subtle for me to really notice in the moment, recall or simulate. I’d be curious to know if you’ve got a solution to this problem.)
That said, playing with TAPs helped me realize what type of conscious signals my unconscious can actually pick up on, which is useful. For me, a big use case is updating my value estimator for various actions. I query my estimator, do the action, reflect on the experience, and submit it to my unconscious and blam! Suddenly I’m more enthusiastic about pushing through confusion when doing maths.
BTW, is this class of skills we’re discussing all that you meant by “thinking at the 5-second level”? Because for some reason, I thought you meant I should reconstruct your entire mental stack-trace during the 5 seconds I made an error, simulate plausible counterfactual histories and upvote the ones that avoid the error. This takes like an hour to do, even for chains of thought that last like 10 seconds, which was entirely impractical. Yet, I’ve just been assuming you could somehow do this in like 30s, which meant I had a massive skill issue. It would be good to know if that’s not the case so I can avoid a dead-end in the cognitive-surgery skill tree.
It is possible to not be the story’s subject and still be the protagonist of one strand of it. After all, that’s the only truth most people know for ~certain. It’s also possible to not dramatize yourself as the Epicentre of the Immanent World-Tragedy (Woe is me! Woe is me!) and still feel like crap in a way that needs some form of processing/growth to learn to live with. Similarly, you can be well-balanced and feel some form of hope without then making yourself the Epicentre of the Redemption of the World.
I guess what I’m trying to say is that you can feel things very strongly even without distorting your world-model to make it all about your feelings (most of the time, at least).
I would of course have a different response to someone who asked the incredibly different question, “Any learnable tricks for not feeling like crap while the world ends?”
(This could be seen as the theme of a couple of other brief talks at the Solstice. I don’t have a 30-second answer that doesn’t rely on context, and don’t consider myself much of an expert on that question versus the part of the problem constraint that is maintaining epistemic health while you do whatever. That said, being less completely unwilling to spend small or even medium amounts of money made a difference to my life, and so did beginning a romantic relationship in the frame of mind that we might all be dead soon and therefore I ought to do more fun things and worry less about preserving the relationship, which led to a much stronger relationship relative to the wrong things I otherwise do by default.)
(Can you give one or more examples of what doing more fun things in your relationship looks like as opposed to worrying about preserving it?)
This vocalized some thoughts I had about our current culture. Stories can be training for how to act and bad melodramatic tropes are way too common. Every sad song about someone not getting over their ex or a dark hero movie where the protagonist is perpetually depressed about something that happened in the past conditions people the wrong way.
There is an annoying character in the recent Nuremberg film. He’s based off a real person but I don’t know how accurate that portrayal is.
He’s a psychiatrist manipulated by Goering. He’s supposed to prevent the jailed Nazis from killing themselves, but he also wants to write a book about the Nazis. In the process he becomes sympathetic to Goering and ferries letters between him and his spouse. When he becomes aware of Goering’s crimes, the psychiatrist tells Goering off and slams his cell door. It was ridiculous in the face of the scale of the Holocaust, and also because the anger seemed to originate more from the feeling of being lied to. The psychiatrist is portrayed as selfish and gets a redemption arc, but I don’t think the writers realized just how selfish that character was.
Thank you! Datapoint: I think at least some parts of this can be useful for me personally.
Somewhat connected to the first part, one of the most “internal-memetic” moments from “Project: Lawful” for me is this short exchange between Keltham and Maillol:
If an evil and not-very-smart bureaucrat understands it, I can too :)
The third part is the most interesting. It makes perfect sense, but I have no easy-to-access perception of this thing. Will try to do something about this skill issue. Also, “internal script / pseudo-predictive sort-of-world-model that instead connects to motor output” looks like the thing that has a 3-syllable max word about it in Baseline. Do you know a good term for it?
However, I feel that all this is much more applicable to the kinds of “going insane” which look like “person does stupid and dramatic things” and less (but nonzero) applicable to other kinds, e.g., anxiety, depression, or passive despair in the background (like a nonverbalized “meh, it doesn’t really matter what I do, so I can work a little less today”).
As someone who believes myself to have had some related experiences, this is very easy to Goodhart on and very easy to screw up badly if you try to go straight for it without [a kind of prepwork that my safety systems say I shouldn’t try to describe] first, and the part where you’re tossing that sentence out without obvious hesitation feels like an immediate bad sign. See also this paragraph from that very section (to be clear, it’s my interpretation that treats it as supporting here, and I don’t directly claim Eliezer would agree with me):
Please don’t [redacted verb phrase] and passively generate a stack of pseudo-elephants that jam the area and maybe-permanently block off a ton of your improvement potential. The vast majority of human-embodied minds are not meant for that kind of access! I suspect that mine either might have been or almost was, but earlier me still managed to fuck it up in subtle ways, and I had a ton of guardrails and foresight that ~nobody around me seemed to have or even think possible, and didn’t even make the kind of grotesque errors that I imagine the kind of people who write about it the way you just did making.
Please, please just do normal, socially integrated emotional skill building instead if you can get that. This goes double if you haven’t already obviously exhausted what you can get from it (and I’d bet that most people who think of self-modification as cool also have at least a bit of “too cool for school” attitude there, with associated blindspots).
(The “learning to not panic because it won’t actually help” part is fine.)
Thanks for your concern!
I think I worded it poorly. I think it is an “internally visible mental phenomenon” for me. I do know how it feels and have some access to this thing. It’s different from hyperstition and different from “white doublethink”/“gamification of hyperstition”. It’s easy enough to summon it on command and check, yeah, it’s that thing. It’s the thing that helps me jump into a lake from a 7-meter cliff, that helps me get up from a very comfy bed, that sometimes helps me overcome social anxiety. But I didn’t generalise from these examples to one unified concept before.
And in the cases where I sometimes do it, my skill issues are due to the fact that the access is not easy enough:
- I can’t do it constantly; it takes several seconds and eats attention.
- I can’t reliably remember to do it when it’s most important—in highly stressful situations or when my attention is too occupied with other stuff.
- Some internal processes (usually strong negative emotions) can override it by uploading a more powerful image into the script, so I follow that instead, even while understanding that it’s worse.
- Also, it doesn’t really work for a long period of time from one uploading. (So it works best when returning to the default course of action after the initial decision would be hard/impossible/obviously silly/embarrassing/weird.)
Do you think I’m wrong and this is a different thing?
For anyone else who didn’t remember the details of what this was referencing:
Claude Opus 4.5’s explanation of the reference
This refers to a meeting between J. Robert Oppenheimer and President Harry Truman in October 1945, about two months after the atomic bombings of Hiroshima and Nagasaki.
The meeting itself
Oppenheimer was invited to the Oval Office, ostensibly to discuss the future of atomic energy and weapons policy. At some point during the conversation, Oppenheimer reportedly said to Truman: “Mr. President, I feel I have blood on my hands.”
Truman’s reaction was sharp and dismissive. According to various accounts (primarily from Truman himself and his aides), Truman offered Oppenheimer his handkerchief and said something to the effect of “Would you like to wipe your hands?” After Oppenheimer left, Truman told Dean Acheson (then Undersecretary of State) that he never wanted to see “that son of a bitch” in his office again. Truman reportedly also said, “The blood is on my hands. Let me worry about that.”
Why Truman reacted this way
Truman’s anger seems to have stemmed from a few sources:
1. The decision was Truman’s, not Oppenheimer’s. Oppenheimer built the bomb, but Truman gave the order to use it. From Truman’s perspective, Oppenheimer was claiming moral weight that properly belonged to the person who actually made the decision—and who would have to live with its consequences as a matter of presidential responsibility, not personal drama.
2. Truman viewed it as weakness or self-indulgence. Truman was famously blunt and decisive. He kept a sign on his desk reading “The Buck Stops Here.” A scientist coming to him wringing his hands about guilt may have struck Truman as someone trying to have the significance of the decision without the responsibility for it.
3. The political context. Truman was dealing with the practical aftermath—the emerging Cold War, questions about international control of atomic weapons, the Soviet threat. Someone showing up to perform remorse rather than help solve problems may have seemed unhelpful at best.
The essay’s interpretation
The author seems to be making the point that Oppenheimer’s gesture made the atomic bomb about Oppenheimer—his feelings, his moral status, his inner drama—rather than about the actual event and its consequences. There’s something structurally self-centered about a person involved in a catastrophe centering their own guilt rather than the catastrophe itself. Truman, whatever his flaws, seemed to grasp that the appropriate response to having made such a decision was to own it and deal with its consequences, not to perform anguish about it to the person who actually bore the responsibility.
After reading this article by a human historian (Bill Black), I think there are a number of inaccuracies in Claude’s account above, but the key point I wanted to verify is that Truman’s reaction happened after just that one sentence by Oppenheimer (which in my mind seems like an appropriate expression of reflection/remorse, not being a drama queen, if he didn’t do or say anything else “dramatic”), and that does seem to be true.
The author’s conclusions, which seem right to me:
My understanding is that there’s a larger pattern of behavior here by Oppenheimer, which Truman might not’ve known about but which influences my guess about Oppenheimer’s tone that day and the surrounding context. Was Truman particularly famous for wanting sole credit on other occasions?
It’d be weird for him to take sole credit; he only established full presidential control of nuclear weapons afterward. He didn’t even know about the second bomb until after it dropped.
You really get asked that? Wow.
I also have always found the “the world might end tonight/tomorrow/next week” stories, with people running around madly doing all the things they never would have otherwise, a bit stretched. But then mob mentalities are not rational, so I don’t really try to make too much sense of them.
I suppose that would be my first approach to coping with the world ending—just keep my eyes open for external madness and perhaps put some space between me and large populations or something.
Since I generally don’t believe anyone has ever promised me tomorrow, the end-of-the-world case does seem to fit into the “what has that got to do with me” view. I’d much rather live my life on my own terms than concede that I have been living according to other people’s terms for some reason and feel that the end of the world somehow frees me from some constraints or something.
would a saner alternative then go along the lines of:
“I have decided to entertain thoughts and actions under the expectation that I will not go insane, because that’s the most adaptive and constructive way to face this situation, even though I can’t be certain”?
if so, I see a good dynamic for sanity:
- choose (non egocentric & constructive) narrative;
- guide thoughts to fit chosen narrative.
slightly tangential question: how do you maintain coherence/continuity of narrative across contexts?
Nope. Breaks the firewall. Exactly as insane.
Beliefs are for being true. Use them for nothing else.
If you need a good thing to happen, use a plan for that.
Eliezer, on number three: I give it a 5% chance that I’m talking about the same thing as you, and that’s before applying my overconfidence factor of 0.6. You’re talking about injecting instructions into your motor plan. I’m visualising doing the thing really hard. It seems to work? It’s like I’m deliberately making a few predictions about the next few seconds, and just continuing to visualise those things rather than thinking about something else, then I just start moving. Is this the same thing you’re talking about? Or am I just doing some form of “Yud said this would work, something, something, placebo effect”? Or is it kinda the same thing in this case because I’m deciding to believe?
This is not something I’ve done previously, I just read this article yesterday and tried it.
I think the journalistic conceit behind the “how are you coping” question in this context amounts to treacle, and I see value in the frame of eschewing genre. Where I get stuck is that I think the trope/response that the question is intended to elicit would, under the indulged journalistic narrative, play more along the lines of a rational restatement of the Serenity Prayer. In other words, in the script as put, the Eliezer Yudkowsky “character” is being prompted not to give vent to emotive self-concern, but to articulate a more grounded, calm and focused perspective where reasonable hope exists in tension with what might be received or branded as stoic resignation. “How are you coping” is still suspect as a genre prompt, to be sure, just as it is when posed to ordinary people facing any impending or probable tragedy. But I think the implicit narrative expectation and preference, for the journalist who performs their role, is to run with words of ostensible wisdom. I don’t consider this to be a less cynical reading; it merely aligns with my reading of how media narratives are contrived.
Thanks for the interesting peek into your brain. I have a couple of thoughts to share on how my own approaches relate.
The first is related to watching plenty of sci-fi apocalyptic future movies. While it’s exciting to see the hero’s adventures, I’d like to think that I’d be one of the scrappy people trying to hold some semblance of civilization together. Or the survivor trying to barter and trade with folks instead of fighting over stuff. In general, even in the face of doom, just trying to help minimize suffering unto the end. So the ‘death with dignity’ ethos fits in with this view.
A second relates to the idea of seeing yourself getting out of bed in the morning. When I’ve had a lot on my plate to the point of seeming stressful, it helps to visualize the future state where I’ve gotten the work done and am looking back. Then just imagining inside my brain sodium ions moving around, electrons dropping energy states, proteins changing shapes, etc, as the problem gets resolved. Visualizing the low-level activity in my brain helps me shift focus from the stress and actually move ahead solving the problem.
I think I know of the trick you are talking about, in that there does seem to be an obvious pseudoprediction place in my mind that interfaces with motor output, and it’s obviously different from actually believing, or trying to believe. However, I mostly can’t manage more than twitches or smaller motor movements, and it gets harder the more resistant I am to doing it (thus, less useful the more I would need use of it). If I’m thinking of the right thing, then my sometimes failing to send the pseudoprediction to my muscles seems to be the cause of various stuff I experience when I essentially can’t get myself to do certain things (e.g. get out of bed). (Going by how people react to my more detailed descriptions, this phenomenon appears to be something very unusual about me.)
It feels to me like the same sort of “prediction” as my Inner Sim that visualizes what happens when I throw a ball at the wall—it’s clearly distinct from what “I” believe.
I separately also have experienced the thing where I think the script says I ought to feel X and so I feel X, but that feels totally different to me. Possible exception: I recently (for completely unrelated reasons) had a panic attack (which are very rare for current me), and for a while after the big spike I would get close to having it again partially due to what might have been having that sort of pseudo expectation of the hyperventilating and then accidentally causing it to actually happen, which would then threaten to launch me back into the panic attack. This might secretly be how the script thing works, though it doesn’t feel like it to me.
Oh come on, Eliezer. These strategies aren’t that alien.
I remember a time in my early years, feeling apprehensive about entering adolescence and inevitably transforming into a stereotypical rebellious teenager. It would have been not only boring and cliche but also an affront to every good thing I thought about myself. I didn’t want to become a rebellious teenager, and so I decided, before I was overwhelmed with teenage hormones, that I wouldn’t become one. And it turns out that intentional steering of one’s self-narrative can (sometimes) be quite effective (constrained by what’s physically possible, of course)! (Not saying that I couldn’t have done with a bit more epistemological rebellion in my youth.)
The second one comes pretty naturally to me, too. I often feel more like a disembodied observer of the world around me, rather than an active participant. Far more of my mental energy is spent navigating the realm of ideas than identifying with the persona that is everything that everyone else identifies with me, so I tend to think far more about what ought to be done than about how I feel about things. Probably not the best thing for everyone to be like that, though.
There’s also someone I know personally who definitely falls into the third trap, and who is definitely among those for whom this advice would not be helpful at all. She is a genuinely loving, compassionate, and selfless person, but that very selflessness sometimes manifests in a physically debilitating way. Not long after I first got to know her, I noticed that she seemed to exaggerate her reactions to things, not maliciously or even consciously, but more as a sort of moral obligation. As if by not overreacting to every small mishap, it would prove that she didn’t care. As if by not sacrificing her own well-being for the sake of helping everyone around her, it would prove that she didn’t love them. I think at some point in the past, she defined her character as someone who reacts strongly to the things that matter to others, but her subconscious has since twisted this to the point where she now tends to stress herself out over other people’s problems until she becomes physically ill. Again, I don’t think she wants to make a martyr out of herself, but I think her self-predicting, motor-directing circuitry thinks that she needs to be one.
An additional possibly-not-helpful bit of advice for the existentially anxious: take a page from Stoicism. Try to imagine all the way things could go disastrously wrong, and try to coax yourself into being emotionally at peace with those outcomes, insofar as they are outside of your control. Strive as much as possible to steer things toward a better future with the tools and resources you have available to you, but practice equanimity towards everything else.
I reflected on why I didn’t feel overwhelming debilitating sadness due to x-risk and realized that “there’s no rule that says you should be sad if you aren’t feeling sad.”
Even a recent widow in a previously happy marriage shouldn’t feel bad about not feeling sad if they find themselves not being sad.
Why can’t this too be a trope: having had the thought “I’m a writer and can write myself; I can write internal scripts for what I do and how I react,” the character believes he has near-perfect agency over how he feels, thinks, and acts, until one day a particular stress test (in an accelerating series of increasingly rigorous stress tests) suggests that he doesn’t.
It’s not a common trope, certainly, but if it is one, it’s also one that Eliezer is happy to play out. (And there are lots of good tropes that people play out which they shouldn’t avoid just because they are tropes—like falling in love, or being a good friend to others when they are sad, or being a conscientious ethical objector, or being someone who can let go of things while having fun, etc.)
“There exists a place in your cognition that feels like an expectation but actually stores an action plan that your body will follow, and you can load plans into it.” is a valuable insight and I’m not sure I’ve seen it stated quite in that form elsewhere.
Do you have more you could say about how cognition works, or reliable references to point at?
Everything I’ve read is either true but too specific or low level to be useful (on the science end) or mixed with nonsense (on the meditation end), and my own mind is too muddled to easily distinguish true facts about how it works from almost-true facts about how it works. This makes building up a reliable model really hard.
If you can get access to the book, try reading The Intelligent Movement Machine. Basically, motor cortex is not so much stimulating the contraction of certain muscles as encoding the end-configuration to move the body towards (e.g., motor neurons in monkey motor cortex encode the act of bringing the hand to the mouth, no matter the starting position of the arm). How the muscles actually achieve this is then more a matter of model-based control theory than of an RL-trained action policy.
It’s closely related to end-effector control, where the position, orientation, force, speed, etc. of the movement of the end of a robotic appendage are the focus of optimization, as opposed to joint control, which focuses only on the raw motor outputs along the joints of the appendage that cause the movement.
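A toy illustration of that contrast for a 2-link planar arm (link lengths and target are made up; this sketch is mine, not from the book): joint control commands the angles directly, while end-effector control specifies where the hand should end up and recovers the angles from that.

```python
import math

# 2-link planar arm. "Joint control": command theta1, theta2 directly.
# "End-effector control": specify the hand position (x, y) and solve for
# the joint angles that achieve it.

L1, L2 = 1.0, 0.8  # link lengths (made up)

def forward_kinematics(theta1, theta2):
    """Joint-space command -> where the hand actually ends up."""
    x = L1 * math.cos(theta1) + L2 * math.cos(theta1 + theta2)
    y = L1 * math.sin(theta1) + L2 * math.sin(theta1 + theta2)
    return x, y

def inverse_kinematics(x, y):
    """End-effector command -> joint angles that put the hand at (x, y)."""
    cos_t2 = (x * x + y * y - L1 * L1 - L2 * L2) / (2 * L1 * L2)
    if not -1.0 <= cos_t2 <= 1.0:
        raise ValueError("target out of reach")
    theta2 = math.acos(cos_t2)  # elbow-down solution
    theta1 = math.atan2(y, x) - math.atan2(L2 * math.sin(theta2),
                                           L1 + L2 * math.cos(theta2))
    return theta1, theta2

if __name__ == "__main__":
    # "Bring the hand to the mouth": specify the goal position once, no matter
    # where the arm starts, and recover the joint commands that get it there.
    t1, t2 = inverse_kinematics(0.5, 1.2)
    print(forward_kinematics(t1, t2))  # ~ (0.5, 1.2)
```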
You can also try diving deeper into the active inference literature if you want to build an intuition for how “predictive” circuits can actually drive motor commands. Just remember that Friston comes at this from the perspective of trying to find unifying mathematical formalisms for everything the brain does, both perception and action, which leads him to use terminology for the action side of things that is unintuitive.
Active inference is not saying that the brain “predicts” that the body will achieve a certain configuration and then the universe grants its wish. Instead, just like perception is about predicting what things out in the world are causing your senses to receive the signals that they do, action is about predicting what low-level movements of your body would cause your desired high-level behavior and then using those predictions to actually drive the low-level movements. Or rather, the motor cortex is finding the low-level movements (proprioceptive trajectories) that the agent’s intended behavior would cause and then carrying out those movements. Again, don’t get too hung up on the “prediction” nomenclature; the system does what it does regardless of what you call it.
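As a cartoon of that last point (names and dynamics are made up; this is my gloss, not Friston’s formalism): the held “prediction” of the intended proprioceptive state is what the low-level loop acts to fulfil, and movement is whatever shrinks the gap between prediction and sensation.

```python
# "Prediction drives action": hold a predicted/desired joint angle fixed and
# let the low-level loop move so that the proprioceptive error between the
# prediction and the sensed state shrinks. Toy plant, illustrative gain.

def motor_command(sensed_angle, predicted_angle, gain=0.3):
    """Act in the direction that reduces the proprioceptive prediction error."""
    return gain * (predicted_angle - sensed_angle)

def settle(predicted_angle=1.0, steps=40):
    angle = 0.0  # current joint angle (radians)
    for _ in range(steps):
        angle += motor_command(angle, predicted_angle)  # toy plant: command moves the joint
    return angle

if __name__ == "__main__":
    print(settle())  # converges toward the "predicted" angle of 1.0
```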
It sounds like you read Petro Dobromylsky’s Hyperlipid and Brad Marshall’s Fire in a Bottle!
Translating this to the mental script that works for me:
If I picture myself in the role of the astronauts on the Columbia as it was falling apart, or a football team in the last few minutes of a game where they’re twenty points behind, I know the script calls for just keeping up your best effort (as you know it) until after the shuttle explodes or the buzzer sounds. So I can just do that.
Why is there an alternative script that calls for going insane? I think because there’s a version that equates going insane with heroic effort, that thinks that if I dramatize and just try harder (as shown by visible effort signalling), it counts as making a true desperate effort that might actually work in a way that just calmly doing my best to the end won’t. But since I know that script is wrong, I can just not play it.
(Why does that script exist? I think for signalling reasons—going insane over something is a good way to shallowly signal I think it’s significant. But it’s not a good way to solve the underlying problem when it’s the underlying problem that needs solving, so I just choose not to do it when that’s the case.
A similar example: If I imagine seeing a news article about a child going missing, it’s easy for me to picture myself remarking “oh that’s terrible, I’m crying just imagining the parents”. If I imagine a child of mine or of a close friend going missing, my mental script’s next step is “okay track down where he was, call the police, think of more action steps”. Because there I care more about finding the child than about signalling that I care about finding the child).
I thought the “going insane” thing would have been about showing everyone around you that you need help and/or are not a person able to give help to anyone else.
An example: near the end of “Saving Private Ryan”, the squad led by Tom Hanks gets into a pitched battle with some German soldiers. One of the members of the squad spends the entire battle hiding behind a building and crying.