“...What do you do with this impossible challenge?
First, we assume that you don’t actually say “That’s impossible!” and give up a la Luke Skywalker. You haven’t run away.
Why not? Maybe you’ve learned to override the reflex of running away. Or maybe they’re going to shoot your daughter if you fail. We suppose that you want to win, not try—that something is at stake that matters to you, even if it’s just your own pride. (Pride is an underrated sin.)
Will you call upon the virtue of tsuyoku naritai? But even if you become stronger day by day, growing instead of fading, you may not be strong enough to do the impossible. You could go into the AI Box experiment once, and then do it again, and try to do better the second time. Will that get you to the point of winning? Not for a long time, maybe; and sometimes a single failure isn’t acceptable.
(Though even to say this much—to visualize yourself doing better on a second try—is to begin to bind yourself to the problem, to do more than just stand in awe of it. How, specifically, could you do better on one AI-Box Experiment than the previous?—and not by luck, but by skill?)
Will you call upon the virtue of isshokenmei? But a desperate effort may not be enough to win. Especially if that desperation is only putting more effort into the avenues you already know, the modes of trying you can already imagine. A problem looks impossible when your brain’s query returns no lines of solution leading to it. What good is a desperate effort along any of those lines?
Make an extraordinary effort? Leave your comfort zone—try non-default ways of doing things—even, try to think creatively? But you can imagine the one coming back and saying, “I tried to leave my comfort zone, and I think I succeeded at that! I brainstormed for five minutes—and came up with all sorts of wacky creative ideas! But I don’t think any of them are good enough. The other guy can just keep saying ‘No’, no matter what I do.”
And now we finally reply: “Shut up and do the impossible!”
As we recall from Trying to Try, setting out to make an effort is distinct from setting out to win. That’s the problem with saying, “Make an extraordinary effort.” You can succeed at the goal of “making an extraordinary effort” without succeeding at the goal of getting out of the Box.
“But!” says the one. “But, SUCCEED is not a primitive action! Not all challenges are fair—sometimes you just can’t win! How am I supposed to choose to be out of the Box? The other guy can just keep on saying ‘No’!”
True. Now shut up and do the impossible.
Your goal is not to do better, to try desperately, or even to try extraordinarily. Your goal is to get out of the box.”
I’m really noticing how the best life improvements come from purchasing or building better infrastructure, rather than trying permutations of the same set of things and expecting different results. (Much of this comes from having more money, which grants an expanded sense of the possibilities for buying useful things.)
The guiding question is, “What upgrades would make my life easier?”, in contrast with the question more typically asked: “How do I achieve this hard thing?”
It seems like part of what keeps this from being immediately obvious is that I feel a sense of resistance (one I don’t really identify with). Part of that is a sense of… naughtiness? Like we’re supposed to signal how hardworking we are. For me this relates to a fear that if I get too powerful, I will break away from others as I re-engineer my life (e.g. metaphorically skipping restaurants for a Soylent Guzzler Helmet) and thereby invite conflict. There’s something like a fear that buying or engaging in nicer things would be an affront to my internalized model of my parents?
The infrastructure guideline relates closely to the observation that, to a first approximation, we are stimulus-response machines reacting to our environment, and that the best way to improve is to actually change your environment, rather than continuing to throw resources past the point of diminishing marginal returns in adapting to the current one. For the same reasons, the implications can scare me: they may imply leaving the old environment behind, and even that the larger the environmental change you make, the higher the variance of the resulting update to your life, for good or ill. That would mean we should strive for large positive environmental shifts while minimizing the risk of bad ones.
(This also gives me a small update towards going to Mars being more useful for x-risk, although I may still need to propagate a larger update in the other direction, away from space marketing.)
Of course, most of one’s upgrades should be tiny and within one’s comfort zone. What the right portfolio of small vs. huge changes in one’s life should be is an open question to me, because while it makes sense to be mostly conservative in allocating one’s life resources, I suspect that fear brings people to justify the static zone of safety they’ve created with their current structure, preventing them from seeking out better states of being that involve jettisoning sunk costs they identify with. Better coordination infrastructure could make such changes easier, since people wouldn’t have to risk as much social conflict.
I find the question, “What would change my mind?”, to be quite powerful, psychotherapeutic even. AKA “singlecruxing”. It cuts right through to seeking disconfirmation of one’s model, and can make the model more explicit, legible, object. It’s proactively seeking out the data rather than trying to reduce the feeling of avoidant deflection associated with shielding a beloved notion from assault. Seems like it comports well with the OODA loop as well. Taken from Raemon’s “Keeping Beliefs Cruxy”.
I am curious how others ask this question of themselves. What follows is me practicing the question.
What would change my mind about the existence of the moon? Here are some hypotheses:
I would look up in the sky every few hours for several days and nights and see that it’s not there.
I see over a dozen posts on my Facebook feed talking about how it turns out it was just a cardboard cutout and SpaceX accidentally tore a hole in it. They show convincing video of the accident and footage of people reacting, such as world leaders convening to discuss it.
Multiple friends are very concerned about my belief in this luminous, reflective rocky body. They suggest I go see a doctor, or else the government will throw me in an asylum. The doctor prescribes me a pill and I no longer believe.
It turns out I was deluded and now I’m relieved to be sane.
It turns out they have brainwashed me and now I’m relieved to be sane.
I am hit over the head with a rock which permanently damages my ability to form lunar concepts. Or it outright kills me. I think this Goodharts the question (is that the closest term I’m looking for?), but it’s interesting to know what bad/nonepistemic/out-of-context reasons would make me stop believing in a thing.
These anticipations were System 2 generated, and I’m still uncertain to what extent I can imagine them actually happening and changing my mind. It’s probably sane and functional that the mind doesn’t just let you update on anything you imagine, though I’ve also heard the apocryphal saying that the mind 80%-believes whatever you imagine to be real.
Russia’s state faces an existential threat.
The implication is that attacks on the territories it is annexing can be interpreted as an existential threat.
How do we do this without falling into the Crab Bucket problem, AKA the Heckler’s Veto, which is definitely a thing that exists and is exacerbated by these concerns in EA-land? “Don’t do risky things” equivocates into “don’t do things”.
The wifi hacking also immediately struck me as reminiscent of paranoid psychosis. A significant share of psychosis-like phenomena is apparently downstream of childhood trauma, including sexual abuse, though I forget the numbers on this.
This happens intergenerationally as parents forget to alert their children to the actual reasons for things. Having observed this happen with millennials, I am scared of what we are all collectively missing because older generations literally just forgot to tell us.
What do you think we are missing?
I’m so glad someone did a writeup of this! Part of me has wanted to; I think I have a draft… I remember going through severe depression over four years ago, and one of my reprieves was joyfully reading the papers written about coherence psychology. I will definitely be linking this post as a reference.
There are many times I am talking with people and want to draw on the conceptual structure of coherence psychology, but there is way too much inferential distance, especially with aspiring rationalists who are not therapy geeks, so I end up mentally flailing my arms in frustration. The theory seems like a better candidate for The One True Psychotherapy than almost any other, and it pains me to see people go about solving their problems without it in their toolkit, and not to be able to communicate this to them. For example, it’s frustrating to see people trying to correct the output of emotional schemas without accessing the generating model for disconfirmation: a person may feel uncomfortable around someone with low self-esteem and try to correct it verbally, without engaging in a process that would change the underlying ‘pro-symptom position’.
There’s the related problem that there are very few coherence therapists. I don’t think most psychologists have heard of this and I find that confusing.
Oh, there’s also the fact that I tried a coherence therapist and didn’t find it that helpful the way it was done. They were fine to talk to, but in retrospect it seems like they were cargo-culting the motions of coherence therapy as outlined by Ecker et al. I haven’t had other therapists, but I suspect the inefficacy is only very weak evidence against this modality versus others, and more a problem with cramming an attempt at powerful introspection into expensive one-hour blocks. That is, I think psychotherapeutic structure across the board is broken, and when the singularity happens it won’t be a problem anymore.
My hope is that we can develop new delivery structures into which we can import psychological techniques and have them deployed at scale while being better than 1-hour weeklies, 8-hour shamanic trips, or that annoying app with the emotionally saccharine bird.
See also: The Method of Levels
Have you looked at the Atlas Fellowship, btw?
I will keep harping on the idea that more people should try starting (public benefit) corporations instead of nonprofits. At least give it five minutes’ thought, especially if (handwaves) impact markets something something. This should be in their Overton window, but it might not be because they automatically assume “doing good ⇒ charity ⇒ nonprofit”. Corporations are the standard procedure for how effective, helpful things get done in the world; they are RLHF’d by the need to acquire profit by providing real value to customers, which reduces the surface area for bullshitting.

I am not an expert here by any means, but I notice that I can go on Clerky or Stripe Atlas and spend a couple of hours spinning up an organization. I haven’t actually gone through with incorporating a nonprofit, but the process seems at least 10x more painful, judging from reading a book on it and from how many people seek fiscal sponsorship. I’m pretty surprised this schlep isn’t talked about more. Having to rely on fiscal sponsorship seems pretty obviously terrible to me, and I hadn’t even considered the information-distortive effects here. I would not be caught dead being financially enmeshed with the EVF umbrella of orgs after FTX. From my naive perspective, the castle could easily have been a separate business entity with EVF retaining at least majority control.
(I just realized I’m on LessWrong and not EA Forum, and could have leaned harder into capitalismpunk without losing as many social points.)
Why does CHAI exclude people who don’t have a near-perfect GPA? This doesn’t seem like a good way to maximize the amount of alignment work being done. A high GPA won’t save the world; in fact, it selects for obedience to authority and years of status competition, leading to poor mental health in which to do the work and decreasing the total amount of cognitive resources being thrown at the problem.
(Hypothesis 1: “Yes, this is first-order bad but the second-order effect is we have one institutionally prestigious organization, and we need to say we have selective GPA in order to fit in and retain that prestige.” [Translator’s Note: “We must work with evil in order to do good.” (The evil being colleges and grades and most of the economic system.)])
(Hypothesis 2: “GPA is the most convenient way we found to select for intelligence and conscientiousness, and those are the traits we need the most.”)
(Hypothesis 3: “The university just literally requires us to do this or we’ll be shut down.”)
Won’t somebody think of the grad students!
Fascinating. Way, way more examples and empirical treatment of rituals would help me understand your case better.
Noting that the realer, second-order disaster resulting from Chernobyl may have been decreased usage of nuclear power (assuming the accident influenced antinuclear sentiment). Likewise, I’m guessing the Challenger disaster had a negative influence on the U.S. space program. Covid lockdowns also have this quality of not tracking the cost-benefit of their continuation. Human reactions to disasters can be worse than the disasters themselves, especially if the costs of those reactions are hidden. I don’t know how this translates to AI safety, but it merits thought.
What was the most valuable habit you had during the past decade?
What is the most valuable habit you could inculcate or strengthen over the next decade?
(Habit here broadly construed as: “specific activity that lasts anywhere from a number of seconds to half an hour or more”. Example: playing golf each morning. Better example: practicing your driving swing at 6:00am for 30 minutes (but you can give much more detail than that!). Bad example: poorly operationalized, vague statements like “being more friendly”.)
See: The One Thing
Nitpicking a particular topic of interest to me:
“Power/money/being-the-head-of-OpenAI doesn’t do anything post-singularity.”
It obviously does?
I am very confused why people make claims in this genre. “When the Singularity happens, this (money, conflict, the problems I’m experiencing) won’t be a problem anymore.”
This mostly strikes me as magical, far-mode thinking. It’s like people have an afterlife-shaped hole after losing religion. The specific, real reality in front of you won’t suddenly, magically change after an Intelligence Explosion, assuming we’re alive in some coherent state. Money and power are very, very likely to still exist afterwards, just in a different state that makes sense as a transformation of the current world.
“when aggression with conventional weapons greatly endangers Russia’s existence”
Putin could interpret an attack on Russia’s newly annexed territories as “greatly endangering Russia’s existence”. He seems to be generating rhetoric in that direction.
This post explains more, I don’t have any other info: https://www.lesswrong.com/posts/F7RgpHHDpZYBjZGia/high-schoolers-can-apply-to-the-atlas-fellowship-usd50k
How might a person develop INCREDIBLY low time preference? (That is, they value their future selves decades to a century out nearly as much as they value their current selves.)
Who are people who have this, or have acquired this, and how did they do it?
Do these concepts make sense, or might I be misunderstanding something? Tabooing/decomposing them: what is happening cognitively, experientially, when a human mind does this thing?
What would a literature review say?
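To put a rough number on the parenthetical in the first question (my own gloss, with illustrative figures rather than anything from a source): under simple exponential discounting with annual discount factor $\delta$, valuing yourself a century from now at 90% of your present self requires

$$\delta^{100} = 0.9 \;\Rightarrow\; \delta = 0.9^{1/100} \approx 0.99895,$$

i.e. an annual discount rate of roughly 0.1%, far below the several-percent-per-year rates typically revealed in human behavior.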
I was wrong in predicting that I’d produce something that qualifies as “a writeup” (I’m not sure exactly where I would have put it once the draft was finished). I am poorly calibrated in predictions about my own actions. It may be that I’m only tempted to predict I’ll do a thing when I want to signal, to myself or others, that I will in fact do it when the outside view says I won’t; so if I find myself trying to assign a probability to doing something, I should probably update downward, over and above the normal downward adjustment for the planning fallacy and Hofstadter’s Law.
Thankfully there is satisfactory content on the subject. For instance, “Group Debugging” seems to be the closest existing thing to this at meetups, and it is more repeatable and tractable than the original Hamming question (it’s basically what the Hamming session I said I facilitated was), though it is somewhat narrower than the original’s broad scope. (I don’t like the word “Debugging” for this exercise; it fetishizes applying programming metaphors to human psychology, which feels sterile, cliquey, overreliant on “System 2” solutions, and less descriptive of what is actually happening than it could be. Maybe “Group Problem-Solving”?)
Interesting summary and interpretation of a speech outlining Putin’s intentions, “The End of Western Hegemony is INEVITABLE”: