Or you could think of misalignment as the AI doing things its designers explicitly tried to prevent it from doing (giving people suicide instructions and the like). In that case the AI is clearly “misaligned”, and that says something about how difficult it’ll be to align our next AIs.
cousin_it (Vladimir Slepnev)
Thomas C. Schelling’s “Strategy of Conflict”
Announcing the AI Alignment Prize
Announcement: AI alignment prize round 3 winners and next round
Understanding is translation
Can you describe what changed / what made you start feeling that the problem is solvable / what your new attack is, in short?
There’s an amazing HN comment that I mention every time someone links to this essay. It says don’t do what the essay says, you’ll make yourself depressed. Instead do something a bit different, and maybe even opposite.
Let’s say for example you feel annoyed by the fat checkout lady. DFW advises you to step over your annoyance, imagine the checkout lady is caring for her sick husband, and so on. But that kind of approach to your own feelings will hurt you in the long run, and maybe even seriously hurt you. Instead, the right thing is to simply feel annoyed at the checkout lady. Let the feeling come and be heard. After it’s heard, it’ll be gone by itself soon enough.
Here’s the whole comment, to save people the click:
DFW is perfect towards the end, when he talks about acceptance and awareness— the thesis (“This is water”) is spot on. But the way he approaches it, as a question of choosing what to think, is fundamentally, tragically wrong.
The Mindfulness-Based Cognitive Therapy folks call that focusing on cognition rather than experience. It’s the classic fallacy of beginning meditators, who believe the secret lies in choosing what to think, or in fact choosing not to think at all. It makes rational sense as a way to approach suffering: “Thinking this way is causing me to suffer. I must change my thinking so that the suffering stops.”
In fact, the fundamental tenet of mindfulness is that this is impossible. Not even the most enlightened guru on this planet can not think of an elephant. You cannot choose what to think, cannot choose what to feel, cannot choose not to suffer.
Actually, that is not completely true. You can, through training over a period of time, teach yourself to feel nothing at all. We have a special word to describe these people: depressed.
The “trick” to both Buddhist mindfulness and MBCT, and the cure for depression if such a thing exists, lies in accepting that we are as powerless over our thoughts and emotions as we are over our circumstances. My mind, the “master” DFW talks about, is part of the water. If I am angry that an SUV cut me off, I must experience anger. If I’m disgusted by the fat woman in front of me in the supermarket, I must experience disgust. When I am joyful, I must experience joy, and when I suffer, I must experience suffering. There is no other option but death or madness: the quiet madness that pervades most people’s lives as they suffer day in and day out in their frantic quest to avoid suffering.
Experience. Awareness. Acceptance. Never thought— you can’t be mindful by thinking about mindfulness, it’s an oxymoron. You have to just feel it.
There’s something indescribably heartbreaking in hearing him come so close to finding the cure, to miss it only by a hair, knowing what happens next.
[Full disclosure: My mother is a psychiatrist who dabbles in MBCT. It cured her depression, and mine.]
And another comment from a different person making the same point:
Much of what DFW believed about the world, about himself, about the nature of reality, ran counter to his own mental wellbeing and ultimately his own survival. Of the psychotherapies with proven efficacy, all seek to inculcate a mode of thinking in stark contrast to Wallace’s.
In this piece and others, Wallace encourages a mindset that appears to me to actively induce alienation in the pursuit of deeper truth. I believe that to be deeply maladaptive. A large proportion of his words in this piece are spent describing that his instinctive reaction to the world around him is one of disgust and disdain.
Rather than seeking to transmute those feelings into more neutral or positive ones, he seeks to elevate himself above what he sees as his natural perspective. Rather than sit in his car and enjoy the coolness of his A/C or the feeling of the wheel against his skin or the patterns the sunlight makes on his dash, he abstracts, he retreats into his mind and an imagined world of possibilities. He describes engaging with other people, but it’s inside his head, it’s intellectualised and profoundly distant. Rather than seeing the person in the SUV in front as merely another human and seeking to accept them unconditionally, he seeks a fictionalised narrative that renders them palatable to him.
He may have had some sort of underlying chemical or structural problem that caused his depression, but we have no real evidence for that, we have no real evidence that such things exist. What we do know is that patterns of cognition that he advocated run contrary to the basic tenets of the treatment for depression with the best evidence base: CBT and its variants.
What a reduction of “could” could look like
Announcement: AI alignment prize winners and next round
Announcement: AI alignment prize round 4 winners
Another attempt to explain UDT
“Epiphany addiction”
A model of UDT with a halting oracle
OpenAI has already been the biggest contributor to accelerating the AI race; investing in chips is just another step in the same direction. I’m not sure why people keep assuming Altman is optimizing for safety. Sure, he has talked about safety, but it’s very common for people to pay lip service to something while doing the opposite thing. I’m not surprised by it and nobody should be surprised by it. Can we just accept already that OpenAI is going full speed in a bad direction, and start thinking about what we can/should do about it?
What Thomas Schelling would do. Partly tongue-in-cheek.
The Clumsy Game-Player: agree to the deal, then perform an identical “finger slip” several turns later.
The Lazy Student, The Grieving Student, The Sports Fan: make the deadline for reports a curve instead of a cliff. Each day of delay costs some percentage of the grade.
The Murderous Husband: if you really don’t want these things to happen, make the wife partially responsible for the murder in such cases, by law. (Or the lover, if the husband chooses to murder the wife.)
The Bellicose Dictator: publicly threaten sanctions unless the invading army withdraws immediately. Do this before any negotiations.
The Peyote-Popping Native, The Well-Disguised Atheist: when the native first comes to you, offer to balance out the permission to smoke peyote with some sanction against the Native American church. Then the atheists won’t bother asking for a free lunch.