Glad to hear it’s been of use!
Which agent should the sympathetic listener be talking to? The manager, the exile, or both?
First, with any of the managers that might be protecting the exiles. Eventually they might give access to the exile, but it’s important not to try to rush through them. You only go to the exile after the managers have agreed to give you access to it: bypassing them risks causing damage, because the managers had concerns which weren’t taken into account. (Self-Therapy has detailed instructions on this.) You might e.g. end up exposing an exile in a situation where you don’t have the resources to handle it, and then instead of healing the exile, you end up worsening the original trauma. That will also make your managers less likely to trust you with access to the exile again.
Though sometimes I’ve had exiles pop up pretty spontaneously, without needing to negotiate with managers. In those situations I’ve just assumed that all the managers are fine with this, since there’s no sense of resistance to contacting the exile. If that happens, it’s probably okay, but if it feels like any managers are getting in the way, then address their concerns as much as possible. (As the instructor said in an IFS training I did: “to go fast, you need to go slow”.)
IFS also recommends checking back with the managers after healing the exile, so that they can see that the exile is actually healed now and that they can behave differently in the future. Also, you may want to keep checking back with the exile for a while afterwards, to ensure that it’s really been healed.
Assuming that one correctly identifies which thoughts (and ultimately, which situations) a manager deems dangerous, and that one successfully does cognitive defusion, to what extent is it feasible, in your opinion, to have the manager (the exile) update by just talking to them vs by experiencing the dangerous situation again but positively?
Depends. I think that either is possible, but I don’t have a hard and fast rule: usually I’ve just gone with whatever felt more right. But I’d guess that the situations where you can get parts to update just by talking to them are ones where you’ve already accumulated plenty of evidence about how things are, and the relevant parts just need to become aware of it. E.g. if you had some challenge which was very specifically about your childhood environment, then it shouldn’t be too hard to let your parts know that you’re no longer in that environment.
On the other hand, for some issues (e.g. social anxiety), the parts might have kept you from ever testing the safety of most situations. For instance, if you’re scared of talking to strangers, then you generally won’t be talking to strangers. And when you do, you will have parts screaming at you to get out of that situation, which makes it intrinsically unpleasant and won’t let you experience it as safe. In that case, you won’t actually have collected the evidence needed for making the update, so you need to first persuade the parts to agree that collecting it is sufficiently safe. Then you can go out and get it.
It seems like market forces could even actively damage existing cooperation. While I’m not terribly familiar with the details, I’ve heard complaints of this happening at one university that I know of. There’s an internal market where departments need to pay for using spaces within the university building. As a result, rooms that would otherwise be used sit empty, because using them isn’t worth the rent.
Possibly this is still worth it overall—the system increasing the amount of spare capacity means that there are more spaces available for when a department really does need one—but people do seem to complain about it anyway.
I used to consider myself NU, but have since then rejected it.

Part of my rejection was that, on a psychological level, it simply didn’t work for me. The notion that everything only has value to the extent that it reduces suffering meant that most of the things which I cared about were pointless and meaningless, except for their instrumental value in reducing my suffering or making me more effective at reducing suffering. Doing things which I enjoyed, but constantly having a nagging sensation of “if I could just learn to no longer need this, then it would be better for everyone”, meant that it was very hard to ever enjoy anything. It was basically setting my mind up to be a battlefield, dominated by an NU faction trying to suppress any desires which did not directly contribute to reducing suffering, and opposed by an anti-NU faction which couldn’t do much, but could at least prevent me from getting any effective NU work done.

Eventually it became obvious that even from an NU perspective, it would be better for me to stop endorsing NU, since that way I might end up actually accomplishing more suffering reduction than if I continued to endorse NU. And I think that this decision was basically correct.

A related reason is that I also rejected the need for a unified theory of value. I still think that if you wanted to reduce human values into a unified framework, then something like NU would be one of the simplest and least paradoxical answers. But eventually I concluded that any simple unified theory of value is likely to be wrong, and also not particularly useful for guiding practical decision-making. I’ve written more about this here.
Finally, and as a more recent development, I notice that NU neglects to take into account non-suffering-based preferences. My current model of minds and suffering is that minds are composed of many different subagents with differing goals; suffering is the result of different subagents being in conflict (e.g. if one subagent wants to push through a particular global belief update which another subagent does not wish to accept).
This means that I could imagine an advanced version of myself who had gotten rid of all personal suffering, but was still motivated to pursue other goals. Suppose for the sake of argument that I only had subagents which cared about 1) seeing friends and 2) making art. Now if my subagents reached an agreement to spend 30% of their time making art and 70% seeing friends, then this could in principle eliminate my suffering by removing subagent conflict, but it would still be driving me to do things for reasons other than reducing suffering. Thus the argument that suffering is the only source of value fails; the version of me which had eliminated all personal suffering might even be more driven to do things than the current one, since subagent conflict would no longer be blocking action in any situation!
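To make that toy model concrete, here’s a minimal sketch in Python. Everything in it (the subagent names, shares, and the “conflict” and “motivation” measures) is invented purely for illustration; it just restates the arithmetic of the argument above, and isn’t drawn from the IFS or NU literature:

```python
# Toy illustration of the argument above: "suffering" as conflict between
# subagents' preferred time allocations, "motivation" as total remaining drive.
# All names and numbers here are made up for the example.

subagents = {
    "friend-seeker": {"goal": "seeing friends", "preferred_share": 0.7},
    "artist": {"goal": "making art", "preferred_share": 0.3},
}

def conflict(allocation):
    """Total mismatch between what each subagent wants and what it gets."""
    return sum(
        abs(sub["preferred_share"] - allocation.get(sub["goal"], 0.0))
        for sub in subagents.values()
    )

def motivation(allocation):
    """Total drive to act: how much time anything is pushing to use."""
    return sum(allocation.values())

# A negotiated agreement that matches every subagent's preferred share:
agreement = {"seeing friends": 0.7, "making art": 0.3}

print(conflict(agreement))    # 0.0 -> no subagent conflict, i.e. no "suffering"
print(motivation(agreement))  # 1.0 -> yet there is still full drive to act
```

The point is just that driving the conflict term to zero does nothing to drive the motivation term to zero, which is the crux of the argument.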
As a practical matter, I still think that reducing suffering is one of the most urgent EA priorities: as long as death and extreme suffering exist in the world, anything that would be called “altruism” should focus its efforts on reducing them. But this is a form of prioritarianism, not NU. I do not endorse NU’s prescription that an entirely dead world would be equally good as, or better than, a world with lots of happy entities, simply because there are subagents within me who would prefer to exist and continue to do stuff, and also for other people to continue to exist and do stuff if they so prefer. I want us to liberate people’s minds from involuntary suffering, and then to let people do whatever they still want to do, once suffering is something that people experience only voluntarily.
If it is not profitable, then it morally shouldn’t ought to exist; the market is indicating that the business is a waste of society’s limited resources.
This seems much too strong: it e.g. suggests that no non-profits should exist. Profitability and overall benefit to society are two very different things.
(did you mean to ask goal-directed?)
We are not currently in the situation of s-risks, so it is not typical state of affairs.
Wouldn’t this apply to almost anything? If we are currently not in the situation of X, then X is not a typical state of affairs.
However, this risk is significantly different. If you believed that superintelligent AI must be goal-directed because of math, then your only recourse for safety would be to make sure that the goal is good, which is what motivated us to study ambitious value learning. But if the argument is actually that AI will be goal-directed because humans will make it that way, you could try to build AI that is not goal-directed that can do the things that goal-directed AI can do, and have humans build that instead.
I’m curious about the extent to which people have felt like “superintelligent AI must be goal-directed” has been the primary problem. Now that I see it expressed in this form, I realize that there have for a long time been lots of papers and comments which seem to suggest that this might be people’s primary concern. But I always kind of looked at it from the perspective of “yeah, this is one concern, but even assuming that we could make a non-goal-directed AI, that doesn’t solve the problem of other people having an incentive to make goal-directed AI (and that’s the much more pressing problem)”. So since we seemed to agree on goal-directed superintelligence being a big problem, maybe I overestimated the extent of my agreement with other people who are concerned about it.
Wow. I didn’t expect to see a therapy approach based on morphic fields.
It seems very unlikely to me that they would have gotten any less publicity if they’d reported the APM restrictions any differently. (After all, they didn’t get any less publicity for reporting the system’s other limitations either, like it only being able to play Protoss v. Protoss on a single map, or 10⁄11 of the agents having whole-camera vision.)
I was using “fair” to mean something like “still made for an interesting test of the system’s capabilities”. Under that definition, the explanations seem entirely compatible—they thought that it was an interesting benchmark to try, and also got excited about the results and wanted to share them because they had run a test which showed that the system had passed an interesting benchmark.
I curated this post because “why do people disagree, and how can we come to agreement?” feels like one of the most important questions of the whole rationality project. In my experience, when two smart people keep disagreeing while having a hard time understanding how the other could fail to understand something so obvious, it’s because they are operating under different frameworks without realizing it. Having analyzed examples of this helps one understand and recognize the pattern, and then hopefully learn to avoid it.
This seems like it would mainly affect instrumental values rather than terminal ones.
True, not disputing that. Only saying that it seems like an easier problem than solving human values first, and then checking whether those values are satisfied.
It’s been on my to-read list for a while. I bounced off the first time I tried reading it (it seemed to be taking its time to get to the point, starting with the personal narrative of the authors and how they got into meditation research, etc.), but I expect to get around to it eventually.
Apparently as a result of this analysis, DeepMind has edited the caption of the graph:
The distribution of AlphaStar’s APMs in its matches against MaNa and TLO and the total delay between observations and actions. CLARIFICATION (29/01/19): TLO’s APM appears higher than both AlphaStar and MaNa because of his use of rapid-fire hot-keys and use of the “remove and add to control group” key bindings. Also note that AlphaStar’s effective APM bursts are sometimes higher than both players.
Well, apparently that’s exactly what happened with TLO and MaNa, and then the DeepMind guys were (at least going by their own accounts) excited about the progress they were making and wanted to share it, since being able to beat human pros at all felt like a major achievement. Like they could just have tested it in private and continued working on it in secret, but why not give a cool event to the community while also letting them know what the current state of the art is.
E.g. some comments from here:
I am an administrator in the SC2 AI discord, and we’ve been running SC2 bot vs bot leagues for many years now. Last season we had over 50 different bots/teams with prizes exceeding thousands of dollars in value, so we’ve seen what’s possible in the AI space.
I think the comments made in this sub-reddit especially with regards to the micro part left a bit of a sour taste in my mouth, since there seems to be the ubiquitous notion that “a computer can always out-micro an opponent”. That simply isn’t true. We have multiple examples for that in our own bot ladder, with bots achieving 70k APM or higher, and them still losing to superior decision making. We have a bot that performs god-like reaper micro, and you can still win against it. And those bots are made by researchers, excellent developers and people acquainted in that field. It’s very difficult to code proper micro, since it doesn’t only pertain to shooting and retreating on cooldown, but also to know when to engage, disengage, when to group your units, what to focus on, which angle to come from, which retreat options you have, etc. Those decisions are not APM based. In fact, those are challenges that haven’t been solved in 10 years since the Broodwar API came out—and last Thursday marks the first time that an AI got close to achieving that! For that alone the results are an incredible achievement.
And all that aside—even with inhuman APM—the results are astonishing. I agree that the presentation could have been a bit less “sensationalist”, since it created the feeling of “we cracked SC2” and many people got defensive about that (understandably, because it’s far from cracked). However, you should know that the whole show was put together in less than a week and they almost decided on not doing it at all. I for one am very happy that they went through with it.
And the top comment from that thread:
Thank you for saying this. A decent sized community of hobbyists and researchers have been working on this for YEARS, and the conversation has really never been about whether or not bots can beat humans “fairly”. In the little documentary segment, they show a scene where TLO says (summarized) “This is my off race, but I’m still a top player. If they’re able to beat me, I’ll be really surprised.”
That isn’t him being pompous, that’s completely reasonable. AI has never even come CLOSE to this level for playing StarCraft. The performance of AlphaStar in game 3 against MaNa left both Artosis AND MaNa basically speechless. It’s incredible that they’ve come this far in such a short amount of time. We’ve literally gone from “Can an AI play SC2 at a high level AT ALL” to “Can an AI win ‘fairly’”. That’s a non-trivial change in discourse that’s being completely brushed over, IMO.
Maybe the “why” book for a language would be something like reading manga for someone wanting to learn Japanese—something that makes the language and culture seem cool and something that you want to learn. :)
Given that learning a language also includes a fair chunk of learning the culture (e.g. knowing which forms of address are appropriate at which times), reading literature from that culture is probably actually useful for accomplishing the “Why” book’s goal of explaining the mindset and intuitions behind the skill.
As noted in e.g. the conversation between Wei Dai and me elsewhere in this thread, it’s quite plausible that people thought beforehand that the current APM limits were fair (DeepMind apparently consulted pro players on them). Maybe AlphaStar needed to actually play a game against a human pro before it became obvious that it could be so overwhelmingly powerful with the current limits.
things that had been too scary for me to think about became thinkable (e.g. regrettable dynamics in my romantic relationships)
I think this is a crucial observation for the rationality project. When you have exile-manager-firefighter dynamics going on and you don’t know how to unblend from them, you cannot think clearly about anything that triggers the exile, and trying to make yourself do it anyway will generate tremendous internal resistance in one form or another (getting angry, getting bored, getting sleepy, getting confused, all sorts of crap), first from managers trying to block the thoughts and then from firefighters trying to distract you from the thoughts. The top priority is noticing that this is happening, and then attending to the underlying emotional dynamics.
Valentine has also written some good stuff on this, in e.g. The Art of Grieving Well:
I think the first three so-called “stages of grief” — denial, anger, and bargaining — are avoidance behaviors. They’re attempts to distract oneself from the painful emotional update. Denial is like trying to focus on anything other than the hurt foot, anger is like clutching and yelling and getting mad at the situation, and bargaining is like trying to rush around and bandage the foot and clean up the blood. In each case, there’s an attempt to keep the mind preoccupied so that it can’t start the process of tracing the pain and letting the agonizing-but-true world come to feel true. It’s as though there’s a part of the psyche that believes it can prevent the horror from being real by avoiding coming to feel as though it’s real. [...]
In every case, the part of the psyche driving the behavior seems to think that it can hold the horror at bay by preventing the emotional update that the horror is real. The problem is, success requires severely distorting your ability to see what is real, and also your desire to see what’s real. This is a cognitive black hole — what I sometimes call a “metacognitive blindspot” — from which it is enormously difficult to return.
This means that if we want to see reality clearly, we have to develop some kind of skill that lets us grieve well — without resistance, without flinching, without screaming to the sky with declarations of war as a distraction from our pain.
We have to be willing to look directly and unwaveringly at horror.
and also in Looking into the Abyss:
It would be bad if pain weren’t automatically aversive and we had to consciously remember to avoid things that cause it. Instead, we have a really clever automatic system that notices when something is bad or dangerous, grabs our conscious attention to make us change our behavior, and often has us avoiding the problem unconsciously thereafter.
But because pain is an interpretation rather than a sensation, avoiding it acts as an approximation of avoiding things that are actually bad for us.
This can result in some really quirky behavior beyond things like dangerously bending at the waist. For instance, moving or touching ourselves seems to distract us from painful sensations. So if the goal is to decrease conscious experience of pain, we might find ourselves automatically clutching or rubbing hurt body parts, rocking, or pounding our feet or fists in response to pain. Especially the latter actions probably don’t help much with the injury, but they push some of the pain out of mind, so many of us end up doing this kind of behavior without really knowing why.
Writhing in agony strikes me as a particularly loud example: if some touch and movement can block pain, then maybe more touch and movement can block more pain. So if you’re in extreme pain and the goal is to get away from it, large whole-body movements seem to make sense. (Although I think there might be other reasons we do this too.)
To me, this looks like a Red Queen race, with the two “competitors” being the pain system and the “distract from pain” reflex. First the pain system tries to get our attention and change our behavior (protect a body part, get help, etc.). This is unpleasant, so the look-away reflex grabs onto the nearest available way to stop the experience of pain, and muddles some of the sensation that’s getting labeled as pain. The pain system still perceives a threat, though, so it turns up the volume so to speak. And then the look-away reflex encourages us to look even more wildly for a way out, which causes pain’s volume to go up even more….
The bit about a Red Queen race sounds to me exactly like the description of an exile/firefighter dynamic, though of course there’s a deeper bit there about some things being so painful as to trigger a firefighter response even if one didn’t exist previously. Probably everyone has some “generic” firefighters built right into the psyche which are our default response to anything sufficiently uncomfortable—similar to the part in my robot design which mentioned that
If a certain threshold level of “distress” is reached, the current situation is designated as catastrophic. All other priorities are suspended and the robot will prioritize getting out of the situation.
even before I started talking about specialized firefighters dedicated to keeping some specific exiles actually exiled. And in the context of something like physical pain or fear of a predator, just having a firefighter response that seeks to minimize the amount of experienced distress signal makes sense. The presence of the distress signal is directly correlated with the extent of danger or potential threat, so “minimize the presence of this signal” works as an optimization criterion which is in turn directly correlated with optimizing survival.
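To make that concrete, here’s a minimal sketch of what such a generic, threshold-based firefighter might look like. The threshold value, function name, and priority strings are all invented for illustration; this just restates the quoted design rule, and is not the actual robot design:

```python
# Minimal sketch of the distress-threshold override quoted above.
# The threshold value and all the names here are invented for illustration.

CATASTROPHE_THRESHOLD = 0.8  # hypothetical distress level that triggers the override

def choose_priorities(distress_level, current_priorities):
    """A generic firefighter: below the threshold, carry on as before;
    above it, suspend everything else and prioritize escaping the situation."""
    if distress_level >= CATASTROPHE_THRESHOLD:
        # The situation is designated as catastrophic: all other priorities
        # are suspended in favor of getting out of it.
        return ["get out of the current situation"]
    return current_priorities

print(choose_priorities(0.3, ["gather food", "recharge"]))  # normal operation
print(choose_priorities(0.9, ["gather food", "recharge"]))  # override kicks in
```

Note that the only thing this mechanism optimizes for is the distress signal itself, which is exactly why it works well for predators and physical pain but generalizes poorly to the cases below.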
But when we get to things like “thinking about romantic success” or “thinking about existential risk”, it’s no longer neatly the case that simply not experiencing the stress of thinking about those things is useful for avoiding them...
Whoa, glad you found it that useful! Thank you for letting me know. :)
I do recommend reading at least Self-Therapy too; it mentions a number of details which I left out of this explanation, and which might be useful to know about when addressing future issues.