[Valence series] 5. “Valence Disorders” in Mental Health & Personality

5.1 Post summary /​ Table of contents

Part of the Valence series.

Here in the final post of the Valence series, I will discuss how valence might shed light on three phenomena in mental health and personality: depression, mania, and narcissistic personality disorder.

  • Section 5.2 gives some context: What kind of relationship do we expect a priori between algorithm-level mental components like “valence”, versus observable mental health syndromes and personality disorders? I’ll argue that we should expect salient clusters of symptoms that correspond to systematic changes in valence, but we should not expect this kind of analysis to account for all the symptoms that co-occur in real patients.

  • Section 5.3 discusses what happens if valence has a strong general negative bias—i.e., if almost all thoughts are negative valence. I will argue that the result is a good match to clinical depression. I’ll particularly discuss the inability to voluntarily move and think without unusual effort and willpower.

  • Section 5.4 discusses the opposite: what happens if valence has a strong general positive bias—i.e., if almost all thoughts are positive valence? I will suggest that the expected result is a pretty good match to mania.

  • Section 5.5 discusses what happens if valence is systematically extremized—i.e., if thoughts can have very positive valence, or very negative valence, but rarely in between. I will suggest that the result is a set of symptoms that seem to be a close match to narcissistic personality disorder.

  • Section 5.6 will wrap up the post and series, including a brief discussion of how it relates to my job description as an Artificial General Intelligence safety and alignment researcher.

5.2 Context: What are we expecting to find a priori?

We can think of the following indirect path to get from “root causes” to psychological observations & personality traits:

(Don’t scrutinize the red arrows—I just put them in randomly, to illustrate the idea that each layer can influence the layer below.) As illustrated by the bold text and thick arrows, we should expect to find salient clusters of symptoms that tend to co-occur because they flow from the same proximal cause: systematic changes to valence signals in the brain. But we should also not be surprised to find a mish-mosh of other algorithmically-unrelated symptoms that often appear along with those clusters of symptoms.

As argued in Post 1, valence is one of the most important ingredients in one of the most important algorithms in the brain. So we should expect:

  • Some possible root causes may happen to have a big systematic impact on valence. (But they’ll probably have other consequences too, and the details will differ among different root causes.)

  • Given the centrality of valence in the brain, if there is a big systematic change to valence, then it should have lots of obvious downstream effects on psychology and behavior.

As a consequence:

  • We should expect to find clusters of symptoms / behaviors that can be elegantly explained in terms of something happening to valence signals.

  • We should also expect to find other symptoms /​ behaviors that commonly co-occur in practice, but cannot be explained in terms of valence. Instead, they are different consequences of the same root cause(s), and may have no relation whatsoever at the “algorithm level”.

For example, dopamine is centrally involved in valence signals, and meanwhile, off in an obscure corner of the brain, dopamine is also centrally involved in a little specialized circuit controlling prolactin hormone release. I firmly believe that, at the algorithm level, these two functions have nothing whatsoever to do with each other. But they both happen to involve dopamine, and thus they can cross-talk in some people—hence the somewhat rare “dysphoric milk ejection reflex” where there’s a flood of intense negative emotions upon milk let-down during lactation.

That example is meant to illustrate the perils of theorizing about psychology purely at the algorithm level. Don’t get me wrong—the algorithm level is great! There are lots of insights to be found there. This post will hopefully be an example. But we shouldn’t expect to find all the insights there. Some things in psychology can only be explained at other levels, including lower (biochemistry) and higher (culture).

5.3 If valence has a strong negative bias (i.e., almost every thought is negative valence), it should lead to a cluster of symptoms suspiciously close to clinical depression

Everyone has a range of thoughts, with varying valence. I claim that, in depression, there’s a strong offset towards negative valence. So for almost every thought you think (e.g. “I’m gonna get out of bed”), your brain immediately assesses that thought as a bad idea, tosses it out, and re-rolls for a new thought (cf. §1.3). The exceptions are unusually appealing / motivating thoughts: for something like “I’m gonna scratch that really itchy bug bite right now”, I bet that even quite depressed, bedridden people will wind up executing that plan.

5.3.1 Voluntary motor and attention control can only happen with great effort

Going back to §1.3, valence is a control signal. When valence is negative, whatever thought you’re thinking tends to get thrown out, and the brain goes fishing for a new thought instead. When valence is positive, whatever thought you’re thinking tends to stick around. If that thought is part of a temporal sequence (e.g. you’re in the middle of singing a song), that sequence will continue. If that thought entails motor outputs (e.g. “I’m gonna stand up right now”), those motor outputs will actually happen.
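To put rough numbers on this control-signal picture, here is a minimal sketch. It is an illustration under loudly stated assumptions, not a model from this series: valence is treated as a draw from a gaussian, a thought is "kept" exactly when its valence is above zero (a hard threshold standing in for "tends to stick around"), and successive candidate thoughts are independent. The names `positive_fraction`, `expected_rolls`, `bias`, and `spread` are all hypothetical.

```python
import math


def positive_fraction(bias: float, spread: float = 1.0) -> float:
    """Fraction of candidate thoughts with valence above zero.

    Assumption (not from the post): valence ~ Normal(mean=bias, sd=spread).
    """
    # P(X > 0) for a gaussian equals the standard normal CDF at bias / spread.
    return 0.5 * (1.0 + math.erf(bias / (spread * math.sqrt(2.0))))


def expected_rolls(bias: float, spread: float = 1.0) -> float:
    """Expected number of candidate thoughts generated before one is kept.

    Assumption: candidates are independent, so the count is geometric.
    """
    return 1.0 / positive_fraction(bias, spread)


if __name__ == "__main__":
    # Hypothetical bias values, chosen only for illustration.
    for label, bias in [("no bias", 0.0), ("strong negative bias", -2.0)]:
        print(
            f"{label}: kept fraction = {positive_fraction(bias):.3f}, "
            f"expected re-rolls = {expected_rolls(bias):.1f}"
        )
```

Under these toy assumptions, an unbiased thinker keeps half of all candidate thoughts, whereas a two-standard-deviation negative offset leaves only about 2% of candidates above zero, so on the order of dozens of re-rolls are needed before any thought sticks.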

If the valence of every thought gets pulled negative, the two most direct consequences are:

  • Voluntary motor control can only happen with great effort /​ willpower.

  • Voluntary attention control (a.k.a. “voluntary thinking”, a.k.a. “System 2”) can only happen with great effort /​ willpower.

In case that’s confusing, I’ll elaborate on a few of the potentially-confusing parts:

“Voluntary attention control”: As discussed in §3.3, I firmly believe that motor control and attention control are “the same kind of thing” in many ways. Both have “voluntary” output channels that are under the control of the brain’s “main” reinforcement learning system (§1.5.6), and both also have “involuntary” mechanisms that can be triggered by other brain systems, particularly innate reactions in the brainstem. See the table in §3.3.5 for examples of voluntary and involuntary motor control versus attention control.

“…a.k.a. ‘voluntary thinking’, a.k.a. ‘System 2’…”: There’s a 2019 blog post by Kaj Sotala that I heartily endorse: System 2 as working-memory augmented System 1 reasoning. I would summarize it as the idea that deliberate “System 2” reasoning entails thinking lots of thoughts in sequence, and relating them to each other by holding particular things in working memory. Voluntary attention control is the switchboard making this whole process work, and we learn to skillfully operate that switchboard through reinforcement learning over the course of our life experience.

“…can only happen with great effort /​ willpower”: In the diagram above with the two gaussians, I showed the extreme right tail of the red gaussian just barely squeezing into positive-valence territory. I’ll try to illustrate what that can mean in practice, with an example. Let’s say that you are currently motivated to stay in bed rather than get up, but let’s also say that this motivation is ego-dystonic (§2.6)—i.e., you want to want to get out of bed. Then motivated thinking /​ brainstorming (§3.3) will kick in, and with luck you’ll be able to concoct a thought that spins “I will get out of bed” in the most positive-valence light possible—you’ll call to mind all the great consequences and associations of getting out of bed, and you’ll avoid paying attention to all the unappealing aspects of getting out of bed, insofar as that’s possible. With luck, the result of this brainstorming process will be that your “Thought Generator” (§1.3) crafts a thought Θ that both involves a plan to immediately get out of bed and is assessed by your brain as having net positive valence—probably just barely net positive. And by forming that thought Θ, you will then, in fact, actually get out of bed. Now, everything I’ve written in this paragraph is a mechanistic third-person description, but think about what this same process would feel like “from the inside”: I claim that it’s exactly the kind of thing we’re talking about, when we casually say “I can get out of bed, but only with great effort /​ willpower”.

5.3.2 Anhedonia and other symptoms

Moving on, another famous aspect of depression is anhedonia (inability to feel pleasure). I’m not immediately sure whether the anhedonia of depression is upstream of negative valence, or downstream, or a different consequence of the same root cause, or something else. But I definitely think anhedonia is intimately related to negative valence, for reasons spelled out in Appendix A.

And what about every other aspect of clinical depression? As best as I can tell, at least most of them are consequences of a global negative bias on valence. But in some cases, the story is a bit indirect and speculative. I hope what I’ve said is enough to pique interest in my valence-centric hypothesis of depression, so I’ll leave the story here, although I’m happy to chat more in the comments section.

5.3.3 Root causes

As in §5.2, nothing I’ve said so far is a claim about root causes. But still, what about root causes? I imagine there are a variety of them. For instance, here’s a made-up scenario in which obsessive-compulsive disorder (OCD) leads to depression (edited from this older post of mine):

  • If my current thought involves an immediate plan to wash my hands again, then it’s negative valence, because it reminds me of the fact that OCD is ruining my life and relationships.

  • If my current thought does not involve an immediate plan to wash my hands again, then it’s negative valence, because I will get sick and die.

  • I can’t just think a thought about something entirely unrelated to washing my hands and disease and OCD, because of constraints-on-thoughts stemming from “involuntary attention” associated with my anxiety (§3.3.5).

Maybe you’re thinking: OK, but that just kicks the question one level back: what’s the root cause of the OCD here? Fair enough, and I don’t have a great answer.

Also, this is just one made-up example; even if it’s valid, I imagine that it’s one of many causes of depression, and I have no particular insight to offer.

In case you’re wondering, I also have no particular knowledge about treatments. If you’re suffering from depression, then dang, I’m really sorry; maybe try this general resource page.

5.4 If valence has a strong positive bias (i.e., almost every thought is positive valence), it should lead to a cluster of symptoms suspiciously close to mania

Here, the obvious consequence is that whatever plan happens to pop into your head seems to be a really really awesome plan, and therefore you will actually go and do it. Hence, we get consequences like impulsivity, terrible judgment, unrealistic optimism, and high energy.
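As a toy illustration of that logic (a sketch under assumptions of my own, not a claim about real brains): suppose each plan that pops into your head has some hypothetical "true quality", that its valence is that quality plus a uniform bias, and that any plan with positive valence gets carried out. The function `run_day` and every parameter value below are made up for illustration.

```python
import random


def run_day(bias: float, n_plans: int = 10_000, seed: int = 0):
    """Return (fraction of plans acted on, mean true quality of acted-on plans).

    Assumptions (hypothetical): plan quality ~ Normal(0, 1), and the valence
    bias shifts the assessment of every plan by the same amount.
    """
    rng = random.Random(seed)  # fixed seed so runs are comparable
    acted_quality = []
    for _ in range(n_plans):
        quality = rng.gauss(0.0, 1.0)  # hypothetical "true" merit of the plan
        valence = quality + bias       # biased assessment of the plan
        if valence > 0:                # positive valence: the plan is executed
            acted_quality.append(quality)
    fraction = len(acted_quality) / n_plans
    mean_quality = (
        sum(acted_quality) / len(acted_quality) if acted_quality else float("nan")
    )
    return fraction, mean_quality


if __name__ == "__main__":
    for label, bias in [("baseline", 0.0), ("strong positive bias", 2.0)]:
        fraction, mean_quality = run_day(bias)
        print(f"{label}: acted on {fraction:.1%} of plans, "
              f"mean quality of those plans = {mean_quality:.2f}")
```

In this toy setup, the positive bias means far more plans get acted on (loosely, impulsivity and high energy), while the average quality of the plans acted on drops toward the average of all plans (loosely, terrible judgment), because the filter that normally screens out bad plans has stopped filtering.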

Another major symptom of mania is psychosis. But I think that psychosis is basically not algorithmically related to valence. Instead I think psychosis is biochemically related to valence, because both are related to the dopamine system. I have a blog post with some (speculative) details: Model of psychosis, take 2.

OK, that’s what I do believe about psychosis. Why don’t I believe that psychosis is a direct consequence of positive valence? Several reasons (but note that I’m not certain of all these details):

  • Psychosis can happen in the absence of unusual positive valence—especially in schizophrenia. (There’s even such a thing as “psychotic depression”, although it’s less common.) As best as I can tell, the psychotic symptoms in schizophrenia are not wildly different from the psychotic symptoms in manic psychosis, although obviously we expect it to present differently to some extent because the psychosis is occurring in very different background contexts of co-occurring symptoms.

  • As discussed in §3.3, our sensory perceptions are generally constrained by our sensory inputs. If I want to sincerely believe that I’m scuba diving right now, I just can’t, no matter how strong my motivation. Thus, since sensory inputs are independent of valence, a valence bias cannot explain the visual and auditory hallucinations, delusions of reference, and so on, that occur in manic psychosis. (Per §3.3.1, attention-control and motor-control have an influence on perception on the margin, but I don’t think that’s adequate to explain these phenomena.)

  • I don’t think the content of hallucinations, delusions of reference, etc., is a perfect match to what we are motivated to see and believe, even after accounting for the §3.3.4 caveat that motivations are not always obvious.

  • Putting aside the origin of psychotic delusions, perhaps one could argue that their persistence is related to confirmation bias, which in turn is related to valence (§3.3). But I don’t buy that story either, because confirmation bias is not particularly related to positive valence. A big part of confirmation bias is that “the idea of changing one’s mind” has to be negative valence. And indeed, I don’t think it’s the case that mania involves a general unwillingness to change one’s mind. Quite the contrary—in the reports I’ve read, people talk about how a new idea will pop into their head, and it seems great, and they go with it, forgetting about whatever they were into a moment earlier. Thus, in mania, the psychotic delusions are persistent, but pretty much every other kind of thought, plan, and belief has unusually little persistence, I think. So I don’t think the persistence of psychotic delusions can be explained by a general positive bias on valence.

5.5 If valence is “extremized” (i.e., almost every thought is either very positive valence, or very negative valence, but rarely anywhere in between), it should lead to a cluster of symptoms suspiciously close to Narcissistic Personality Disorder (NPD)

Note: I could have alternatively drawn the purple curve as a wider gaussian.
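For readers who want numbers, the two ways of modeling "extremized" valence mentioned in the note above (a two-humped curve versus a wider gaussian) can be compared in a small sketch. Everything here is an assumption for illustration: the mixture weights, means, and standard deviations are invented, and "near neutral" is arbitrarily defined as valence within ±1 of zero.

```python
import math


def normal_cdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Cumulative distribution function of a gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def near_neutral_fraction(components, threshold: float = 1.0) -> float:
    """Fraction of thoughts with |valence| < threshold.

    `components` is a list of (weight, mean, sd) tuples describing a
    gaussian mixture; weights are assumed to sum to 1.
    """
    total = 0.0
    for weight, mean, sd in components:
        inside = normal_cdf(threshold, mean, sd) - normal_cdf(-threshold, mean, sd)
        total += weight * inside
    return total


# Hypothetical valence distributions (all numbers invented for illustration).
typical = [(1.0, 0.0, 1.0)]                   # single gaussian centered on neutral
bimodal = [(0.5, -3.0, 0.5), (0.5, 3.0, 0.5)]  # "very good or very bad" humps
wider = [(1.0, 0.0, 3.0)]                      # same center, much larger spread

if __name__ == "__main__":
    for label, dist in [("typical", typical), ("bimodal", bimodal), ("wider", wider)]:
        print(f"{label}: near-neutral fraction = {near_neutral_fraction(dist):.4f}")
```

Either way of drawing the curve makes near-neutral thoughts rarer than in the typical case. With these made-up parameters the bimodal version nearly eliminates them, while the wider gaussian merely makes them a minority.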

NPD is one of the four “Cluster B personality disorders” listed in DSM-5; the others are borderline personality disorder (BPD), histrionic personality disorder (HPD), and antisocial personality disorder (ASPD) a.k.a. psychopathy a.k.a. sociopathy.

Contrary to what you might think, NPD is not especially related to the everyday meaning of “narcissism”; indeed, there’s a “narcissistic personality inventory” survey, but it turns out that NPD patients get the same score on the survey as controls (!!). The issue seems to revolve around self-esteem. A “narcissist”, as the term is used in everyday language, is a person who thinks they’re really special and great; they have high self-esteem by definition. An NPD patient, by contrast, need not think they’re really special and great. But if they don’t think that, then boy do they feel lousy about it. (As discussed in that paper, DSM-5 emphasizes that “individuals with this disorder have a grandiose sense of self-importance”, but also notes that “vulnerability in self-esteem makes individuals with narcissistic personality disorder very sensitive to ‘injury’ from criticism or defeat”.)

I’m not too sure that an NPD diagnosis “carves nature at its joints”, and I am very open-minded to NPD having subtypes that are only superficially related. (I actually think antisocial personality disorder is like that, i.e. that it has at least two subtypes that are only superficially related.[1]) So the discussion here might only concern a subset of NPD. The discussion here is probably also somewhat applicable to BPD and HPD, although I’m not too sure about the details.[2]

Now let’s consider the hypothesis of “valence extremization”. What happens if almost every thought is either very positive valence, or very negative valence, but rarely anywhere in between? We might expect the following downstream consequences, among other things:

  • Unusual difficulty in talking or thinking about the world independently from how we feel about it: As discussed in §3.4, our brain treats valence as salient sense data which thus gets incorporated into our concepts, categories, and words. If valence signals are unusually strong in general, then presumably they would also play an unusually central role in beliefs, thinking, and communication. For example, there would be an unusually strong mental force for believing that if two things “go together” conceptually, then they must have the same valence.

  • Unusually strong halo effect, affect heuristic, and “splitting”: This is closely related to the above bullet point—again see §3.4. Jargon note: “Splitting” is where someone with NPD views a person they know as a perfect saint during some periods, and views the same person as irredeemably terrible during other periods. (Splitting is a symptom of BPD too.)

  • Unusually strong “drive to be liked /​ admired”: I argued in the previous post that there’s an intimate connection between valence and a “drive to be liked /​ admired” related to (one aspect of) social status. Well, if all of your valence signals are unusually high or low, then presumably the signals related to liking /​ admiration wind up being unusually strong too. More concretely, suppose I have NPD, and I’m doing “splitting” where people are either wonderful (I very strongly like /​ admire them) or terrible (the opposite). Suppose further that I mentally model (by empathetic simulation) what other people think of me. My brain will implicitly assume that they’re splitting too, i.e. that they think that I’m either wonderful (they very strongly like /​ admire me) or terrible (the opposite), which in turn feels extremely motivating or aversive respectively, thanks to my “drive to be liked /​ admired” (doubly so, because that drive is an increasing function of both how much I like /​ admire them and how much I feel like they like /​ admire me).[3]

As far as I can tell, this cluster of symptoms (and more that I’ve omitted) is a decent match to NPD. I think it especially resonates with this thought-provoking essay by the late Emma Borhanian. (In fact, I was reading that essay when the hypothesis of this section first popped into my head. But my theory is different from Emma’s.)

Two more quick things:

Root causes? As in the previous sections, if “valence extremization” is a proximate cause of NPD, you may still be wondering what root cause leads to “valence extremization”. My answer is: I have no idea, sorry.

What’s the “opposite” of NPD? Food for thought: If mania and depression correspond to equal-and-opposite distortions of valence signals, then what would be the opposite of NPD, i.e. what would be a condition where valence signals stay close to neutral, rarely going either very positive or very negative? I don’t know, and maybe it doesn’t have a clinical label. My one guess is that it’s associated with a “high-decoupling” (as opposed to “contextualizing”) style of thinking.[4]

5.6 Conclusion

5.6.1 Conclusion of this post

I’ll reiterate that I’m very far from an expert on mental health or personality disorders, and this post is pretty speculative. I am blessed with a lack of real-world experience with depression, mania, or NPD; instead, I’m trying to piece things together from stuff I’ve read. Hopefully there’s at least some food for thought here. As usual, please reach out (in the comments section or by email) if you want to chat about this more!

5.6.2 Conclusion of the whole series

Thanks for sticking it out to the end! I hope that I have convinced you that valence is indeed an extraordinarily important part of everyday mental life, and that pondering valence for 26,000 words is a good way to illuminate and crystallize a wide variety of phenomena that might otherwise be confusing.

I started writing this series because I recently had two valence-related “aha” moments (the social status thing in Post 4, and the Narcissistic Personality Disorder thing in §5.5), and wanted to write a short post about them, and “valence” was a convenient hook that would tie them together and allow me to write about both at once. But that short post turned into a long post, and then a whole series, as I kept finding that, the more I thought about valence, the more phenomena I found that were just beautifully clicking into place!

As my regular readers know, my long-term work goal is researching alignment and safety for possible future brain-like Artificial General Intelligence (AGI). I have long been interested in Narcissistic Personality Disorder and social status drive (among many other things) because both seemed likely to shed some light on how human social instincts work, which in turn is connected to brain-like AGI safety for reasons briefly summarized here. Valence also has a more direct connection to AGI safety via understanding motivation—see my valence-based “plan for mediocre alignment”.

Unfortunately, I can’t say that writing this series has given me new concrete ideas for programming future safe & beneficial AGI, beyond what I already knew before I started. But I think I got some mental frameworks that will be useful going forward. In particular, I think §3.4 helps me think more clearly about what’s really going on with my “plan for mediocre alignment”. (As it happens, the update is in the pessimistic direction, although not very strongly. I may write about this in a separate post sometime.)

I also feel like I now have my “foot in the door” on how innate status drive works in the human brain, which is very exciting to me. Maybe that’s not directly something that we should put into AGIs (see §4.4.3), but I do think we might want our AGIs to have compassion. Unfortunately, the “innate compassion drive” is still pretty mysterious to me, as of this writing, but I think compassion drive might have structural overlap with status drive, in the specific sense that I expect both to rely centrally on transient empathetic simulations (more discussion here). So hopefully this “foot in the door” towards understanding innate status drive will ultimately constitute meaningful progress towards safe and beneficial AGI, even if it’s still several steps removed. To be explicit:

  • The next step might look like my fleshing out §4.4.2 into a theory of human innate status drive with a similar level of detail as my laughter post, i.e. getting all the way to specific pseudocode mapped to particular hypothesized neuroanatomical connections and logic.

  • Then the next step after that, with luck, might look like a somewhat-analogous hypothesis for whatever innate drives are upstream of compassion.

This is very high on my list of things to try in 2024! But it might take a long time, and/or I might get stuck. We’ll see how it goes.

In contrast to status drive, I’m now much less interested in NPD and other personality disorders than I was before I came up with the §5.5 idea, and I’m correspondingly moving personality disorders much lower on my list of urgent research priorities. (I still have much more that I’d like to learn about them! Alas, there’s only so much time in the day.) An analogy: If someone is trying to understand the detailed mechanism of how car engines work, it’s not very useful for them to understand what goes wrong when they get a flat tire, even though a flat tire prevents the engine from accomplishing what it normally accomplishes (i.e., moving the car forward quickly). By the same token, my current guess is that further studying personality disorders would not offer much illumination into the nuts-and-bolts mechanisms underlying human social instincts. To be clear, I don’t think this guess was obvious a priori, and it still might be wrong.

Well, thanks again for reading! Again, please reach out (in the comments section or by email) if you want to talk about valence, this series, or whatever else.

Thanks to Seth Herd, Aysja Johnson, Justis Mills, Charlie Steiner, Adele Lopez, and Garrett Baker for critical comments on earlier drafts. Thanks to tailcalled for some helpful discussions and references related to this post.

  1. ^

    This is getting off-topic, but I currently think that some cases of antisocial personality disorder involve globally low arousal levels (see here), and other cases involve being unusually quick to anger. At a root-cause level, these are wildly different—probably anticorrelated, if anything. But they have some superficial overlap of symptoms /​ presentation, so they get lumped together in clinical practice. (I’m very interested in feedback—does this hot-take ring true or false to you?)

  2. ^

    My current vague impression (e.g. based on this) is that BPD tends to involve “strong emotions” of all sorts, and extremized valence can happen incidentally as a consequence. Whereas I currently guess NPD is more centered around this valence story. I don’t know anything about HPD. I feel very uncertain about all of this, and enthusiastically welcome people’s ideas and discussion.

  3. ^

    Fine print: Perhaps I shouldn’t have said that NPD people have an “unusually strong drive to be liked /​ admired” per se; rather, they have a normal innate drive to be liked /​ admired in their brain, but the inputs feeding into this circuit are unusually strong, and thus the circuit sends unusually strong outputs.

  4. ^

    At this point, my contextualizer readers are saying “Hey, he’s insulting me! After all, NPD is bad, and now he’s saying decoupling is the diametric opposite of NPD, so he’s basically saying decoupling is good and therefore that contextualizing is bad and therefore that I’m bad! I resent that, sir!!” Hopefully it goes without saying that I don’t mean to imply that—after all, I’m a high-decoupler, I don’t think that way!