Forcing Yourself is Self Harm, or Don’t Goodhart Yourself

I recently wrote about how forcing yourself to keep your identity small harms you in the same way suppressing emotions is self-harm. I got two kinds of criticism on the piece. One was that people seemed to read “actively working to keep your identity small is bad” as “keeping your identity small is bad” and disliked what they perceived as a contrary position on advice they like. This was at least partially my fault because the original title of the post didn’t make this very clear. The other criticism was that I had made a general argument that “forcing yourself to do X is self-harm” but had used it only to address keeping your identity small.

So, here is a better titled and generalized version of that post.

There are any number of virtuous things we might strive for. Here’s a small list of the kinds of things I have in mind:

  • productivity

  • being nice

  • eating healthy

  • exercising regularly

  • getting enough sleep

  • keeping your identity small

  • meditating

  • learning something you consider important, like a foreign language or math

  • writing more

  • keeping the house clean

A tempting strategy to achieve these things is some variant of forcing, striving, trying real hard, or otherwise just actively and directly pushing yourself to achieve the goal. This typically takes a few forms:

  • punishing yourself when you fail

  • creating rewards for yourself when you succeed

  • beating yourself up for not being better

  • creating incentives to “trick” yourself

  • exerting self control or willpower

The problem is that these methods all have a common fatal flaw: they require optimizing towards a measured target and so are subject to Goodhart effects.

Here’s an illustrative story of what can happen:

You want to be more productive, so you start measuring how productive you are. The metric used isn’t that important for this example, so let’s say it’s something like words written per hour. Using a spreadsheet and a simple word counting program, you capture this information and find that you can write about 500 words of usable content per hour at your most productive. Great, now you can start holding yourself accountable and expecting, let’s say, at least 400 words per hour.

Uh oh, it’s a Tuesday, you look at the clock, and you realize you’ve only written 100 words in the last hour. Well, that’s okay, you’ll do better next hour. But another hour passes and you only wrote 100 more words. This is looking bad. The day goes on like this. You go to bed thinking “well, I’ll do better tomorrow”.

The days pass and you keep having bouts where you don’t meet the target. You start to feel some mental anguish at the thought of sitting down to write because you reasonably predict that you might not meet your target. You feel less motivated to write, and find yourself in a positive feedback loop, writing less because you are distracted by feeling bad about not writing enough. Pretty soon you rarely have productive periods of writing where you crank out more than 400 words in an hour.

Now you want to write, but hesitate to because of the pain of failing to meet your own expectations. You agonize over the cognitive dissonance of both wanting to write and wanting not to fail to meet your target. Absent some external motivation, you just stop writing, because the joy of producing words doesn’t compare with the distress of not producing enough words. You’ve forgotten you started out trying to be more productive and are now trapped doing less than nothing, because you spend time pining for the days when you would write anything at all.

Most people have been through something like this, but replace writing and words with learning and grades, work and money, love and dates, or health and calories.

A common way to think about what happened is to put it in terms of multi-agent models of mind. Internal Family Systems provides a good model here: some parts of your mind try to protect you by “exiling” other parts of your mind that want things you think are bad. In the story above it was the part that would notice you didn’t hit your writing target. If one of those exiles tries to “sneak back in” (e.g. you think about writing, which implicitly means letting back in the part that notices how much you wrote), the protecting part of you uses a host of techniques like guilt, forgetfulness, and anger to keep the exile out. Over time this exiling can lead to dissociation at best and cognitive fusion at worst, causing you to become confused about what’s happening in your own mind as a feature of protecting yourself from doing “bad” things (e.g. you want to write and don’t know why you don’t or how to change the situation).

As an alternative to a multi-agent model, you can also think of this simply in terms of competing desires and beliefs, with some overpowering others because they are not in complete reflective equilibrium. There’s also probably a predictive processing take on what’s happening, but I’ve not worked out the details, so I’ll leave it as an exercise to the reader. The point is that, however you conceptualize what’s happening, trying to force yourself to be a particular way harms you: you engage in mental behaviors that make you less aware of yourself, and so ultimately less fluid and agenty, because that’s the most direct way to achieve what you want, i.e. you never fail if you never try.

This leaves us with two issues to address: what should you do if you’ve already harmed yourself in the way described, and what should you do instead of Goodharting?

To answer the first question, a few things seem helpful here. If you are out of touch with yourself, especially your emotions or feelings, Focusing seems like a great choice. Even if you don’t think you’re cut off from your emotions, Focusing can still be a good thing to try because sometimes a person can become so cut off they don’t realize they are, much as years of sleep deprivation imposed by work or school schedules can make you forget that it’s possible to feel well rested. Other simple interventions you can try yourself include journaling and long, solitary walks that may help you discover what you’ve lost connection with in yourself.

If those don’t help, therapy may be necessary, as the therapist can act as a mirror to help you see yourself in ways you can’t on your own. Internal Family Systems, for example, is not just a model of the psyche but a theory of psychotherapy, and a therapist can help with reintegrating parts.

I’d also be remiss if I didn’t suggest that meditation practice can help, because it can, but it requires a large commitment to get results. Ten minutes a day of mindfulness meditation with an app is nice, but it’s unlikely to help you find a path out of the kind of self-harm discussed above, so when I suggest meditation I have in mind something more on the order of daily practice that involves a community and a teacher.

As to what you should do instead of Goodharting yourself by forcing yourself to do things, I’d generally describe the alternative strategy as nonmonotonically seeking Pareto improvements. In less jargony terms, that’s trying to find ways to make yourself broadly rather than narrowly better, accepting that you might get worse at some things for a while as you explore the space, and achieving virtuous things as a downstream consequence or side-effect of becoming generally more virtuous across many dimensions simultaneously. This can be frustrating because the feedback loop is often long and you have to trust in and have patience with yourself to keep going even when it’s not clear what to do next or when you’ll see results.

Ways you might carry such a program out include self-improvement through things like positive psychology, a mentorship relationship with someone who can help guide you, therapy, and, as you might expect me to suggest, Zen or another Buddhist practice. This is the path of becoming more virtuous overall so that necessarily you’re more virtuous along any particular dimension you choose to look at.

The best endorsement of this approach I can offer is myself. In my teens and twenties I tried to narrowly optimize myself in various ways to correct what I felt were defects, like not reading enough, not learning math fast enough, not caring for other people enough, and spending too much time doing frivolous things. The result was that I was a mess of depression and anxiety and profoundly out of touch with my emotions and desires. Things got better only after I failed to force myself to achieve something for the Nth time and finally accepted that it was a failing strategy for living a life I would find fulfilling. At that point I stepped back and just started taking care of myself, and slowly discovered that focusing on general capacity building worked way better and undid much of the damage I had done to myself.

Now in my late 30s I’m living the best life I have so far: I’m doing things I find fulfilling, I’m happy, and I seem to keep getting better at things I was previously bad at. Past performance may be no guarantee of future gains, but given the natural experiment I carried out on myself it certainly seems that, at least for me, the theory that focusing on Pareto improvements over narrow improvements is better for mental health has proven correct. Maybe it will be for you, too.