Reflective Consequentialism

Epistemic Status: My position makes sense to me and seems pretty plausible. However, I haven’t thought too hard about it, nor have I put it to the test by discussing it with others. Putting it to the test is what I’m trying to do here. I also don’t have the best understanding of what virtue ethics (and/or deontology) actually propose, so I might be mischaracterizing them.

There has been some talk recently, triggered by the FTX saga, about virtue ethics being good and consequentialism being bad. Eliezer talked about this years ago in Ends Don’t Justify Means (Among Humans), which was summed up in the following tweet:

The rules say we must use consequentialism, but good people are deontologists, and virtue ethics is what actually works.

Maybe I am misunderstanding, but something about this has always felt very wrong to me.

In the post, Eliezer starts by confirming that the rules do in fact say we must use consequentialism. Then he talks about how we are running on corrupted hardware, and so we can’t necessarily trust our perception of what will produce the best consequences. For example, cheating to seize power might seem like it is what will produce the best consequences — the most utilons, let’s say — but that seeming might just be your brain lying to you.

So far this makes total sense to me. It is well established that our hardware is corrupted. Think about the planning fallacy and how it really, truly feels like your math homework will only take you 20 minutes.

Or the phenomenon of optical illusions. Take the checker shadow illusion: squares A and B are the exact same shade of grey, but your brain tells you that they are quite different, and it really, truly feels that way to you.

But measure them, and they are, in fact, the exact same shade of grey.

Again, corrupted hardware. Your brain lies to you. You can’t always trust it.

And this is where reflectiveness comes in. From The Lens That Sees Its Flaws:

If you can see this—if you can see that hope is shifting your first-order thoughts by too large a degree—if you can understand your mind as a mapping engine that has flaws—then you can apply a reflective correction. The brain is a flawed lens through which to see reality. This is true of both mouse brains and human brains. But a human brain is a flawed lens that can understand its own flaws—its systematic errors, its biases—and apply second-order corrections to them. This, in practice, makes the lens far more powerful. Not perfect, but far more powerful.

So far this is still all making sense to me. Our hardware is corrupted, but we’re fortunate enough to be able to reflect on that fact and apply second-order corrections. It makes sense to apply such corrections.

But here is where I get confused, and perhaps diverge from the position that Eliezer and the large majority of the community take when they recommend virtue ethics. Virtue ethics, from what I understand, basically says to decide on rules ahead of time and stick to them. Rules that a good person would follow, like “don’t cheat to seize power”.

But on a human level, the patch seems straightforward. Once you know about the warp, you create rules that describe the warped behavior and outlaw it. A rule that says, “For the good of the tribe, do not cheat to seize power even for the good of the tribe.” Or “For the good of the tribe, do not murder even for the good of the tribe.”

Ends Don’t Justify Means (Among Humans)

The idea being that we are running on corrupted hardware, and following such rules would produce better consequences than trying to perform consequentialist calculus in the moment. If you did try to perform that calculus in the moment, your corrupted hardware would significantly bias you towards, e.g., “normally it’s bad to cheat to seize power… but in this particular scenario it seems like it is worth it”.

That seems too extreme to me, though. Instead of deciding ahead of time that “don’t cheat to seize power” is an absolute rule you must follow, why not just incorporate it into your thinking as a heuristic? I.e., when presented with a situation where you can cheat to seize power, (1) perform your first-order consequentialist calculus, and then (2) adjust for the fact that you’re biased towards thinking you should cheat, and for the fact that not cheating is known to be a good heuristic.

I guess the distinction between what I am proposing and what virtue ethics proposes comes down to timing. Virtue ethics says to decide on rules ahead of time. I say that you should decide on heuristics ahead of time, but in the moment you can look at the specifics of the situation and make a decision, after adjusting for your corrupted hardware and incorporating your heuristics.

Let’s call my approach reflective consequentialism. Naive consequentialism would just take a first stab at the calculus and then go with the result. Reflective consequentialism would also take that first stab, but then it would:

  1. Reflect on the fact that your hardware is corrupted

  2. Make adjustments based on (1)

  3. Go with the result
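To make the distinction concrete, here is a minimal sketch in Python. Everything in it is a hypothetical illustration rather than anything from the original posts: the action names, the utilon values, the bias estimates, and the `weight` parameter are all made up to show the shape of the procedure, not to claim this is how the calculus actually gets done.

```python
# Hypothetical sketch: naive vs. reflective consequentialism.
# All action names, utilon values, and parameters are invented
# for illustration; they are not from the original post.

def naive_choice(estimates):
    """Naive consequentialism: take a first stab at the calculus
    and go with the result."""
    return max(estimates, key=estimates.get)

def reflective_choice(estimates, bias, heuristic_prior, weight=0.8):
    """Reflective consequentialism:
    1. Start from the same first-order estimates.
    2. Subtract the bias you know your corrupted hardware adds.
    3. Blend with a pre-committed heuristic, weighted by how much
       you trust the heuristic over in-the-moment calculus.
    """
    corrected = {}
    for action, utilons in estimates.items():
        debiased = utilons - bias.get(action, 0.0)
        corrected[action] = ((1 - weight) * debiased
                             + weight * heuristic_prior.get(action, 0.0))
    return max(corrected, key=corrected.get)

# In the moment, cheating *seems* like it produces the most utilons.
estimates = {"cheat_to_seize_power": 10.0, "play_fair": 6.0}

# Known self-serving bias: you systematically overrate cheating.
bias = {"cheat_to_seize_power": 8.0}

# Pre-committed heuristic: "do not cheat to seize power".
heuristic_prior = {"cheat_to_seize_power": -5.0, "play_fair": 5.0}

print(naive_choice(estimates))                              # cheat_to_seize_power
print(reflective_choice(estimates, bias, heuristic_prior))  # play_fair
```

In this framing, naive consequentialism is `weight = 0` with no bias correction, and virtue ethics is effectively `weight = 1`, where the pre-committed rule always wins. Reflective consequentialism sits in between: adjusting "correspondingly far in the opposite direction" just means setting a large bias estimate and a heavy weight on the heuristic.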

With that distinction made, I suppose the question is whether reflective consequentialism or virtue ethics produces better consequences. And that is an empirical question. Which means that we can hypothesize.

My hypothesis is that virtue ethics would produce better consequences for most people. I don’t trust most people to be able to make these adjustments in the moment. After all, there is strong evidence that people do a poor job of correcting for biases even after being taught about them.

However, I also hypothesize that for sufficiently strong people, reflective consequentialism would perform better. Think of it like a video game where your character can level up. Maybe you need to be past level 75 out of 100 in order for reflective consequentialism to yield better results for you. The sanity waterline is low enough that most people are at something more like level 13, but I’d guess that the majority of this community is leveled up high enough that reflective consequentialism would perform better.

To be clear, you can still lean very strongly away from trusting your first order instincts with reflective consequentialism. You can say to yourself, “Yeah… I know it seems like I should cheat to seize power, but I am significantly biased towards thinking that, so I am going to adjust correspondingly far in the opposite direction, which brings me to a point where not cheating is the very clear winner.” You just have to be a strong enough person to actually do that in the moment.

At the very least, I think that reflective consequentialism is something we should aspire towards. If we accept that it yields better consequences amongst strong enough people, well, we should seek to become strong enough to wield it as a tool. Tsuyoku Naritai.

Deciding that we are too weak to wield it isn’t something that we should be proud of. It isn’t something that we should be content with. Feeling proud and content in this context feels to me like an improper use of humility.

If the strength required to wield the tool of reflective consequentialism were immense, like the strength required to wield Thor’s hammer, then I wouldn’t hold this position. It’s implausible that a human being would come anywhere close to that level of strength. It would be wise to recognize this, and appropriate to shoot a disapproving glance at anyone who tries to wield the tool anyway.

I don’t think this is where we’re at with reflective consequentialism though. I think the strength required to wield it is well within reach of someone who takes self-improvement moderately seriously.