Christiano decision theory excerpt

A transcribed excerpt I found interesting from the decision theory outtake from 80,000 Hours’ second interview of Paul Christiano (starting at 14:00):

Robert Wiblin: It seems like philosophers have not been terribly interested in these heterodox decision theories. They just seem to not have been persuaded, or not even to really have engaged with them a great deal. What do you think is going on there?

Paul Christiano: I think there’s a few things going on. I don’t think it’s right to see a spectrum with CDT and then EDT and then UDT. I think it’s more right to see a box, where there’s the updatelessness axis and then there’s the causal vs. evidential axis. And the causal vs. evidential thing I don’t think is as much of a divide between geography, or between philosophers and rationalists.

CDT is the majority position in philosophy, but not an overwhelming majority; there are reasonable philosophers who defend EDT. EDT is a somewhat more common view, I think, in the world — certainly in the prisoner’s dilemma with a twin case, I think most people would choose “cooperate” if you just poll people on the street. Again, not by a large majority.
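As an illustration outside the interview itself, the twin prisoner's dilemma mentioned here can be sketched in a few lines. The payoff numbers, the function names, and the `p_match` parameter (the chance that the twin makes the same choice as you) are all assumptions made for this sketch, not anything stated by the speakers.

```python
# Hypothetical payoffs for "me": (my_action, twin_action) -> utility.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}

def cdt_value(action, p_twin_cooperates):
    """Causal evaluation: the twin's choice is held fixed, independent of mine."""
    return (p_twin_cooperates * PAYOFF[(action, "C")]
            + (1 - p_twin_cooperates) * PAYOFF[(action, "D")])

def edt_value(action, p_match):
    """Evidential evaluation: my action is treated as evidence about the twin's."""
    other = "D" if action == "C" else "C"
    return (p_match * PAYOFF[(action, action)]
            + (1 - p_match) * PAYOFF[(action, other)])

def best(value_fn, param):
    """Return the action ("C" or "D") with the higher value under value_fn."""
    return max(("C", "D"), key=lambda a: value_fn(a, param))

if __name__ == "__main__":
    print("CDT, twin cooperates with p=0.99:", best(cdt_value, 0.99))
    print("EDT, twin matches me with p=0.99:", best(edt_value, 0.99))
```

With these assumed payoffs, the causal evaluation prefers defecting whatever probability is assigned to the twin cooperating, while the evidential evaluation prefers cooperating once `p_match` is high enough (above 5/7 for these numbers).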

And amongst the rationalists, there’s also not agreement on that. Where I would say Eliezer has a view which I would describe as the causal view, not the evidential view. But he accepts updatelessness.

So on the causal vs. evidential split, I think it’s not so much geographical. It’s more just like everyone is fairly uncertain. Although I think it is the case maybe people in the Bay are more on evidential decision theory and philosophers are more on causal decision theory.

Then on the updatelessness thing, I think a lot of that is this semantic disagreement / this understanding of “what is the project of decision theory?” Where if you’re building AI systems, the point of decision theory is to understand how we make decisions such that we can encode it into a machine.

And if you’re doing that, I think there’s not actually a disagreement about that question. Like, once you’re really specific about what is meant. No one thinks you should make — well, I don’t… Hopefully no one thinks you should make an AI that uses causal decision theory. Causal decision theorists would not recommend making such an AI, because it has bad consequences to make such an AI.

So I think once we’re talking about “How do you make an AI?”, if you’re saying, “I want to understand which decision theory my AI ought to use — like, how my AI ought to use the word ‘right’” — everyone kind of agrees about that, that you should be more in the updateless regime.

It’s more like a difference in, “What are the questions that are interesting, and how should we use language?” Like, “What do concepts like ‘right’ mean?”

And I think there is a big geographical and community distinction there, but I think it’s a little bit less alarming than it would be if it were like, “What are the facts of the case?” It’s more like, “What are the questions that are interesting?” — which, like, people are in fact in different situations. “How should we use language?” — which is reasonable. The different communities, if they don’t have to talk that much, it’s okay, they can just evolve to different uses of the language.

Robert Wiblin: Interesting. So you think the issue is to a large degree semantic.

Paul Christiano: For updatelessness, yeah.

Robert Wiblin: So you think it’s the case that when it comes to programming an AI, there’s actually a lot of agreement on what kind of decision theory it should be using in practice. Or at least, people agree that it needn’t be causal decision theory, even though philosophers think in some more fundamental sense causal decision theory is the right decision theory.

Paul Christiano: I think that’s right. I don’t know exactly what… I think philosophers don’t think that much about that question. But I think it’s not a tenable position, and if you really got into an argument with philosophers it wouldn’t be a tenable position that you should program your AI to use causal decision theory.

Robert Wiblin: So is it the case that people started thinking about this in part because they were thinking, “Well, what decision theory should we put in an AI?” And then they were thinking, “Well, what decision theory should I commit to doing myself? Like, maybe this has implications in my own life.” Is that right?

Paul Christiano: Yeah, I think that’s how the rationalists got to thinking about this topic. They started thinking about AI, then maybe they started thinking about how humans should reason insofar as humans are, like, an interesting model. I’m not certain of that, but I think that’s basically the history for most of them.

Robert Wiblin: And it seems like if you accept one of the updateless decision theories, then potentially it has pretty big implications for how you ought to live your life. Is that right?

Paul Christiano: Yeah, I think there are some implications. Again, they’re mostly going to be of this form, “You find yourself in a situation. You can do a thing that is bad for yourself in that situation but good for yourself in other situations. Do you do the thing that’s bad right now?” And you would have committed to do that. And maybe if you’ve been thinking a lot about these questions, then you’ve had more opportunity to reason yourself into the place where you would actually take that action which is locally bad for yourself.

And again, I think philosophers could agree. Most causal decision theorists would agree that if they had the power to stop doing the right thing, they should stop taking actions which are right. They should instead be the kind of person that you want to be.

And so there, again, I agree it has implications, but I don’t think it’s a question of disagreement about truth. It’s more a question of, like: you’re actually making some cognitive decisions. How do you reason? How do you conceptualize what you’re doing?

There are maybe also some actual questions about truth, like: If I’m in the Parfit’s hitchhiker case, and I’m imagining I’m at the ATM, I’m deciding whether to take the money out. I’m kind of more open to a perspective that’s like, “Maybe I’m at the ATM, deciding whether to take the money out. But maybe I’m inside the head of someone reasoning about me.”

Maybe I’m willing to ascribe more consciousness to things being imagined by some other person. And that’s not necessarily a decision theory disagreement, but it’s more of a disagreement-about-facts case. And it’s a kind of perspective that makes it easier to take that kind of action.
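As an illustration outside the interview itself, the structure of the Parfit's hitchhiker case (an action that is locally bad at the ATM but good as a policy) can be sketched as follows. The utilities, the `predictor_accuracy` parameter, and the function names are hypothetical choices for this sketch only.

```python
# Hypothetical utilities: being left in the desert is very bad, paying is a small cost.
U_LEFT_IN_DESERT = -1_000_000
U_RESCUED_AND_PAY = -100
U_RESCUED_NO_PAY = 0

def policy_value(pays_at_atm, predictor_accuracy):
    """Expected utility of a policy, evaluated before the driver's prediction."""
    # Assumption: the driver rescues you only if they predict you will pay.
    p_rescued = predictor_accuracy if pays_at_atm else 1 - predictor_accuracy
    u_rescued = U_RESCUED_AND_PAY if pays_at_atm else U_RESCUED_NO_PAY
    return p_rescued * u_rescued + (1 - p_rescued) * U_LEFT_IN_DESERT

def local_value(pays_at_atm):
    """Utility once already at the ATM, taking the rescue as given."""
    return U_RESCUED_AND_PAY if pays_at_atm else U_RESCUED_NO_PAY

if __name__ == "__main__":
    print("policy 'pay':      ", policy_value(True, 0.99))
    print("policy 'don't pay':", policy_value(False, 0.99))
    print("at the ATM, pay:      ", local_value(True))
    print("at the ATM, don't pay:", local_value(False))
```

Under these assumptions, not paying looks better when evaluated at the ATM, but the policy of paying has much higher expected utility when evaluated from before the prediction, which is the tension the updatelessness discussion is about.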

Robert Wiblin: Yeah, so… Before we imagine that we’re potentially in people’s imaginations, in your writing, you seem very interested in integrity and honesty. And it seems in part like this is informed by your views on decision theory.

For example, it seems like you’re the kind of person to be like, “No, absolutely, I would always pay the money at the ATM,” in the hitchhiker case. Do you want to explain, how has decision theory affected your life? What concrete implications do you think it should potentially have for listeners?

Paul Christiano: Yeah, so I think there are a lot of complicated empirical questions that bear on almost every one of these practical decisions. And we could talk about what fraction of the variation is explained by considerations about decision theory.

And maybe I’m going to go for, like, a third of variation is explained by decision theory stuff. And I think there, a lot of it is also not the core philosophical issues, but just chugging through a lot of complicated analysis. I think it’s surprising that it gets as complicated as it does.

My basic take-away from that kind of reasoning is that a reasonable first-pass approximation, in many situations, is this: when making a decision, consider not only the direct consequences of that decision, but also the consequences of other people having believed that you would make that decision.

So, for example, if you’re deciding whether to divulge something told to you in confidence, you’re like, “Well, on the one hand, divulging this fact told to me in confidence is good. On the other hand, if I were known to have this policy, I wouldn’t have learned such information in the first place.” This gives you a framework in which to weigh up the costs and benefits there.
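As an illustration outside the interview itself, the weighing described in this example can be written as a small calculation. Every number and name below is a hypothetical assumption for the sketch: it supposes that people confide in you with some probability that depends on whether you are known to divulge confidences.

```python
def policy_value(divulges, benefit_of_divulging, value_of_being_told,
                 p_told_if_trusted, p_told_if_known_to_divulge):
    """Expected value of a policy, counting how others react to expecting it."""
    # Others' willingness to confide depends on the policy they believe you have.
    p_told = p_told_if_known_to_divulge if divulges else p_told_if_trusted
    # Divulging adds its direct benefit, but only in cases where you were told at all.
    payoff_when_told = value_of_being_told + (benefit_of_divulging if divulges else 0.0)
    return p_told * payoff_when_told

if __name__ == "__main__":
    shared = dict(benefit_of_divulging=5.0, value_of_being_told=10.0,
                  p_told_if_trusted=0.9, p_told_if_known_to_divulge=0.1)
    print("keep confidences:", policy_value(False, **shared))
    print("divulge:         ", policy_value(True, **shared))
```

With these assumed numbers, keeping confidences comes out ahead because the direct benefit of divulging is outweighed by rarely being told anything in the first place. If being known to divulge barely changed what people tell you, the comparison would flip.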

So that’s my high-level take-away, first-order approximation. Which I think a priori looks kind of silly — maybe there’s a little bit to it, but it probably is not going to hold up. And then the more I’ve dug in, the more I feel like when all the complicated analysis shakes out, you end up much closer to that first-order theory than you would have guessed.

And I think that summarizes most of my take-away from weird decision theory stuff.

Maybe the other question is: There’s some “How nice to be to other value systems?” where I come down on a more nice perspective, which maybe we’ll touch a little bit on later. This gets weirder.

But those are maybe the two things. How much should you sympathize with value systems that you don’t think are intrinsically valuable, just based on some argument of the form, “Well, I could have been in their position and they could have been in mine”? And then: how much should you behave as if, by deciding, you were causing other people to know what you would have decided?