Potential Research Topic: Vingean Reflection, Value Alignment and Aspiration

Epistemic Status: Potential research idea, written under time constraints, so not as clear as it could have been.

Vingean reflection is the process of trying to anticipate how an agent smarter than you might think, in order to ensure that it will be aligned with your values. This is hard, because “if [an agent] could predict [a smarter agent’s] actions in detail, it would already be as smart as them.” Value learning is the problem of using machine learning to train an AI to care about what humans care about.

I haven’t read much about these problems, but they struck me as related to a concept introduced by philosopher Agnes Callard: “aspiration.” Her idea is that, sometimes, we come to care about things that we didn’t care about before, and, in particular, that: (1) this doesn’t happen all at once, and (2) we play an active role in the process. She argues in her book (which I haven’t read yet, but see the interview I just linked with her and Robert Wright) that in several different areas of philosophy (decision theory, moral psychology, and moral responsibility) the prevailing theories make assumptions that would render this process paradoxical or impossible.

To see what aspiration looks like, consider some value that you didn’t have before, but now do. Since I don’t know you, I’ll give a generic example, but substitute in whatever actually applies to you. Suppose you are now a gourmand, though you didn’t care much about good food when you were younger (this apparently happened to a friend of Callard’s). How did you get from there to here? Perhaps there was a moment where you first got excited about food (in the case of Callard’s friend, she took a trip to Osaka, Japan). But this probably isn’t the whole story, at least not in many cases. This lucky, random encounter provided the first shove to get you onto the path towards being a gourmand, but it didn’t take you all the way. You got an inkling of the value of good food by having some in Osaka, but you had to choose to cultivate this interest. But how is it possible to move yourself further along this path without already knowing how a gourmand would value good food? It seems like if you care enough to want to get better at valuing good food, then you must already be the kind of person who cares about good food. And how can you critique your own taste without already having the sort of trained palate that future-you will (might) have? How can you improve without being able to fully see the end of the path? And if you could fully see the end of the path, wouldn’t you already be there? (If this description seems unclear, it probably is, and I unfortunately don’t have the time to make it clearer; please go watch the Robert Wright interview to actually understand what’s going on.)

Some clarifying points from the interview:

Wright: The paradox is: until you have a value, you don’t value it. So how does one get from the place of not valuing it at all, to suddenly valuing it?

...

Callard [clarifying]: The way I think about it is, how do you go from caring about it very little, to caring about it a little more; how do you increase your caring for something?

...

Wright: So you’re interested in the dynamics of the process itself—what sustains the transition and the progress?

...

Callard [later]: I’m saying there’s such a thing as self-creation [because your values are part of yourself, so if you have a hand in creating your values, then you have a hand in creating yourself].

This sounds a lot like Vingean reflection: if an agent could predict how future-them would act, they would already be future-them. It also sounds a lot like value learning; in a sense it is a type of value learning, one where you learn the values you want yourself to have, or the values your potential future self has. There are obvious differences, but I think the similarities should also be apparent (especially if you’ve also watched the interview).

One of the prevailing methods of doing philosophy on LessWrong is this: for any philosophical concept, ask, “how would you build an AI that does that?” And I think that asking “how would you build an AI that could do aspiration?” sounds a lot like the problems of Vingean reflection and value learning (or perhaps some combination of the two: learning to predict how a future version of you with better values would act, and emulating them in order to become them). I think an interesting research project would be to investigate to what extent Callard’s work on aspiration is relevant to solving Vingean reflection and the value learning problem. Unfortunately, I’m not in a position to do this myself right now, but I wanted to advertise this as a possible research question, either for my future self (heh) or some other person. The main tasks would be to read Callard’s book and the literatures on Vingean reflection and value learning, and see what fruitful connections, if any, can be made. Again, apologies that I can’t lay out the research question more clearly; if I were in a position to do that (time-wise and expertise-wise), I would probably also be in a position to actually do the project, but I’m not. (Note that this need not be a long, protracted project: reading Callard’s book and the relevant literature could probably be done in a week of full-time work, give or take a few days depending on how much literature there is, and at that point one would be in a position to evaluate whether there are any fruitful connections to be drawn. And if one is already familiar with the Vingean reflection and value learning literatures, it might take only the time of reading Callard’s book and writing up relevant findings, if any.)
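To make the bootstrapping idea a bit more concrete, here is a minimal toy sketch in Python. It is not drawn from Callard’s book or from the alignment literature; it just models “how much you care” as a single number and treats aspiration as repeatedly nudging that number toward noisy, partial glimpses of a value the agent cannot yet evaluate directly. The function name, parameters, and update rule are all illustrative assumptions of mine.

```python
import random

def aspire(current_value, aspired_value, glimpse_noise=0.5, steps=10, rate=0.2):
    """Toy model of aspiration as iterative value bootstrapping.

    The agent never observes `aspired_value` directly; at each step it
    gets only a noisy glimpse of it (the trip to Osaka, the memorable
    meal) and nudges its current value a fraction of the way toward
    that glimpse.
    """
    v = current_value
    trajectory = [v]
    for _ in range(steps):
        glimpse = aspired_value + random.gauss(0, glimpse_noise)
        v += rate * (glimpse - v)  # move partway toward the glimpsed value
        trajectory.append(v)
    return trajectory

# Start out caring very little (0.0) and aspire toward caring a lot (1.0),
# without ever seeing the endpoint exactly.
print(aspire(current_value=0.0, aspired_value=1.0))
```

The only point of the toy is structural: the agent moves toward a value it cannot yet fully represent, using the partial information available from where it currently stands, which is the feature that aspiration, Vingean reflection, and value learning all seem to share.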

(Side note: I actually think the concept of aspiration may also have relevance to value drift and movement growth, at both the personal and the movement level: learning how to change one’s values may also provide insight into how to keep them stable, and learning how value change is possible may provide insight into how to shape other people’s values to be more aligned with EA. But I think Callard’s book talks less about the nitty-gritty of how aspiration works and more about the philosophical problems it poses. My suggestion is that these problems seem very similar to the problems posed by Vingean reflection and value learning, and that looking at her solutions may provide new insight into these alignment problems. The movement-growth stuff would take more extrapolation from her book, I think.)

(Also, if this would be better posted as a question, I’d be happy to repost it as one or have the mods do so.)