To be pedantic: we care about “consequence-desirability-maximisers” (or in Rohin’s terminology, goal-directed agents) because they do backwards assignment.
Valid point.
But I think the pedantry is important, because people substitute utility-maximisers for goal-directed agents, and then reason about those agents by thinking about utility functions, and that just seems incorrect.
This also seems right. Like, my understanding of what’s going on here is we have:
- ‘central’ consequence-desirability-maximizers, where there’s a simple utility function that they’re trying to maximize according to the VNM axioms
- ‘general’ consequence-desirability-maximizers, where there’s a complicated utility function that they’re trying to maximize, which is selected because it imitates some other behavior
The first is a narrow class, and depending on how strict you are with ‘maximize’, quite possibly no physically real agents will fall into it. The second is a universal class, which instantiates the ‘trivial claim’ that everything is utility maximization.
Put another way, the first is what happens if you hold utility fixed / keep utility simple, and then examine what behavior follows; the second is what happens if you hold behavior fixed / keep behavior simple, and then examine what utility follows.
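To make the ‘general’ class concrete, here is a minimal sketch (illustrative, with made-up names) of the post-hoc construction: given any behavior whatsoever, you can define a utility function that the behavior maximizes by assigning utility 1 to exactly the actions the agent actually takes. This is the sense in which the class is universal and the claim is trivial.

```python
def make_trivial_utility(policy):
    """Given any policy (a function state -> action), return a utility
    function over (state, action) pairs that the policy maximizes by
    construction: 1 for the policy's own action, 0 for everything else."""
    def utility(state, action):
        return 1.0 if action == policy(state) else 0.0
    return utility

# Any policy at all -- here, one that just echoes its input -- becomes a
# "utility maximizer" under its own trivially constructed utility function.
silly_policy = lambda state: state
u = make_trivial_utility(silly_policy)

state = 42
best_action = max([41, 42, 43], key=lambda a: u(state, a))
print(best_action)  # -> 42: the policy's own action maximizes u
```

Note that the utility function here is exactly as complicated as the behavior it rationalizes, which is why holding behavior fixed and solving for utility tells you nothing about the behavior.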
Distance from the first class is what I meant by “the further a robot’s behavior is from optimal”; in retrospect I should have said something like “VNM-optimal,” though I think it actually needs to be closer to “simple-utility VNM-optimal.”
I think you’re basically right in calling out a bait-and-switch that sometimes happens: expected utility maximization is universal only in the trivial ‘general’ sense, and anyone invoking it in that sense can’t get it to do any work, because it should all add up to normality, and in normality there’s a meaningful distinction between people who loosely pursue fuzzy goals and ruthless utility maximizers.