Instead I read it as something like “some unreasonable percentage of an agent’s actions are random”
This is in fact the intended reading; sorry for the ambiguity. Will edit. But note that there are probably very few situations where exploring via actual randomness is best; there will almost always be some type of exploration which is more favourable. So I don't think this helps.
We care about utility-maximizers because they’re doing their backwards assignment, using their predictions of the future to guide their present actions to try to shift the future to be more like what they want it to be.
To be pedantic: we care about “consequence-desirability-maximisers” (or in Rohin’s terminology, goal-directed agents) because they do backwards assignment. But I think the pedantry is important, because people substitute utility-maximisers for goal-directed agents, and then reason about those agents by thinking about utility functions, and that just seems incorrect.
And so if I read the original post as “the further a robot’s behavior is from optimal, the less likely it is to demonstrate convergent instrumental goals”
What do you mean by optimal here? The robot’s observed behaviour will be optimal for some utility function, no matter how long you run it.
To be pedantic: we care about “consequence-desirability-maximisers” (or in Rohin’s terminology, goal-directed agents) because they do backwards assignment.
Valid point.
But I think the pedantry is important, because people substitute utility-maximisers for goal-directed agents, and then reason about those agents by thinking about utility functions, and that just seems incorrect.
This also seems right. Like, my understanding of what's going on here is that we have two classes:
1. 'central' consequence-desirability-maximizers, where there's a simple utility function that they're trying to maximize according to the VNM axioms
2. 'general' consequence-desirability-maximizers, where there's a complicated utility function that they're trying to maximize, which is selected because it imitates some other behavior
The first is a narrow class, and depending on how strict you are with ‘maximize’, quite possibly no physically real agents will fall into it. The second is a universal class, which instantiates the ‘trivial claim’ that everything is utility maximization.
Put another way, the first is what happens if you hold utility fixed / keep utility simple, and then examine what behavior follows; the second is what happens if you hold behavior fixed / keep behavior simple, and then examine what utility follows.
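To make the 'trivial claim' behind the second class concrete, here's a minimal sketch (toy Python; the states, actions, and policy are made-up placeholders, not anything from this thread) of how any fixed behavior can be rationalized after the fact: construct a utility function that scores 1 exactly when the agent does what the policy would have done, and the policy comes out 'utility-maximizing' by definition.

```python
# Minimal sketch of the 'trivial claim': any behavior maximizes *some* utility function.
# The policy, states, and actions below are toy placeholders chosen for illustration.

def arbitrary_policy(state):
    """Some fixed behavior we want to rationalize -- an arbitrary rule."""
    return "left" if len(state) % 2 == 0 else "right"

def rationalizing_utility(state, action):
    """Utility constructed *from* the behavior: 1 iff the action matches the policy."""
    return 1.0 if action == arbitrary_policy(state) else 0.0

def utility_maximizer(state, actions=("left", "right")):
    """Pick whichever action has the highest constructed utility."""
    return max(actions, key=lambda a: rationalizing_utility(state, a))

# The 'maximizer' reproduces the original behavior exactly, on every state:
for s in ["a", "ab", "abc", "abcd"]:
    assert utility_maximizer(s) == arbitrary_policy(s)
```

Of course, the constructed utility function here is just the behavior restated, which is why this 'general' sense supports the trivial claim without doing any further work.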
Distance from the first is what I mean by “the further a robot’s behavior is from optimal”; I want to say that I should have said something like “VNM-optimal” but actually I think it needs to be closer to “simple utility VNM-optimal.”
I think you're basically right in calling out a bait-and-switch that sometimes happens. Anyone who wants to invoke the universality of expected utility maximization in the trivial 'general' sense can't actually get it to do any work: it should all add up to normality, and in normality there's a meaningful distinction between people who sort of pursue fuzzy goals and ruthless utility maximizers.