What’s at stake here is: describing basically any system as an agent optimising some objective is going to be a leaky abstraction. The question is, how do we define the conditions for calling something an agent with an objective in such a way as to minimise the leaks?
Indeed, this is a super slippery question. And I think this is a good reason to stand on the shoulders of a giant like Dennett. Some of the questions he has been tackling are actually quite similar to yours, around the emergence of agency and the emergence of consciousness.
For example, does it make sense to say that a tree is *trying to* soak up sun, even though it doesn’t have any mental representation itself? Many biologists would hesitate to use such language other than metaphorically.
In contrast, Dennett’s answer is yes: Basically, it doesn’t matter if the computation is done by the tree, or by the evolution that produced the tree. In either case, it is right to think of the tree as an agent. (Same goes for DQN, I’d say.)
There are other situations where the location of the computation matters, such as for consciousness, and for some “self-reflective” skills that may be hard to pre-compute.
Basically, I would recommend looking closer at Dennett to
1. avoid reinventing the wheel (more than necessary), and
2. connect to his terminology (since he’s so influential).
He’s a very lucid writer, so quite a joy to read him really. His most recent book Bacteria to Bach summarizes and references a lot of his earlier work.
I am just wary of throwing away seemingly relevant assumptions about internal structure before we can show they’re unhelpful.
Yes, starting with more assumptions is often a good strategy, because it makes the questions more concrete. As you say, the results may potentially generalize.
But I am actually unsure that DQN agents should be considered non-optimisers, in the sense that they do perform rudimentary optimisation: they take an argmax of the Q function.
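For what it’s worth, the distinction I have in mind can be sketched in a few lines of Python (toy numbers, not a real trained network; the `q_values` and `logits` here are made up stand-ins for a network’s outputs at a single state):

```python
import math
import random

random.seed(0)

# Toy values for 4 actions, standing in for a trained network's
# outputs at one state (hypothetical numbers for illustration).
q_values = [0.1, 1.2, 0.4, 0.9]
logits = [0.1, 1.2, 0.4, 0.9]

# DQN acts greedily: an explicit argmax over the learned Q-function,
# i.e. a small piece of optimisation performed at decision time.
dqn_action = max(range(len(q_values)), key=lambda a: q_values[a])

# PPO acts by sampling from its learned stochastic policy: no argmax
# at decision time; whatever optimisation there was happened during
# training of the policy parameters.
exp_logits = [math.exp(z) for z in logits]
probs = [e / sum(exp_logits) for e in exp_logits]
ppo_action = random.choices(range(len(probs)), weights=probs, k=1)[0]

print(dqn_action)  # 1 (the action with the highest Q-value)
```

The argmax is the whole point: at test time DQN is still explicitly maximising something over actions, whereas PPO just evaluates a fixed (stochastic) mapping from states to actions.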
I see, maybe PPO would have been a better example.
I’ve been meaning for a while to read Dennett with reference to this, and actually have a copy of Bacteria to Bach. Can you recommend some choice passages, or is it significantly better to read the entire book?
P.S. I am quite confused about DQN’s status here and don’t wish to suggest that I’m confident it’s an optimiser. Just to point out that it’s plausible we might want to call it one without calling PPO an optimiser.
P.P.S.: I forgot to mention in my previous comment that I enjoyed the objective graph stuff. I think there might be fruitful overlap between that work and the idea we’ve sketched out in our third post on a general way of understanding pseudo-alignment. Our objective graph framework is less developed than yours, so perhaps your machinery could be applied there to get a more precise analysis?
Chapter 4 in Bacteria to Bach is probably most relevant to what we discussed here (with preceding chapters providing a bit of context).
Yes, it would be interesting to see if causal influence diagrams (and the inference of incentives) could be useful here. Maybe there’s a way to infer the CID of the mesa-optimizer from the CID of the base-optimizer? I don’t have any concrete ideas at the moment, but I can be in touch if I think of something suitable for collaboration!