Vlad Mikulik comments on Risks from Learned Optimization: Introduction

Vlad Mikulik 9 Jun 2019 18:48 UTC
LW: 4 AF: 3
I’ve been meaning for a while to read Dennett with reference to this, and actually have a copy of Bacteria to Bach. Can you recommend some choice passages, or is it significantly better to read the entire book?

P.S. I am quite confused about DQN’s status here and don’t wish to suggest that I’m confident it’s an optimiser. I only mean to point out that it’s plausible we might want to call it one without calling PPO an optimiser.

P.P.S.: I forgot to mention in my previous comment that I enjoyed the objective graph stuff. I think there might be fruitful overlap between that work and the idea we’ve sketched out in our third post on a general way of understanding pseudo-alignment. Our objective graph framework is less developed than yours, so perhaps your machinery could be applied there to get a more precise analysis?
- tom4everitt 14 Jun 2019 10:11 UTC
  LW: 6 AF: 4
  Chapter 4 in Bacteria to Bach is probably most relevant to what we discussed here (with preceding chapters providing a bit of context).
  Yes, it would be interesting to see if causal influence diagrams (and the inference of incentives) could be useful here. Maybe there’s a way to infer the CID of the mesa-optimizer from the CID of the base-optimizer? I don’t have any concrete ideas at the moment, but I can be in touch if I think of something suitable for collaboration!