I think this is doable with this approach, but I haven’t proven it can be done, let alone said anything about a dependence on epsilon. The closest bound I show not only has a constant factor of around 40; it also depends on the prior on the truth. I think (75% confidence) this is a weakness of the proof technique, not a weakness of the algorithm.
I just meant the dependence on epsilon; it seems like there are unavoidable additional factors (especially the linear dependence on p(treachery)). I guess it’s not obvious whether you can make these additive or whether they are inherently multiplicative.
But your bound scales in some way, right? How much training data do I need to get the KL divergence between distributions over trajectories down to epsilon?
No matter how much data you have, my bound on the KL divergence won’t approach zero.