jessicata comments on Informed oversight through an entropy-maximization objective

jessicata 21 Mar 2016 1:14 UTC
0 points
AF
The approach I have in mind is (roughly) to let the agent output some number of bits of garbage, but to penalize it for the number of garbage bits, so that generating additional uniformly random garbage makes no difference to the score. I think this can be done using autoencoders (use layer $n + 1$ to compress layer $n$ into a small number of bits of garbage). It’s not clear whether this approach is practical for complex agents, though.
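
A rough sketch of why the penalty cancels uniform garbage, under assumptions that are not spelled out in the thread: the score contains an entropy term over the agent's full output $(a, g)$, where $a$ is the action and $g \in \{0,1\}^m$ is the garbage, and the penalty is exactly one unit per garbage bit. The symbols $a$, $g$, $m$, and $S$ are illustrative only.

$$
S(a, g) = H(a, g) - m, \qquad g \text{ uniform and independent of } a \;\Longrightarrow\; H(a, g) = H(a) + m \;\Longrightarrow\; S(a, g) = H(a).
$$

More generally $H(a, g) = H(a) + H(g \mid a) \le H(a) + m$, so under this reading extra garbage never raises the score, and garbage that falls $c$ bits short of uniform, $H(g \mid a) = m - c$, gives $S(a, g) = H(a) - c$.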
- paulfchristiano 21 Mar 2016 1:37 UTC
  0 points
  AF
  In the one-way function (OWF) example, though, the garbage is necessarily low-entropy (at least $k$ bits short of full entropy, where $k$ is the size of the advice needed to invert the OWF). Right?
  - jessicata 21 Mar 2016 2:49 UTC
    0 points
    AF
    Yes, that seems right. So this won’t work for that example.