On point 2, which is the only one I can really comment on: yes, this seems like a useful paper, and I buy the argument that such an approach is critical for some purposes, including some of what we discussed on Goodhart’s Law (https://arxiv.org/abs/1803.04585), where one class of misalignment can be explicitly addressed by your approach. Also see the recent paper at https://arxiv.org/abs/1905.12186, which explicitly models causal dependencies (like in figure 2) to show a safety result.