I feel most confused about the third and fourth problems. Choosing a yardstick could work to aggregate reward functions, but I still worry about the issue that this tends to overweight reward functions that assign a low value to the yardstick but high value to other outcomes. With widely-varying rewards, it seems hard to importance sample high-stakes decisions, without knowing what those decisions might be. Maybe if we notice a very large reward, we instead make it lower reward, but oversample it in the future? Something like this could potentially work, but I don’t see how yet.
For complex, expensive-to-evaluate rewards, Paul suggests using semi-supervised learning; this would be fine if semi-supervised learning was sufficient, but I worry that there actually isn’t enough information in just a few evaluations of the reward function to narrow down on the true reward sufficiently, which means that even conceptually we will need something else.
I feel most confused about the third and fourth problems. Choosing a yardstick could work to aggregate reward functions, but I still worry about the issue that this tends to overweight reward functions that assign a low value to the yardstick but high value to other outcomes. With widely-varying rewards, it seems hard to importance sample high-stakes decisions, without knowing what those decisions might be. Maybe if we notice a very large reward, we instead make it lower reward, but oversample it in the future? Something like this could potentially work, but I don’t see how yet.
For complex, expensive-to-evaluate rewards, Paul suggests using semi-supervised learning; this would be fine if semi-supervised learning was sufficient, but I worry that there actually isn’t enough information in just a few evaluations of the reward function to narrow down on the true reward sufficiently, which means that even conceptually we will need something else.