Thanks for writing this.
I agree that it’s good to try to answer the question “under what sort of reliability guarantee is my model optimal?”, and it’s worth making the optimization power vs. robustness trade-off explicit via toy models like the one you use above.
That being said, re: the overall approach. Almost every non-degenerate regularization method can be thought of as “optimal” with respect to some robust optimization problem (in the same way that non-degenerate optimization can be trivially cast as Bayesian optimization) -- e.g. the RL-KL objective with respect to some π0 is optimal for the following minimax problem:
\[
\min_{\tilde r \in \tilde R(\pi)} \; \mathbb{E}_\pi\!\left[\sum_t \tilde r(s_t, a_t)\right],
\qquad
\tilde R(\pi) = \left\{ \tilde r \;\middle|\; \mathbb{E}_\pi\!\left[\sum_t \log \int_A \exp\!\big(r(s_t, a) - \tilde r(s_t, a)\big)\, \pi_0(a)\, da \right] \le \epsilon \right\}
\]
for some ϵ>0. So the question is not so much “do we cap the optimization power of the agent” (which is a pretty common claim!) as “which way of regularizing agent policies more naturally captures the robust optimization problems we want solved in practice”.
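(For context on where an expression of the form log∫exp(⋅)π0(a)da comes from: in the one-step case, with the KL coefficient set to 1, the KL-regularized objective has a standard closed form. The sketch below is background under those simplifying assumptions, not necessarily the exact derivation behind the minimax reformulation above.)

\[
\max_{\pi(\cdot \mid s)} \; \Big( \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[r(s, a)\big] - \mathrm{KL}\big(\pi(\cdot \mid s) \,\|\, \pi_0\big) \Big)
= \log \int_A e^{r(s, a)}\, \pi_0(a)\, da,
\qquad
\pi^*(a \mid s) \propto \pi_0(a)\, e^{r(s, a)}.
\]

Substituting r(s,a)−~r(s,a) for r(s,a) on the right-hand side gives exactly the expression that appears inside the constraint set above.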
(It’s also worth noting that an important form of implicit regularization is the underlying capacity/capability of the model we’re using to represent the policy.)
I also think that it’s probably worth comparing soft optimization to the old Impact Measures work from this community -- in particular, I think it’d be interesting to cast soft optimization methods as robust optimization, and then see how the critiques raised against impact measures (e.g. in this comment or this question) apply to soft optimization methods like RL-KL or the minimax objective you outline here.
Thanks for linking these; I hadn’t read most of them. As far as I can tell, most of the critiques don’t really apply to soft optimization. The main one that does is Paul’s “drift off the rails” concern. I expect we need to use the first AGI (with soft optimization) to help solve alignment in a more permanent and robust way, then use that to make a more powerful AGI that helps avoid “drifting off the rails”.
In my understanding, impact measures are an important part of the utility function that we don’t want to get wrong, but not much more than that, whereas soft optimization directly removes Goodharting of the utility function. Soft optimization feels like the correct formalism for attacking the root of that problem, whereas impact measures just take care of a (particularly bad) symptom.
Abram Demski has a good answer to the question you linked that contrasts mild optimization with impact measures, and it’s clear that mild optimization is preferred. And Abram actually says:
An improvement on this situation would be something which looked more like a theoretical solution to Goodhart’s law, giving an (in-some-sense) optimal setting of a slider to maximize a trade-off between alignment and capabilities (“this is how you get the most of what you want”), allowing ML researchers to develop algorithms orienting toward this.
This is exactly what I’ve got.
Yep, agreed. Except I don’t understand how you got that equation from RL with KL penalties -- can you explain that further?
I think the most novel part of this post is showing that this robust optimization problem (maximizing average utility while avoiding selection for upward errors in the proxy) is the one we want to solve, and that it can be done with a bound that is intuitively meaningful and can be determined without just guessing a number.
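(A toy numerical sketch of “avoiding selection for upward errors in the proxy”, in case it’s useful. This is an illustration only, not the algorithm from the post: the Gaussian utility/error model, the uniform prior over options, and the softmax rule with β=0.5 are all assumptions chosen just to make the effect visible.)

```python
import numpy as np

rng = np.random.default_rng(0)

def selected_error(n, beta):
    """One draw of a toy Goodhart setup: n options, proxy = true utility + error."""
    true_u = rng.normal(0.0, 1.0, n)
    err = rng.normal(0.0, 1.0, n)
    proxy = true_u + err
    hard = err[np.argmax(proxy)]              # upward error picked up by hard argmax
    w = np.exp(beta * (proxy - proxy.max()))  # softmax weights (numerically stable)
    soft = float(w @ err / w.sum())           # expected upward error under the softmax policy
    return hard, soft

for n in (1_000, 100_000):
    draws = [selected_error(n, beta=0.5) for _ in range(100)]
    hard_avg, soft_avg = np.mean(draws, axis=0)
    print(f"n={n:>7}: argmax error ~ {hard_avg:+.2f}   softmax(beta=0.5) error ~ {soft_avg:+.2f}")
```

In this toy, the error selected by the hard argmax keeps growing as the number of options grows, while the expected error under the KL-limited softmax stays near β·σ² no matter how many options are searched -- the “don’t select for upward proxy errors” behaviour in miniature.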
Yeah, I wouldn’t want to rely on that kind of implicit regularization (via model capacity) without a better formal understanding of it, though. KL regularization I feel like I understand.