I think you misunderstood how the iterated quantilization works. It does not work by the AI setting a long-term goal and then charting a path towards that goal s.t. it doesn’t deviate too much from the baseline over every short interval. Instead, every short-term quantilization is optimizing for the user’s evaluation in the end of this short-term interval.
Ah. I indeed misunderstood, thanks :) I’d read “short-term quantilization” as quantilizing over short-term policies evaluated according to their expected utility. My story doesn’t make sense if the AI is only trying to push up the reported value estimates (though that puts a lot of weight on these estimates).
I think you misunderstood how the iterated quantilization works. It does not work by the AI setting a long-term goal and then charting a path towards that goal s.t. it doesn’t deviate too much from the baseline over every short interval. Instead, every short-term quantilization is optimizing for the user’s evaluation in the end of this short-term interval.
Ah. I indeed misunderstood, thanks :) I’d read “short-term quantilization” as quantilizing over short-term policies evaluated according to their expected utility. My story doesn’t make sense if the AI is only trying to push up the reported value estimates (though that puts a lot of weight on these estimates).