You should’ve included a simple but important point: a quantilizer that picks from the top 1/Nth (e.g. the top 10%) is guaranteed to do no worse in expectation than N times (e.g. 10 times) as badly as a human, or whatever “safe” distribution you are quantilizing over, no matter how badly you specified the goal (with some caveats if you quantilize over a bunch of individual episodes (like each day of trading on the stock market), as then correlations between the individual episodes can multiply into something worse).
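For concreteness, here’s the one-line argument behind that bound (a sketch; I’m writing $P$ for the safe distribution, $Q$ for the quantilizer, and $c$ for an arbitrary non-negative cost function):

```latex
% A q-quantilizer with q = 1/N samples uniformly from the top-q fraction
% of P, so Q(a) = P(a)/q on that fraction and 0 elsewhere.
% Hence Q(a) <= P(a)/q = N * P(a) for every action a, and for any c >= 0:
\mathbb{E}_{a \sim Q}[c(a)]
  = \sum_a Q(a)\, c(a)
  \le N \sum_a P(a)\, c(a)
  = N \, \mathbb{E}_{a \sim P}[c(a)].
```

The bound holds for every cost function simultaneously, which is why it doesn’t matter how badly the goal was specified.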
As a bonus: if you want to do no worse in expectation than N times the safe distribution for every (unknown) true goal, then quantilizing is the only way to do it.
Furthermore, approximating a quantilizer by sampling N examples from the safe distribution and picking randomly from the top 1/Nth has good guarantees on the approximation (I don’t recall exactly how good). So, if you trust your predictive model to correctly predict the safe distribution (e.g. an LLM base model, before any reinforcement learning), you can practically make a quantilizer.
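A minimal sketch of that approximation (names are my own; `sample_safe` stands in for whatever samples your trusted safe distribution, e.g. a base model, and `utility` for your possibly-misspecified goal):

```python
import random


def quantilize(sample_safe, utility, n):
    """Approximate a q = 1/n quantilizer: draw n i.i.d. samples from the
    safe distribution and return the best one under `utility`. Picking the
    top 1/n-th of n samples is just picking the single best, which in
    expectation lands in roughly the top 1/n of the safe distribution."""
    candidates = [sample_safe() for _ in range(n)]
    return max(candidates, key=utility)


# Toy demo: safe distribution = uniform on [0, 1], utility = identity.
random.seed(0)
action = quantilize(lambda: random.random(), lambda x: x, 10)
```

The same shape works with an LLM in place of `sample_safe`: sample N completions from the base model, score them, and emit the best.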
It really shouldn’t be hard to make a quantilizer LLM right now, though I expect its performance to be too far behind to be competitive in practice.