Quantilizing can be thought of as maximizing a lower bound on the expected true utility, where you know that your true utility V is close to your proxy utility U under some base distribution γ, in the sense that E_{a∼γ}[|U(a)−V(a)|] ≤ ε. If we shape this closeness assumption a bit differently, so that the approximation degrades faster toward the tail, then it can sometimes be optimal to cut off the top of the distribution (as I did here; see some of the diagrams of quantilizers with the top cut off, one of which I'll paste below).
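As a minimal numerical sketch of the lower-bound story: a q-quantilizer's density is at most 1/q times γ's, so the closeness assumption above gives E_Q[V] ≥ E_Q[U] − ε/q, and we can pick q to maximize that bound. The Gaussian proxy utilities and the value of ε here are made-up illustrations, not anything from the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Base distribution gamma over actions, represented by samples, with a
# made-up proxy utility U(a) ~ N(0, 1) and an assumed closeness eps such
# that E_{a~gamma}[|U(a) - V(a)|] <= eps.
n = 100_000
u = rng.normal(size=n)   # proxy utility of each sampled action
eps = 0.05               # assumed closeness of U to V under gamma

def lower_bound(q):
    """Lower bound E_Q[U] - eps/q on a q-quantilizer's expected true utility."""
    top = np.sort(u)[int(np.ceil((1 - q) * n)):]  # top q fraction by U
    return top.mean() - eps / q

# Sweep q and keep the quantile with the best guarantee.
qs = np.linspace(0.001, 1.0, 1000)
bounds = [lower_bound(q) for q in qs]
best_q = qs[int(np.argmax(bounds))]
print(f"optimal q = {best_q:.3f}, guaranteed E[V] >= {max(bounds):.3f}")
```

Smaller q concentrates on higher-U actions but loosens the ε/q penalty, so the optimum sits strictly between "argmax" and "sample from γ".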
I’m not currently super happy with that story, and I’m keen for people to look for alternatives, or for variations of soft optimization with different kinds of knowledge about the relationship between the proxy and the true utility. Intuitively, it does seem like taking the 99th-percentile action should be fine under slightly different assumptions.
One example of this: if we know that U = V + e, where e is some heavy-tailed noise, and we know the distributions of e and V, then we can calculate the actual optimal percentile action to take, and we should deterministically take that action. But this is sometimes quite sensitive to small errors in our knowledge about the distribution of e, and particularly of V. My AISC team has been testing scenarios like this as part of their research.
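The heavy-tailed case can be sketched by Monte Carlo: estimate E[V | percentile rank of U] and take the percentile band that maximizes it. The normal V and Student-t noise below are illustrative assumptions (not my team's actual setup); the point is just that with heavy tails the very top of the U distribution is dominated by noise, so the best percentile sits below 100.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative model: proxy U = V + e, with V standard normal and e
# heavy-tailed (Student-t, df=1.5).  We estimate E[V | percentile of U]
# by bucketing actions sorted by their proxy value.
n = 500_000
v = rng.normal(size=n)
e = rng.standard_t(df=1.5, size=n)
u = v + e

order = np.argsort(u)                    # actions sorted by proxy value
n_bins = 200                             # 0.5-percentile-wide buckets
bins = np.array_split(v[order], n_bins)
mean_v = np.array([b.mean() for b in bins])  # estimated E[V | U-percentile band]
best_bin = int(np.argmax(mean_v))
lo, hi = 100 * best_bin / n_bins, 100 * (best_bin + 1) / n_bins
print(f"best percentile band: {lo:.1f}-{hi:.1f}, E[V] in band = {mean_v[best_bin]:.3f}")
```

Re-running this with slightly perturbed assumptions about the distributions of e and V (a different df, or a wrong variance for V) shifts which band is optimal, which is the sensitivity mentioned above.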