Note also that v0(P) is maximized when the support of P covers the whole support of Q and s has a high average under P. That is, v0(P) is within ϵ of its maximum when P is (1−ϵ) times a delta function on an s-maximizing point, plus ϵ times the distribution of Q.
So v0 essentially corresponds to a raw maximizer, and vα for 0<α<1 interpolates between maximizing and softmax.
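To make the mixture construction concrete, here is a minimal sketch on a made-up finite example. The outcome space, the numbers in `q` and `s`, and the helper names `mixture` and `expected` are all hypothetical, and the scores are assumed to lie in [0, 1]. The code does not compute v0 itself, since its definition is not restated in this passage. It only checks the two ingredients named above: P puts mass everywhere Q does, and the average of s under P is within ϵ of the maximum of s.

```python
# Hypothetical finite example: four outcomes, base distribution q,
# and scores s assumed to lie in [0, 1].

def mixture(q, s, eps):
    """Return P = (1 - eps) * delta(argmax s) + eps * Q as a list."""
    best = max(range(len(s)), key=lambda i: s[i])  # an s-maximizing point
    return [eps * qi + ((1 - eps) if i == best else 0.0)
            for i, qi in enumerate(q)]

def expected(p, s):
    """Average of s under the distribution p."""
    return sum(pi * si for pi, si in zip(p, s))

q = [0.4, 0.3, 0.2, 0.1]   # made-up base distribution Q
s = [0.1, 0.9, 0.5, 0.3]   # made-up scores
eps = 0.05

p = mixture(q, s, eps)

# P has positive mass wherever Q does.
covers_q = all(pi > 0 for pi, qi in zip(p, q) if qi > 0)

# Shortfall of the average score relative to the best single point.
gap = max(s) - expected(p, s)

print("P covers support of Q:", covers_q)
print("gap is within eps:", 0 <= gap <= eps)
```

In this setup the shortfall is exactly ϵ·(max s − E_Q[s]), so it is at most ϵ whenever s takes values in [0, 1]. How that carries over to v0 depends on the definition of v0 given earlier.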