I would guess the issue with KL relates to the fact that a bound on $D_{KL}(P\|Q)$ permits situations where $P(X=x)$ is small but $Q(X=x)$ is large (since the expectation is taken under $P$), whereas JS penalizes discrepancies in both directions.
In particular, in the original theorem on resampling using KL divergence, the assumption bounds the KL w.r.t. the joint distribution $P(X,\Lambda)$, so there may be situations where the resampled probability $Q(X=x,\Lambda=\lambda) = P(X=x)\,P(\Lambda=\lambda \mid X_2=x_2)$ is large but $P(X=x,\Lambda=\lambda)$ is small. But the intended conclusion bounds a KL under the resampled distribution $Q$, so the error at such values $(X=x,\Lambda=\lambda)$ would be weighted much more heavily under $Q$ than under $P$. Since the conclusion takes its expectation under $Q$, the bound on the other resampling error, which is taken under $P$, becomes insufficient.
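To make the asymmetry concrete, here is a minimal numerical sketch with toy two-outcome distributions chosen purely for illustration (not the distributions from the theorem): at the outcome where $Q$ is large but $P$ is nearly zero, the mismatch barely contributes to $D_{KL}(P\|Q)$, yet it dominates $D_{KL}(Q\|P)$, while JS is symmetric by construction, so no direction of mismatch is hidden from it.

```python
import numpy as np

def kl(p, q):
    """D_KL(p || q) = sum_x p(x) * log(p(x)/q(x)); the mismatch at x is weighted by p(x)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js(p, q):
    """Jensen-Shannon divergence: symmetric, so js(p, q) == js(q, p) by construction."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy distributions on two outcomes {0, 1}: at outcome 1,
# P puts almost no mass while Q puts substantial mass.
eps = 1e-9
P = np.array([1.0 - eps, eps])
Q = np.array([0.9, 0.1])

print(f"D_KL(P||Q) = {kl(P, Q):.3f}")  # ~0.105: the disagreement at outcome 1 is weighted by P(1) = eps
print(f"D_KL(Q||P) = {kl(Q, P):.3f}")  # ~1.747: the same disagreement weighted by Q(1) = 0.1 dominates
print(f"JS(P, Q)   = {js(P, Q):.3f}")  # ~0.036: symmetric, identical whichever distribution comes first
```

Shrinking `eps` further leaves $D_{KL}(P\|Q)$ essentially unchanged while $D_{KL}(Q\|P)$ blows up, which is the kind of one-sided failure a bound under $P$ alone cannot rule out.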
Congratulations!