I think this is doable with this approach, but I haven’t proven it can be done, let alone said anything about a dependence on epsilon. The closest bound I show not only has a constant factor of around 40; it also depends on the prior on the truth. I think (75% confidence) this is a weakness of the proof technique, not a weakness of the algorithm.
I just meant the dependence on epsilon; it seems like there are unavoidable additional factors (especially the linear dependence on p(treachery)). I guess it’s not obvious whether you can make these additive or whether they are inherently multiplicative.
But your bound scales in some way, right? How much training data do I need to get the KL divergence between distributions over trajectories down to epsilon?
No matter how much data you have, my bound on the KL divergence won’t approach zero.