In a more realistic and complicated setting, we may well want the condition in our chain of trust to be "this statement gets high probability under some distribution we trust to be well-grounded." But for the technical difficulty I'm interested in working through, I think it should be possible to get satisfying results about one proof system verifying that another proof system is correct, and the like, without needing to invoke probability distributions. To the extent that things can be made to work with probabilistic reasoning, I think they can also be made to work in a logic setting, but we're currently missing some pieces.
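To make the contrast concrete (in my own informal notation, not anything from a particular source), the two kinds of trust condition I have in mind are roughly:

    Prov_B(⌜φ⌝) ∧ Sound(B)  ⟹  φ        (logical: a proof in system B, plus trust that B is sound, licenses accepting φ)
    P(φ) ≥ 1 − ε                         (probabilistic: φ gets high probability under a distribution we trust to be well-grounded)

The first kind is the one I'd like to get working without appealing to the second.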
Anyhow, regarding probability distributions, there's some philosophical difficulty, in my opinion, around "grounding". Specifically, what reason do I have to trust that the probability distribution is doing something sensible about the safety questions I care about? How did we construct it so that it is?
The best approach I'm aware of for building a computable (though not practical) distribution with some "grounding" results is logical induction / Garrabrant induction. Logical inductors come with a self-trust result of the form that, over time, an inductor converges to expecting its future self's probabilities to agree with its current probabilities. If I understand correctly, this includes converging toward assigning conditional probability p to an event X, given that the future inductor assigns X probability p.
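Spelled out informally (this is my loose paraphrase; the precise statement in the logical induction paper is, if I recall, phrased in terms of expectations and continuous indicator functions rather than a bare conditional):

    P_n(X | "P_m(X) = p")  ≈  p,    for m ≥ n and n large

i.e. conditioning the current inductor on its future self assigning probability p to X pushes its current conditional probability toward p.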
...however, as I understand it, there's still scope for any probability distribution we build on top of a logical inductor to be "ungrounded": the guarantee is only that ungrounded/adversarial perturbations stay bounded ("finite") in the limit, not that they are absent at any particular finite time.
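For reference, my rough recollection of where that "finite" guarantee comes from: the logical induction criterion says no efficiently computable trader exploits the market, where exploiting means (in my simplified notation, writing W_n for the value of the trader's holdings against the market at time n)

    sup_n W_n = ∞    while    inf_n W_n > −∞

So a pattern in the prices is only ruled out insofar as a cheap trader could milk it for unbounded profit; anything yielding merely bounded profit can persist indefinitely.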
Here is something more technical on the matter, which, alas, I haven't personally made the effort to read through yet: https://www.lesswrong.com/posts/5bd75cc58225bf067037556d/logical-inductor-tiling-and-why-it-s-hard