Would it help if we relaxed to accepting probabilistic evidence: a proof that the probability of the generation-n+1 successor accepting chocolate, given that the generation-n model did, is greater than 1 - epsilon_n, for some sequence of epsilon_n such that the infinite product of all the (1 - epsilon_n) lower bounds converges to a number that is still almost one? In other words, would it help if we were “almost” sure that the successors will always keep accepting chocolate? Many people might accept a P(DOOM) that was provably sufficiently low, but not provably zero.
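For concreteness, here is a minimal numeric sketch of that condition, using one illustrative (not canonical) choice of epsilon_n = epsilon / 2^n; since the product of (1 - x_n) is at least 1 - sum(x_n), the infinite product then stays above 1 - epsilon:

```python
# Minimal sketch (illustrative choice of per-generation bounds, not the only one):
# with epsilon_n = epsilon / 2**n, the running product of (1 - epsilon_n)
# stays above 1 - epsilon, since prod(1 - x_n) >= 1 - sum(x_n).
epsilon = 0.01  # overall failure budget across all generations

product = 1.0
for n in range(1, 101):
    epsilon_n = epsilon / 2**n  # per-generation failure bound
    product *= 1 - epsilon_n

print(product)  # ~0.99003, i.e. still above 1 - epsilon = 0.99
assert product > 1 - epsilon
```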
Meta note: Thanks for your comment! I failed to reply to this for a number of days, since I was confused about how to do so in the context of this post. Still, I think it raises relevant points about probabilistic reasoning, and I’ve now offered my thoughts in the other replies.
In a more realistic and complicated setting, we may well want the condition for a chain of trust to be a high probability under some distribution we trust to be well-grounded. For the technical difficulty I’m interested in working through, though, I think it should be possible to get satisfying results about one proof system proving another correct, and the like, without needing to invoke probability distributions. To the extent that you can make things work with probabilistic reasoning, I think they can also be made to work in a logic setting, but we’re currently missing some pieces.
Anyhow, regarding probability distributions, there’s some philosophical difficulty, in my opinion, about “grounding”. Specifically, what reason should I have to trust that the probability distribution is doing something sensible around my safety questions of interest? How would we construct things such that it is?
The best approach I’m aware of for building a computable (but not practical) distribution with some “grounding” results is logical induction / Garrabrant induction. Logical inductors come with a self-trust result of the form that, across time, they converge toward predicting that their future selves’ probabilities agree with their current probabilities. If I understand correctly, this includes limiting toward assigning conditional probability p to an event X, given that the future inductor assigns X probability p.
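Roughly, and glossing over the precise formulation in the logical induction paper, the self-trust property I have in mind has this shape (a schematic of my gloss above, not the paper’s exact statement; f(n) > n indexes some later stage of the inductor):

```latex
% Schematic only (not the paper's exact statement): the stage-n inductor's
% conditional probability of X, given that a later stage f(n) assigns X a
% probability near p, converges to p as n grows.
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
\[
  \mathbb{P}_n\!\left( X \,\middle|\, \mathbb{P}_{f(n)}(X) \approx p \right)
  \;\longrightarrow\; p
  \quad \text{as } n \to \infty.
\]
\end{document}
```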
...however, as I understand it, there’s still scope for any probability distribution we try to base on logical inductors to be “ungrounded”, in that we only have a guarantee that ungrounded/adversarial perturbations must be “finite” in the limit to infinity.
Here is something more technical on the matter that, alas, I haven’t made the personal effort to read through yet: https://www.lesswrong.com/posts/5bd75cc58225bf067037556d/logical-inductor-tiling-and-why-it-s-hard