a uniform weighting of mathematical structures in a Tegmark-like ’verse
I don’t know what this is supposed to mean. There isn’t any uniform distribution over a countably infinite set. We can have a continuous uniform distribution over certain special kinds of uncountable sets (for example, the real unit interval [0,1]), but somehow I doubt that this was the intended reading.
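For concreteness, the standard one-line argument (sketched here, nothing beyond textbook probability) is that countable additivity leaves no room for a constant point probability:

```latex
% On a countable set \{x_1, x_2, \ldots\}, suppose every point gets the
% same probability p. Countable additivity would require
\sum_{n=1}^{\infty} \Pr(x_n) \;=\; \sum_{n=1}^{\infty} p \;=\;
\begin{cases}
  0      & \text{if } p = 0,\\
  \infty & \text{if } p > 0,
\end{cases}
% so the total can never equal 1, and no uniform distribution exists.
```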
(I’m curious, why is it countably infinite rather than uncountably infinite?) I had assumed, perhaps quite ridiculously, that there was some silly improper or uniform-like distribution, and for shorthand just called that prior a uniform distribution. The reason I’d assumed that is that I remembered Tegmark not seeming worried about the problem in one of his longer, more detailed multiverse papers, despite saying he wasn’t fond of Schmidhuber’s preference for the universal or speed priors; or something like that? I’m pretty sure he explicitly considered it, though. I feel silly for not having actually looked it up myself. Anyway, is there some halfway sane equivalent to talking about a uniform prior here? (Are improper priors not too meaningless/ugly for this purpose?) (Warning, I might edit this comment as I read or re-read the relevant Wikipedia articles.)
Anyway, is there some halfway sane equivalent to talking about a uniform prior here?
I don’t think so. At least I’ve never seen anyone mention any prior over mathematical structures that might be even roughly described as a “uniform prior”.
I think the intuition here is basically that of the everything-list’s “white rabbit” problem. If you consider e.g. all programs at most 10^100 bits in length, there will be many more long than short programs that output a given mind. But I think the standard answer is most of those long programs will just be short programs with irrelevant junk bits tacked on?
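The junk-bits answer can be made quantitative in a toy padding model (my own illustrative assumption, not anything standard): if any bits after a k-bit core are ignored, then among programs of one fixed total length, a shorter core has exponentially more padded variants.

```python
# Toy model (an assumption, not from the thread): a "program" is a bit
# string; any string that starts with a fixed k-bit core and ignores the
# rest is treated as computing the same output as that core. Under a
# uniform count over all programs of length exactly N, a k-bit core
# accounts for 2^(N - k) padded variants, so each bit saved doubles a
# core's share -- the usual 2^-k Solomonoff-style weighting.

N = 30  # maximum program length considered in this toy count

def padded_count(core_len: int, total_len: int = N) -> int:
    """Number of length-`total_len` programs extending a given core."""
    return 2 ** (total_len - core_len)

short, long_ = 10, 20
ratio = padded_count(short) // padded_count(long_)
print(ratio)  # 1024: a 10-bit core outnumbers a 20-bit core 2^10 to 1
```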
I basically don’t understand such arguments as applied to real-world cosmology, i.e. computing programs rather than discovering them. ’Cuz if we’re talking about cosmology, aren’t we assuming that at some point some computation is going to occur? If so, there’s a very short program that outputs a universal dovetailer that computes all programs of arbitrary length, which repeatedly outputs a universal dovetailer for all programs at most 10^5 bits in length, and so on… and it’s just not clear to me which generators win out in the end, whether short short-biased or long long-biased, how that depends on choice of language, or generally what the heck is going on.
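For what it’s worth, the dovetailing scheme itself is easy to sketch. The “programs” below are hypothetical stand-ins (plain Python generators indexed by i), since real enumerated bit-string programs would need an interpreter:

```python
# A minimal universal-dovetailer sketch: at stage s, start program s,
# then run one step of every program started so far. Every program
# eventually receives unboundedly many steps, even non-halting ones.
from itertools import count, islice

def make_program(i):
    """Hypothetical stand-in for 'the i-th program': counts up from i."""
    def gen():
        n = i
        while True:
            yield n
            n += 1
    return gen()

def dovetail():
    running = []
    for stage in count():
        running.append(make_program(stage))  # start one new program per stage
        for prog in running:                 # one step for each started program
            yield next(prog)

print(list(islice(dovetail(), 10)))  # [0, 1, 1, 2, 2, 2, 3, 3, 3, 3]
```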
Warning! Almost assuredly blithering nonsense: (Actually, in that scenario aren’t there logical attractors for programs to output 0, 1, 10, 11, …, which results in a universal distribution/generator constructed from the uniform generator, which then goes on to compute whatever universe we would have seen from an original universal distribution anyway? This self-organization looks suspiciously like getting information from nowhere, but those computations must cost negentropy if they’re not reversible. If they are reversible, then how? Reversible by what? Anyway, that is information as seen from outside the system, which might not be meaningful; information from any point inside the system seems like it might be lost with each irreversible computation? Bleh, speculations.)
(ETA: Actually, couldn’t we just run some simulations of this argument, or translate it into terms of Hashlife, and see what we get? My hypothesis is that as we compute all programs of length x = 0, x++, on to infinity, the binary outputs of all computations, when sorted into identical groups, converge on a universal prior distribution, though for small values of x the convergence is swamped by choice of language. I have no real reason to suspect this hypothesis is accurate or even meaningful.)
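A tiny version of that simulation can actually be run. The two-bit-opcode language below is invented purely for illustration (not any standard machine), but it shows the sort of tallying the ETA proposes: enumerate every bit string up to some length, treat it as a program, and count how many programs produce each output.

```python
# Enumerate all bit strings up to length `max_len` as programs in a toy
# two-bit-opcode language (an invented stand-in): 00 emits '0', 01 emits
# '1', 10 duplicates the last emitted bit, 11 halts. Tally how many
# programs yield each output; short outputs are hit by far more programs,
# a crude echo of a simplicity-biased (universal-prior-like) distribution.
from collections import Counter
from itertools import product

def run(program: str) -> str:
    out = []
    for i in range(0, len(program) - 1, 2):
        op = program[i:i + 2]
        if op == "00":
            out.append("0")
        elif op == "01":
            out.append("1")
        elif op == "10" and out:
            out.append(out[-1])
        elif op == "11":
            break  # halt
    return "".join(out)

def output_distribution(max_len: int) -> Counter:
    tally = Counter()
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            tally[run("".join(bits))] += 1
    return tally

dist = output_distribution(6)
print(dist[""], dist["0"], dist["00"])  # 48 15 10
```

The monotone drop-off with output length is the qualitative point; the exact counts are artifacts of this particular made-up language, which is exactly the language-dependence worry raised above.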
(ETA 2: Bleh, forgot about the need to renormalize outputs by K-complexity (i.e. maximum lossless compression), ’cuz so many programs will output “111111111111...”. I don’t even know whether that’s meaningful, or whether it undermines the entire point. Brain doesn’t like math.)
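The renormalization idea can at least be prototyped with an off-the-shelf compressor standing in for K-complexity (a loud assumption: Kolmogorov complexity is uncomputable, and zlib is a weak proxy, especially on short strings):

```python
# Compressed size as a crude, computable stand-in for K-complexity.
# A highly repetitive output like "111..." collapses to almost nothing,
# while a patternless bit string stays near its entropy, so grouping
# outputs by compressed size would merge the "111..." flood into one
# low-complexity bucket.
import random
import zlib

def compressed_size(s: str) -> int:
    return len(zlib.compress(s.encode(), 9))

repetitive = "1" * 10_000
rng = random.Random(0)  # fixed seed: a reproducible "patternless" string
incompressible = "".join(rng.choice("01") for _ in range(10_000))

print(compressed_size(repetitive) < compressed_size(incompressible))  # True
```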
Warning! Almost assuredly blithering nonsense: Hm, is this more informative if instead we consider programs between 10^100 and 10^120 bits in length? Should it matter all that much how long they are? If we can show convergence upon characteristic output distributions for various reasonably large sets of all programs of bit lengths a to b, a < b, between 0 and infinity, then we can perhaps make some weak claims about “attractive” outputs for programs of arbitrary length. I speculated in my other reply to your comment that after maximally compressing all of the outputs we might get some neat distribution (whatever the equivalent of the normal distribution is for enough arbitrary program outputs in a given language after compression), though it’s probably something useless that doesn’t explain anything; I’m not sure that compressing the results doesn’t just destroy the entire point of getting the outputs in the first place. (Instead maybe we’d run all the outputs as programs repeatedly; side question: if you keep doing this, how quickly does the algorithm weed out non-halting programs?) Chaitin would smile upon such methods, I think, even if he’d be horrified at my complete bastardization of pretend math, let alone math.
I suppose this is where such a monstrosity might lurk, but it does indeed seem mythical. Dammit infinities, why won’t you ever listen to my common sense notions?
Anyway, is there some halfway sane equivalent to talking about a uniform prior here?
Yes (depending on what you mean by halfway sane). You can have a prior that all your sense inputs are uniformly distributed. This prior is stable, since no matter what sense inputs you experience, they’re just as likely as any other. For reasons that should be obvious, I certainly don’t recommend adopting this prior.
(I’m curious, why is it countably infinite rather than uncountably infinite?)
(In retrospect, my comment could have been worded better; I only meant to consider both cases. The point is that we can’t talk about a uniform distribution until we’ve specified more precisely just what the Tegmark multiverse is supposed to include; “all mathematical structures” is not well-defined.)
Oh ah. I share your sadness, and thus made this comment, which is more up-front about how confused all of this is, or at least how confused I am about it. Perhaps you’d like to try your hand at some wild speculation?
I don’t know what this is supposed to mean. There isn’t any uniform distribution over a countably infinite set. We can have a continuous uniform distribution over certain special kinds of uncountable sets (for example, the real unit interval [0,1]), but somehow I doubt that this was the intended reading.
And even then, that distribution is only uniform with respect to the usual measure on the unit interval.