If every bit of every weight were somehow used to store one bit of p, excepting those weights used to simulate the UTM, that should suffice to derive the conjecture, yes.[1]
I think that’s maybe even harder than what I tried to do though. It’s theoretically fine if our scheme is kind of inefficient in terms of how much code it can store in a given number of parameters, so long as the leftover parameter description bits are free to vary.
There’d be some extra trickiness in that, under these definitions, the parameters are technically real numbers and thus have infinitely many bits of storage capacity, though in real life they’re of course finite-precision floating point numbers.
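To make the finite-precision point concrete, here’s a toy sketch (my own illustration, not part of the original argument): with float32 parameters you get at most 32 raw bits per weight, and one crude way to “store bits of p in the weights” is to overwrite the low-order mantissa bits. Nothing here says the resulting network computes anything sensible; it only shows how the raw storage per parameter is bounded once you leave the idealized real-valued setting.

```python
import struct

def pack_bits_into_weight(weight: float, bits: str) -> float:
    """Overwrite the lowest len(bits) mantissa bits of a float32 weight
    with the given bit string (crude packing scheme, purely illustrative)."""
    assert 0 < len(bits) <= 23  # float32 has a 23-bit mantissa
    as_int = struct.unpack("<I", struct.pack("<f", weight))[0]
    mask = (1 << len(bits)) - 1
    as_int = (as_int & ~mask) | int(bits, 2)
    return struct.unpack("<f", struct.pack("<I", as_int))[0]

# Spread the bits of a (hypothetical) program p across several weights,
# 8 low-order bits per weight.
p = "0110100111010010"
weights = [0.137, -0.42]
packed = [pack_bits_into_weight(w, p[i * 8:(i + 1) * 8])
          for i, w in enumerate(weights)]
```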
the parameters are technically real numbers and thus have infinitely many bits of storage capacity
Yeah, a lot of the stuff about ARNNs having super-Turing computing capacity relies on the analog part of the (A)RNN, and therefore on the weights being real-valued. RNNs with rational weights are strictly Turing-complete, I think. (cf. this article)
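For a feel of why rational weights already suffice for Turing-completeness, here’s a rough sketch of the stack encoding used in Siegelmann–Sontag-style constructions (my paraphrase; the article referred to above may set it up differently): a binary stack is stored in a single rational state value, and push/pop are affine updates plus a threshold, exactly the kind of operation a saturated-linear RNN unit with rational weights can implement.

```python
from fractions import Fraction

# A binary stack encoded as a single rational number q in [0, 1):
# pushing bit b maps q -> q/4 + (2b + 1)/4, the top bit is read off with a
# threshold at 1/2, and popping is another affine map. All values stay rational.

def push(q, bit):
    return q / 4 + Fraction(2 * bit + 1, 4)

def top(q):
    return 1 if q >= Fraction(1, 2) else 0

def pop(q):
    return 4 * q - (2 * top(q) + 1)

q = Fraction(0)            # empty stack
for b in (1, 0, 1):
    q = push(q, b)

out = []
while q > 0:
    out.append(top(q))
    q = pop(q)
print(out)  # [1, 0, 1] -- last pushed bit comes off first
```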
If every bit of every weight were somehow used to store one bit of p, excepting those weights used to simulate the UTM, that should suffice to derive the conjecture, yes
It’s theoretically fine if our scheme is kind of inefficient in terms of how much code it can store in a given number of parameters
Assuming the bits-to-parameters encoding can be relaxed, there’s some literature about redundant computations in neural networks. If the feature vectors in a weight matrix aren’t linearly independent, for example, the same computation can be “spread” over many linearly dependent features, with the result that there are no free parameters but the total amount of computational work is the same.
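A minimal numpy sketch of the “spreading” described above (a toy of my own, not taken from that literature): the same ReLU feature direction is duplicated into two linearly dependent rows, and the downstream weight is split across them, so no row is left unused, yet the computed function and the size of the matrices are unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)
w = rng.normal(size=3)          # a single feature direction

# "Concentrated" version: the feature lives in one row, the other row is unused.
W1 = np.stack([w, np.zeros(3)])
a1 = np.array([2.0, 0.0])
y1 = a1 @ np.maximum(W1 @ x, 0.0)

# "Spread" version: the same direction occupies both rows (linearly dependent
# features) and the downstream weight is split between them.
W2 = np.stack([w, w])
a2 = np.array([0.5, 1.5])
y2 = a2 @ np.maximum(W2 @ x, 0.0)

print(np.allclose(y1, y2))  # True: same function, same-sized matrices,
                            # but in the spread version no row is left free
```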
Otherwise, I don’t think it’ll be easy to find a way for the remaining parameters to be fully free to vary, since it is a densely connected network. It might be interesting for you to look into the lottery ticket hypothesis literature, though even there I don’t think the remaining non-ticket weights can be set to any value.
Assuming the bits-to-parameters encoding can be relaxed, there’s some literature about redundant computations in neural networks. If the feature vectors in a weight matrix aren’t linearly independent, for example, the same computation can be “spread” over many linearly dependent features, with the result that there are no free parameters but the total amount of computational work is the same.
There are a few other cases like this where we know how various specific forms of simplicity in the computation map onto freedom in the parameters. But those are not enough in this case. We need more freedom than that.