It seems to me that the “approximately-shortest program” here could just be generating S_1 and S_2, then throwing away S_2 and generating the data from S_1 (or vice versa)?
One class of cases where that definitely won’t work: S_1 and S_2 independent, so K(data|S_1, S_2) is roughly K(data) - K(S_1) - K(S_2) (as shown in the post at the end of the main section). In that case, the program for data given S_1, S_2 has to be significantly shorter than either the program for data given S_1 or the program for data given S_2 (assuming S_1 and S_2 themselves have significantly more than zero K-complexity).
It would be really nice if I could provably construct a third program, , which in some sense “combines the internal structure” of the two programs and , while still achieving approximately the same compression.
I don’t think that would fit the definition, since it would be over 2x as complex as either of the original programs. It doesn’t seem like it would solve the spirit of the problem either.
I’m not talking about the spirit of the problem, I’m talking about the actual program corresponding to the derivation in this article. I’m not super familiar with the field though, so I could be wrong.
The math you’re doing here implies that if 0 ≈ K(A) then 0 ≈ K(A) + K(A) = 2K(A), so constant multipliers seem to be allowed in your approximations.
It seems to me that the “approximately-shortest program” here could just be generating S_1 and S_2, then throwing away S_2 and generating the data from S_1 (or vice versa)?
One class of cases where that definitely won’t work: S_1 and S_2 independent, so K(data|S_1, S_2) is roughly K(data) - K(S_1) - K(S_2) (as shown in the post at the end of the main section). In that case, the program for data given S_1, S_2 has to be significantly shorter than either the program for data given S_1 or the program for data given S_2 (assuming S_1 and S_2 themselves have significantly more than zero K-complexity).
I don’t think that would fit the definition, since it would be over 2x as complex as either of the original programs. It doesn’t seem like it would solve the spirit of the problem either.
I’m not talking about the spirit of the problem, I’m talking about the actual program corresponding to the derivation in this article. I’m not super familiar with the field though, so I could be wrong.
The math you’re doing here implies that if 0 ≈ K(A) then 0 ≈ K(A) + K(A) = 2K(A), so constant multipliers seem to be allowed in your approximations.