I presume I’m not understanding something, so why isn’t this a trivial counterexample:
I know a channel either strips out every 11 or every 00. This forces me to use some sort of encoding scheme to ensure the same message gets through either way. Observing a single bit of information (whether it is 11 or 00 that gets stripped out) allows me to use a much simpler encoding, saving a large number of bits.
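To make the channel concrete: the comment doesn't pin down the stripping semantics, so this sketch assumes one reading, a single left-to-right pass that removes non-overlapping occurrences of the pair (the function name is made up for illustration):

```python
def strip_pairs(bits: str, pair: str) -> str:
    """One left-to-right pass removing non-overlapping occurrences
    of `pair` ("11" or "00"). A guess at the intended channel."""
    out = []
    i = 0
    while i < len(bits):
        if bits[i:i + 2] == pair:
            i += 2  # drop the pair
        else:
            out.append(bits[i])
            i += 1
    return "".join(out)

# the same input comes out differently depending on the channel type
print(strip_pairs("110100", "11"))  # -> 0100
print(strip_pairs("110100", "00"))  # -> 1101
```

Under this reading the receiver generally cannot tell which channel was applied, which is what forces the robust encoding.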
After chewing on it a bit, I find it very plausible that this is indeed a counterexample. However, it is not obvious to me how to prove that there does not exist some clever encoding scheme which would achieve bit-throughput competitive with the O-dependent encoding without observing O. (Note that we don’t actually need to ensure the same Y pops out either way; we just need the receiver to be able to distinguish between enough possible inputs A by looking at Y.)
Ok simpler example:
You know the channel either removes all 0s or all 1s, but you don’t know which.
The most efficient way to send a message is to send n 1s followed by n 0s, where n is the number represented by the binary message you want to send.
If you know whether 1s or 0s are stripped out, then you only need to send n bits of information, for a total saving of n bits.
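A quick sketch of the unary scheme (function names hypothetical), showing that the receiver recovers n no matter which symbol the channel removes:

```python
def encode(n: int) -> str:
    # unary scheme: n ones followed by n zeros
    return "1" * n + "0" * n

def channel(msg: str, stripped: str) -> str:
    # the channel removes every occurrence of one symbol
    return msg.replace(stripped, "")

def decode(received: str) -> int:
    # whichever symbol survived, there are exactly n of them
    return len(received)

# works regardless of which symbol the channel strips
for n in range(10):
    assert decode(channel(encode(n), "0")) == n
    assert decode(channel(encode(n), "1")) == n
```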
EDIT: this doesn’t work, see comment by AlexMennen.
I don’t think this one works. In order for the channel capacity to be finite, there must be some maximum number of bits N you can send. Even if you don’t observe the type of the channel, you can communicate a number n from 0 to N by sending n 1s and N-n 0s. But then even if you do observe the type of the channel (say, it strips the 0s), the receiver will still just see some number of 1s that is from 0 to N, so you have actually gained zero channel capacity. There’s no bonus for not making full use of the channel; in johnswentworth’s formulation of the problem, there’s no such thing as some messages being cheaper to transmit through the channel than others.
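The fixed-length scheme in this reply can be sketched as follows (a hedged illustration, not AlexMennen's exact construction; n is restricted to 1..N because n = 0 with 0s stripped and n = N with 1s stripped would both yield the empty string, the one ambiguous case):

```python
N = 16  # maximum message length (arbitrary choice for the sketch)

def encode(n: int) -> str:
    # fixed length N: n ones followed by N - n zeros
    return "1" * n + "0" * (N - n)

def channel(msg: str, stripped: str) -> str:
    return msg.replace(stripped, "")

def decode(received: str) -> int:
    if "1" in received:          # the 0s were stripped: count survivors
        return received.count("1")
    return N - len(received)     # the 1s were stripped: count the gap

# the receiver recovers n WITHOUT anyone observing the channel type,
# so observing it buys no extra capacity here
for n in range(1, N + 1):
    assert decode(channel(encode(n), "0")) == n
    assert decode(channel(encode(n), "1")) == n
```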
A fair point. Or, by a similar argument: you can only transfer one extra bit of information this way, since the message representing a number of size 2n is only 1 bit larger than the message representing n.
Trying to patch the thing which I think this example was aiming for:
Let A be an n-bit number and O be 0 or 1 (50/50 distribution). Then let Y = A if A ≡ O (mod 2), else Y = 0. If the sender knows O, then they can convey n-1 bits with every message (i.e. n bits minus the lowest-order bit). If the sender does not know O, then half the messages are guaranteed to be 0 (and which messages are 0 communicates at most 1 bit per message, although I’m pretty sure it’s in fact zero bits per message in this case, so no loophole there). So at most ~n/2 bits per message can be conveyed if the sender does not know O.
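A brute-force check of the no-observation case, assuming a uniform P[A] (which need not be the capacity-achieving distribution, so this only illustrates rather than proves the ~n/2 figure):

```python
import math
from collections import defaultdict

n = 8
num = 2 ** n
pA = 1.0 / num  # uniform P[A] -- an assumption, not necessarily optimal

# joint P[A, Y] with O uniform on {0, 1} and unobserved by the sender
pAY = defaultdict(float)
for a in range(num):
    for o in (0, 1):
        y = a if a % 2 == o else 0
        pAY[(a, y)] += pA * 0.5

pY = defaultdict(float)
for (_, y), p in pAY.items():
    pY[y] += p

# I(A; Y) = sum over (a, y) of P[a, y] * log2( P[a, y] / (P[a] P[y]) )
mi = sum(p * math.log2(p / (pA * pY[y])) for (a, y), p in pAY.items())

print(f"without observing O: I(A;Y) = {mi:.2f} bits")  # close to n/2 = 4
print(f"knowing O: n - 1 = {n - 1} bits")  # parity bit forced, rest free
```

The numerical answer lands very close to n/2, versus n-1 when the sender observes O, in line with the doubling claim.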
So, sender learning the one bit O doesn’t add one bit to the capacity, it doubles the capacity.
I’m now basically convinced that that’s the answer; the conjecture is false.
I now expect that the upper limit to bits of optimization gained from one bit of observation is ~doubling the number of bits of optimization (as in this example). Though we also know that observing one bit can take us from zero bits of optimization to one bit of optimization, so it’s not as simple as just doubling; there’s at least a small additional term in there.
That sounds about right.
Tripling is, I think, definitely a hard max, since you can send 3 messages: the action if true, the action if false, and which is which (at least assuming you can reliably send a bit at all without the observation).
More tightly, it’s doubling plus the number of bits required to send a single bit of information.
Damn, that one sounded really promising at first, but I don’t think it works. The problem is: if A is fixed-length, then knowing the number of 1s also tells us the number of 0s. And since we get to pick P[A] in the optimization problem, we can make A fixed-length.
EDIT: oh, Alex beat me to the punch.
The post says
I believe that, in your example, the bits are not independent (indeed, the first and second bits are always equal), thus it isn’t a counterexample.
Sorry if I misunderstood
G can still be independent bits in the proposed counterexample. The mechanism is just about how Y is constructed from A: A is a sequence of bits, and Y is A with either the 11s or the 00s removed.