I’d be curious to see what comes from your experiments.
I also think that there’s a difference in the control setting in what the trade-offs are between having the small model vs. the large model “initiate”. In the compression setting the small model asks a lot of questions, and this lets you avoid sending the questions “over the wire”. With control, you’re likely fine with bi-directional flow: the small model asks for help when it needs it, or the large model interjects when it wants. I haven’t thought too carefully about that.
Interestingly, even if every question from T is a yes-or-no question, the above method would let U answer them in fewer bits on average! For example, if T already thinks something is 90% likely, and U confirms that the answer is indeed “Yes”, that provides T with only -log2(0.90) ≈ 0.152 bits of information.
Yeah, definitely! You can use arithmetic coding to encode those bits, according to the small model’s priors. And if it turns out to be less helpful than sending the bits directly, the large model would be able to assess that. So you never need to send more than N bits for N questions, but can often do much better. I like this idea a good deal; we do mention it in the paper, but it’s not a major point.
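To make the idea concrete, here’s a minimal floating-point sketch of arithmetic coding for a sequence of yes/no answers, where each question comes with the small model’s prior P(yes). This is not from the paper; the per-question priors and function names are hypothetical, and a real implementation would use a streaming integer coder rather than floats.

```python
import math

def encode(answers, priors):
    """Arithmetic-code yes/no answers using the small model's prior
    P(yes) for each question (hypothetical priors, for illustration)."""
    low, high = 0.0, 1.0
    for ans, p_yes in zip(answers, priors):
        mid = low + (high - low) * p_yes  # "yes" takes the left sub-interval
        if ans:
            high = mid
        else:
            low = mid
    # Emit just enough bits to name a binary fraction inside [low, high).
    n_bits = max(1, math.ceil(-math.log2(high - low)) + 1)
    code = math.ceil(low * 2**n_bits)
    return code, n_bits

def decode(code, n_bits, priors):
    """Replay the interval narrowing to recover the answers."""
    x = code / 2**n_bits
    low, high = 0.0, 1.0
    answers = []
    for p_yes in priors:
        mid = low + (high - low) * p_yes
        ans = x < mid
        answers.append(ans)
        if ans:
            high = mid
        else:
            low = mid
    return answers
```

With three confident “yes” answers at prior 0.9 each, the interval shrinks only to width 0.9³ ≈ 0.729, so the code needs just 2 bits instead of 3; as the number of questions grows, the cost per confirmed answer approaches -log2(0.90) ≈ 0.152 bits.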
I think there are a lot of specifics to explore in the implementation of the interaction, and our work explores a relatively constrained space (we did try a few other LLM-SLM interactions, and mostly got less impressive results).