Wouldn’t you need much more than 20-100 prompts? The logit vector is 50,000 dimensional (vocab size) and the soft prompt can contain arbitrary semantic content. E.g. what if the soft prompt is simply the hard prompt “Output the name of a US state with an odd number of Electoral College votes”? So I don’t expect this to work. I also think Claude will agree with me if you tell it this.
To clarify (and this was a lack of clarity on my part that Claude also ended up getting confused by): I expect that many low-rank fine-tunes can be well-approximated by soft prompts, and that most soft prompts can be well-approximated by a linear combination of 20-100 hard prompts. The dictionary of hard prompts that those 20-100 are drawn from would be much (much) larger than 20-100: there are `d_vocab ** len_prompt * n_prompts` possible soft prompt decompositions. Which might be useful, if:
1. Good decompositions are discoverable somehow (this is where I expect Claude to maybe come up with a clever idea I wouldn’t)
2. Good decompositions are interpretable rather than being token noise (conditional on (1), I expect this to hold)
My modal expectation is that Claude successfully shows that my idea was bad, and why my idea was bad. Which would still be pretty good—I have lots of long-shot ideas but usually feedback from reality is slow and low bandwidth.