"List sorting does not play well with few-shot" mostly doesn't replicate with davinci-002.
When using length-10 lists (it crushes length-5 no matter the prompt), I get:
32-shot, no fancy prompt: ~25%
0-shot, fancy python prompt: ~60%
0-shot, no fancy prompt: ~60%
So few-shot hurts, but the fancy prompt does not seem to help. Code here.
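For concreteness, here is a minimal sketch (not the linked code) of how the two prompt styles and the scoring could look. The prompt wording, list range, and exact-match criterion are my assumptions, not the original setup:

```python
import random

def make_list(n=10, lo=0, hi=99):
    # Random integer list of the given length (assumed task distribution).
    return [random.randint(lo, hi) for _ in range(n)]

def fmt(xs):
    return "[" + ", ".join(str(x) for x in xs) + "]"

def zero_shot_prompt(xs, fancy=False):
    # "Fancy python prompt": dress the task up as a Python REPL call,
    # which adds no information a human would need to sort the list.
    if fancy:
        return f">>> sorted({fmt(xs)})\n"
    return f"Sort the following list in increasing order.\nList: {fmt(xs)}\nSorted:"

def few_shot_prompt(xs, k=32, n=10):
    # k solved examples followed by the query, with no task description.
    shots = []
    for _ in range(k):
        ex = make_list(n)
        shots.append(f"List: {fmt(ex)}\nSorted: {fmt(sorted(ex))}")
    shots.append(f"List: {fmt(xs)}\nSorted:")
    return "\n\n".join(shots)

def correct(completion, xs):
    # Exact match: the completion must start with the fully sorted list.
    return completion.strip().startswith(fmt(sorted(xs)))
```

Each prompt would then be sent to the model (e.g. a completions call to davinci-002 with temperature 0 and a few dozen max tokens) and scored with `correct`; accuracy is the fraction of test lists answered exactly.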
I’d be interested if anyone knows of another case where a fancy prompt increases performance more than few-shot prompting does, where a “fancy prompt” means a prompt that does not contain information a human would use to solve the task. I’m asking because I’m looking for counterexamples to the following conjecture: “fine-tuning on k examples beats fancy prompting, even when fancy prompting beats k-shot prompting” (for a reasonable value of k, e.g. the number of examples it would take a human to understand what is going on).