Are PaLM outputs cherry-picked?
I reread the description of the experiment and I’m still unsure.
The protocol on page 37 goes like this:
- the 2-shot exemplars used for few-shot learning were not selected or modified based on model output. I infer this from the line “the full exemplar prompts were written before any examples were evaluated, and were never modified based on the examination of the model output”.
- greedy decoding is used, so they couldn't have sampled several outputs for a given prompt and kept the best one.
What about the queries (the full prompt minus the QAQA few-shot part)? Are they included under "the full exemplar prompts" or not? If they are, there's no output selection; if they aren't, the outputs could be strongly selected, with the magnitude of that selection unreported. On one hand, "full prompts" should refer to full prompts. On the other hand, the paper only uses "exemplar" for the QAQA part prepended to every query, as opposed to "evaluated example", which means the query itself.
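To make the distinction I'm asking about concrete, here is a minimal sketch of how I read the setup. The exemplar placeholders, the function names (`build_prompt`, `greedy_decode`), and the model interface are my own illustrative assumptions, not the paper's code or data:

```python
from typing import Callable, Mapping

# 2-shot "Q:/A:/Q:/A:" exemplar block -- per the quoted protocol, written once,
# before any example is evaluated, and never edited afterwards.
EXEMPLAR_PREFIX = (
    "Q: <exemplar question 1>\n"
    "A: <exemplar answer 1>\n\n"
    "Q: <exemplar question 2>\n"
    "A: <exemplar answer 2>\n\n"
)


def build_prompt(query: str) -> str:
    """Full prompt = fixed exemplar prefix + the evaluated example (query)."""
    return EXEMPLAR_PREFIX + f"Q: {query}\nA:"


def greedy_decode(next_token_logits: Callable[[str], Mapping[str, float]],
                  prompt: str, max_tokens: int = 64, stop: str = "\n") -> str:
    """Greedy decoding: always take the argmax token, so a fixed prompt yields
    exactly one output and there is no sample set to cherry-pick from."""
    text = prompt
    for _ in range(max_tokens):
        logits = next_token_logits(text)       # model scores for the next token
        token = max(logits, key=logits.get)    # greedy: highest-scoring token
        if token == stop:
            break
        text += token
    return text[len(prompt):]


# Usage: greedy_decode(model_fn, build_prompt("<an evaluated example>"))
```

Under this reading, the only remaining degree of freedom is which queries get passed to `build_prompt`, which is exactly the part I can't pin down from the wording.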