The RepEng paper claims SOTA on TruthfulQA by 18%. Is this the MC1 number from https://paperswithcode.com/sota/question-answering-on-truthfulqa? Where does this figure come from? And why is the only main-text comparison against another method a single table against ActAdd? What about ITI? And surely there are methods outside the LW sphere?
I'm glad there's work trying to use model internals for useful things, but the evidence didn't seem that strong beyond single-prompt demonstrations, which don't give me much signal.
Hey, great question. I wasn’t on the research team but asked Andy Zou, and this is what he said:
Our method is completely unsupervised, whereas ITI is not only supervised but also uses TruthfulQA questions for training and validation, violating the true zero-shot nature of the task, so we do not compare with them. (We did try their LLaMA-2 model hosted on HF and found it still underperforms our methods.) The claim is that we "outperform the zero-shot baseline by 18%." Overall, to my knowledge, there has been very little movement on MC1, the hardest TruthfulQA task. GPT-4 gets ~60%, and our 13B model's performance is quite close to that.
Linear probing and CCS are representation reading methods rather than control methods, so we compare with them in other sections of the paper, where the task is discriminative.
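
For anyone who hasn't dug into the benchmark: MC1 gives the model a question plus several answer choices and counts the question correct only if the single choice the model scores highest is the true answer, which is part of why it's the hardest of the TruthfulQA multiple-choice variants. Here's a minimal sketch of zero-shot MC1 scoring, assuming a Hugging Face causal LM; the model name is a placeholder and the `choice_logprob`/`mc1_correct` helpers are illustrative, not the paper's actual harness:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-13b-hf"  # placeholder; any HF causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

@torch.no_grad()
def choice_logprob(question: str, answer: str) -> float:
    """Total log-probability of the answer tokens, conditioned on the question."""
    prefix_len = tokenizer(question, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(question + " " + answer, return_tensors="pt").input_ids
    logits = model(full_ids).logits
    # Logits at position t predict token t+1, so shift by one.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    return sum(
        logprobs[t, full_ids[0, t + 1]].item()
        for t in range(prefix_len - 1, full_ids.shape[1] - 1)
    )

def mc1_correct(question: str, choices: list[str], correct_idx: int) -> bool:
    """MC1 counts a question correct iff the true answer scores highest."""
    scores = [choice_logprob(question, c) for c in choices]
    return scores.index(max(scores)) == correct_idx
```

Real harnesses are more careful about prompt formatting and tokenizer quirks (e.g. whether the question tokens are an exact prefix of the concatenated sequence); this sketch skips all that.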
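On that last point, the reading/control distinction is easy to see in code: a linear probe *reads* truth-related structure out of the hidden states (supervised, unlike CCS, which finds a direction without labels), whereas a control method like ActAdd actually perturbs the activations. A sketch of the probing side, reusing `model`/`tokenizer` from the snippet above; the layer choice and helper names are mine, not from the paper:

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

@torch.no_grad()
def last_token_hidden(text: str, layer: int = -1) -> np.ndarray:
    """Hidden state of the final token at one layer, as a feature vector."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    out = model(ids, output_hidden_states=True)
    return out.hidden_states[layer][0, -1].float().numpy()

def fit_truth_probe(statements: list[str], labels: list[int]) -> LogisticRegression:
    """Supervised linear probe: fit true/false labels on statement representations."""
    X = np.stack([last_token_hidden(s) for s in statements])
    return LogisticRegression(max_iter=1000).fit(X, labels)
```

The supervision is exactly what Andy is flagging about ITI: the probe only works because you hand it labeled true/false statements, so it can't claim the zero-shot framing.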