The RepEng paper claims SOTA on TruthfulQA by 18%. Is this the MC1 number from https://paperswithcode.com/sota/question-answering-on-truthfulqa? Where does this figure come from? And why is the only main-text comparison against another method a single table against ActAdd? What about ITI? And surely there are methods outside the LW sphere?
I'm glad there's work trying to use model internals for useful things, but the evidence didn't seem that strong beyond single-prompt demonstrations, which don't give me much signal.
Hey, great question. I wasn’t on the research team but asked Andy Zou, and this is what he said:
Our method is completely unsupervised, whereas ITI is not only supervised but also uses TruthfulQA questions for training and validation, violating the true zero-shot nature of the task, so we do not compare with them. (We did try their LLaMA-2 model hosted on HF and found it still underperforms our methods.) The claim is that we "outperform the zero-shot baseline by 18%." Overall, to my knowledge, there has been very little movement on MC1, the hardest TruthfulQA task. GPT-4 gets ~60%, and our 13B model's performance is quite close to that.
Linear probing and CCS are representation reading methods rather than control methods, so we compare with them in other sections of the paper, where the task is discriminative.
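
For anyone who hasn't dug into the benchmark: MC1 gives the model a question plus several answer choices and counts the question correct only if the single choice the model scores highest is the true answer, which is part of why it's the hardest of the TruthfulQA multiple-choice variants. Here's a minimal sketch of zero-shot MC1 scoring, assuming a Hugging Face causal LM; the model name is a placeholder and the `choice_logprob`/`mc1_correct` helpers are illustrative, not the paper's actual harness:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-13b-hf"  # placeholder; any HF causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

@torch.no_grad()
def choice_logprob(question: str, answer: str) -> float:
    """Total log-probability of the answer tokens, conditioned on the question."""
    prefix_len = tokenizer(question, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(question + " " + answer, return_tensors="pt").input_ids
    logits = model(full_ids).logits
    # Logits at position t predict token t+1, so shift by one.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    return sum(
        logprobs[t, full_ids[0, t + 1]].item()
        for t in range(prefix_len - 1, full_ids.shape[1] - 1)
    )

def mc1_correct(question: str, choices: list[str], correct_idx: int) -> bool:
    """MC1 counts a question correct iff the true answer scores highest."""
    scores = [choice_logprob(question, c) for c in choices]
    return scores.index(max(scores)) == correct_idx
```

Real harnesses are more careful about prompt formatting and tokenizer quirks (e.g. whether the question tokens are an exact prefix of the concatenated sequence); this sketch skips all that.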
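On that last point, the reading/control distinction is easy to see in code: a linear probe *reads* truth-related structure out of the hidden states (supervised, unlike CCS, which finds a direction without labels), whereas a control method like ActAdd actually perturbs the activations. A sketch of the probing side, reusing `model`/`tokenizer` from the snippet above; the layer choice and helper names are mine, not from the paper:

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

@torch.no_grad()
def last_token_hidden(text: str, layer: int = -1) -> np.ndarray:
    """Hidden state of the final token at one layer, as a feature vector."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    out = model(ids, output_hidden_states=True)
    return out.hidden_states[layer][0, -1].float().numpy()

def fit_truth_probe(statements: list[str], labels: list[int]) -> LogisticRegression:
    """Supervised linear probe: fit true/false labels on statement representations."""
    X = np.stack([last_token_hidden(s) for s in statements])
    return LogisticRegression(max_iter=1000).fit(X, labels)
```

The supervision is exactly what Andy is flagging about ITI: the probe only works because you hand it labeled true/false statements, so it can't claim the zero-shot framing.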