Hey, great question. I wasn’t on the research team but asked Andy Zou, and this is what he said:
Our method is completely unsupervised, whereas ITI is not only supervised but also uses TruthfulQA questions for training and validation, which violates the true zero-shot nature of the task, so we do not compare against it. (We tried their LLaMA-2 model hosted on HF and found it to still underperform our methods.) Our claim is that we “outperform the zero-shot baseline by 18%.” Overall, to my knowledge, there has been very little movement on the hardest task, MC1 on TruthfulQA: GPT-4 gets ~60%, and the performance of our 13B model is quite close to that.
Linear probing and CCS are representation reading methods, not control methods, so we compare with them in other sections of the paper, where the task is discriminative.