Thought this paper (published after this post) seemed relevant: Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting