Thanks! Here's my initial thought about introspection and how to improve on the setup there:
In my introspection paper, we train models to predict their own behavior in a single forward pass, without CoT.
Maybe this can be extended to the cue-articulation scenario, so that we train models to predict their cues as well.
Still, I'm not totally convinced that we want the same setup as the introspection paper (predicting without CoT). It seems like an unnecessary restraint to force this kind of thinking about the effect of a cue into a single forward pass. We know that models tend to do poorly at multiple steps of reasoning within a single forward pass, so why handicap ourselves?
My current thinking is that it's more effective for models to generate hypotheses explicitly and then reason afterwards about what affected their reasoning. Maybe we can train models to be better calibrated about which hypotheses to generate when they carry out their CoT. Seems OK.