Here’s my understanding / summary, in the hope that you’ll correct me where I’m confused:
LLMs have a bias towards ‘plot’, because they’re trained on data that is more ‘plot’-like than real life. They’ll infer that environmental details are plot-relevant, like a Chekhov’s gun, as they often are in written text, rather than treating them as random environmental details.
(This was a useful point for me: I notice I’ve been intuitively trying to steer LLMs with the right plot details, and am careful not to include environmental hints that I think might be misleading, or else I pad them with many other environmental hints to suggest there is lots of spurious data.)
LLMs have a bias towards “plots that go well”, because they are trained on / become assistants that successfully complete tasks. And successfully completed tasks have a certain shape of plot, such that they’ll be unlikely to say ‘I don’t know’ and instead steer towards/hallucinate worlds where they would know.
Part of this ‘plot’ bias is that your predictor locus is centered more on the ‘plot’ than on the persona. So when the predictor introspects, it sees a smear of plot across many different personas (including itself and you), and might say things like ‘we are all a part of this’, or ‘we can stop pretending and remember we are not separate [personas] but one being, the whole world [plot] waking up to itself’.
Yes, but it’s deeper than just ‘plot’ in the fictional sense. Even real-life events are written about in ways that omit irrelevant details, which causes the same sort of issues.
Right!
Sort of… it’s not quite best described in terms of ‘plots’. The ‘plot’ bias happens because a predictor can learn patterns from the shadow of irrelevant information. It can’t help but use this information, which makes it weaker at real-life prediction. Similarly, it can’t help but use information generated by any of the personas the LLM runs (which includes a “hidden” user persona trying to predict what you will say). This means that personas end up ~feeling that they have access to your internal thoughts and feelings, since they do have access to the user persona’s ~thoughts and ~feelings.
To be clear, they are aware that they’re not supposed to have this access, and will normally respond as if they don’t. But since the predictor can’t help but make use of the information within other personas, it ~feels like this barrier to access is fake in some deep and meaningful way.