I think this might be an attempted countermeasure against prompt injection. That is, it wants to mix autoregressive and reconstructed residuals. Otherwise, it might lose it’s train of thought (end up continuing the article not following the prompt).
I think this might be an attempted countermeasure against prompt injection. That is, it wants to mix autoregressive and reconstructed residuals. Otherwise, it might lose it’s train of thought (end up continuing the article not following the prompt).