Do you have examples of the kind of models / kind of questions that lead to this most strongly? I’ve been collecting behaviors but it’s slow work reading a lot of CoTs and so anything would be welcome :)
Re: questions which lead to this most strongly: my experience has been that giving the model a variety of tasks and constraints across a long context can lead to this somewhat reliably- for example, taking the model through a creative exercise with constraints like a specific kind of poetic meter, then asking it to generalize in an unrelated task, then asking it to (and so on) until the activations are so complex and “muddled” the model struggles with coherent reasoning through the establishes context.
I apologize for the imprecision and hope this can be useful!
Do you have examples of the kind of models / kind of questions that lead to this most strongly? I’ve been collecting behaviors but it’s slow work reading a lot of CoTs and so anything would be welcome :)
Re: questions which lead to this most strongly: my experience has been that giving the model a variety of tasks and constraints across a long context can lead to this somewhat reliably- for example, taking the model through a creative exercise with constraints like a specific kind of poetic meter, then asking it to generalize in an unrelated task, then asking it to (and so on) until the activations are so complex and “muddled” the model struggles with coherent reasoning through the establishes context.
I apologize for the imprecision and hope this can be useful!