It is indeed pretty weird to see these behaviors appear in pure LMs. It’s especially striking with sycophancy, where the large models seem obviously (?) miscalibrated given the ambiguity of the prompt.
By ‘pure LMs’ do you mean ‘pure next-token-predicting LLMs trained on a standard internet corpus’? If so, I’d be very surprised if they’re miscalibrated, provided this prompt isn’t that improbable (which it probably isn’t). I’d guess this output is the ‘right’ output for this corpus (so long as you don’t sample enough tokens to make the sequence detectably very weird to the model). Note that t=0 (or low temperature in general) may induce all kinds of strange behavior in addition to making the generation detectably weird.
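To make the t=0 point concrete, here’s a minimal sketch of temperature sampling (function name and toy logits are illustrative, not from any particular library): as t shrinks, the distribution collapses onto the argmax token, so t=0 decoding is pure greedy search rather than a draw from the model’s actual predictive distribution.

```python
import numpy as np

def sample_with_temperature(logits, t, rng=None):
    """Sample a token index from logits scaled by temperature t.

    As t -> 0 this approaches greedy argmax decoding: all probability
    mass concentrates on the single most likely continuation, which is
    one reason low-temperature generations can drift into sequences the
    model itself would assign low probability to as a whole.
    """
    rng = rng or np.random.default_rng()
    if t == 0:
        return int(np.argmax(logits))  # greedy decoding
    scaled = np.asarray(logits, dtype=float) / t
    scaled -= scaled.max()             # subtract max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

toy_logits = [2.0, 1.0, 0.5]
print(sample_with_temperature(toy_logits, 0))    # always picks index 0
```

The point is just that repeated greedy picks compound: each step is individually the most likely token, but the resulting sequence can be far from a typical sample under the model.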