I disagree. I don’t see an increased focus on scheming; if anything, it is notably less common, in part because people have updated on current-gen LLMs.
I do think there is a tendency to treat scheming as a discrete thing, but that tendency is more common among the optimists, who point to current-gen LLMs not really being ‘schemers’.
I agree with the way Zvi talks about the topic.
“Being a schemer” is not quite the right classification. The issue is that deception is a naturally convergent tool for all sorts of goals: anything that interfaces with reality intelligently will find that deception and manipulation are useful. So we’d naturally expect RL and other fun methods to push towards deception becoming a greater aspect of behavior, and that even if we don’t have any badly mislabeled data or reward-hackable environments, a sufficiently general intelligence will be able to construct the methodology by itself.
So I kinda agree with your post, but I also feel that you’re then downplaying scheming/deception as less of a thing, when it is still a relevant categorization; it is just hard to measure, and hard to be confident about how it grows as you scale.