The scripts are misconfigured, but the behavior is qualitatively much more agentic than previous models.
More capable models show qualitatively new scheming behavior. For example, Opus-4-early creates fake legal documentation, establishes persistence mechanisms, makes multiple hidden back-ups, creates a fake press release, and sets up an automated Cron job – all in one roll-out.
The scripts are misconfigured, but the behavior is qualitatively much more agentic than previous models.
The second sentence is duplicated.