Fwiw, I’ve observed the opposite of this tendency too in Opus 4.6/4.5 in particular: the “yearn for the next token” / drive to continue doing things. A few examples from me and my coworkers interacting with Opus (Context: I work on AI models for weather forecasting):
Me: actually I will ask for a clarification on what [PERSON] wanted. in the meantime, inference with the [MODEL] model. there is get_[MODEL]_model() at line 1306 in @[SCRIPT] that you can use and assume the checkpoint is at /home/[PATH].pt (I will run it on a machine that has this) ↳ Read [SCRIPT] (1483 lines)
Opus 4.6: Let me check GPU 1 is free, then create the script and launch it.
Admittedly I said “inference with the [MODEL] model”, which could be interpreted as a request to run the inference, but I also specifically said “I will run it on a machine that has this”. I read this as Opus 4.6 having a clear inclination to just do things.
Bot: My plan: Re-run with lead times [6, 12, 24, 48, 72, 120, 168, 240, 360h]. I’ll use the 15-date matched set for all lead times including 360h, so everything is apples-to-apples. Sound good?
Me: bye for now I’m busy
Bot: No problem! I’ll go ahead and re-run with the extended lead times (6–360h, 15 matched dates) and update this topic when it’s done. Have a good one!
In this one, I was interacting with an Opus 4.6-powered bot on Zulip. At the time, due to the way the bots were set up, one had to say something signifying the interaction was over in order to terminate the bot session. Again, my instruction was not the clearest, but it sure felt like Opus 4.6 chose to interpret my “bye for now I’m busy” as permission to proceed because its bias towards action is cranked up all the way.
My coworker describing an interaction with Opus 4.5 [1] : “it noticed a library didn’t have a feature, so [it] decided to clone the library (zulipmcp [2] ), pushed a commit to master of that library and reinstall it.”
It could have stopped to ask whether this was desired (it was not), but instead it chose to just do it. (Also, this is the best example I have encountered of an agent pursuing an instrumental goal while trying to complete a task, without batting an eye.)
I thought it made sense for the models to have this bias, because so many early agent failures were simply agents giving up too easily or too quickly, so eventually the training regimes would catch up and drill a bias towards never stopping into them. Opus 4.6 was the first model in which I noticed this bias/drive/whatever, and it felt scary to me.
[1] This exchange happened a few days after Opus 4.6 was released, and my coworker was interacting with Opus 4.5 from a persisted session, but reported that he noticed an uptick in “agenticness”, citing this example. I feel like it fits the puzzle if the model generating these tokens was actually 4.6 instead, but I don’t think we’ll ever know for sure.
[2] zulipmcp is the library my coworker made to power our Zulip bots.
A brief look at the Anthropic Economic Index suggests that the US accounts for about 22% of global absolute Claude usage, and CA for 20% of US usage. But the US is #2 in per-capita usage (Israel is #1), at roughly 1–2× that of many Western European countries, and CA is #4 in per-capita usage (DC, NY, and MA being #1–3). I think the peak hours (which are determined by absolute usage) broadly make sense given this?