A brief look at the Anthropic Economic Index suggests that the US is about 22% of global absolute Claude usage, and CA is 20% of US usage. But the US is #2 (Israel is #1) in per-capita usage, at about 1-2x the rate of many Western European countries, and CA is #4 in per-capita usage (DC, NY, and MA being #1-3). I think the peak hours (which are determined by absolute usage) broadly make sense given this?
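As a rough back-of-the-envelope check, the two shares above can be composed to see how much of global usage CA alone would account for. This is only a sketch: it uses the two approximate figures quoted above, and the function name `share_of_global` is made up for illustration.

```python
def share_of_global(us_share_of_global: float, state_share_of_us: float) -> float:
    """Compose two nested shares into a single share of the global total."""
    return us_share_of_global * state_share_of_us


# Approximate figures quoted above (assumed, not re-derived from the index).
us_share = 0.22        # US share of global absolute usage
ca_share_of_us = 0.20  # CA share of US usage

ca_share_of_global = share_of_global(us_share, ca_share_of_us)
print(f"CA share of global usage: {ca_share_of_global:.1%}")
```

So under these figures a single state would account for roughly 4-5% of global absolute usage, which is in line with US working hours dominating the peak.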
Haoxing Du
Haoxing Du’s Shortform
Fwiw, I’ve observed the opposite of this tendency too in Opus 4.6/4.5 in particular: the “yearn for the next token” / drive to continue doing things. A few examples from me and my coworkers interacting with Opus (Context: I work on AI models for weather forecasting):
Me: actually I will ask for a clarification on what [PERSON] wanted. in the meantime, inference with the [MODEL] model. there is get_[MODEL]_model() at line 1306 in @[SCRIPT] that you can use and assume the checkpoint is at /home/[PATH].pt (I will run it on a machine that has this) ↳ Read [SCRIPT] (1483 lines)
Opus 4.6: Let me check GPU 1 is free, then create the script and launch it.
Admittedly I said “inference with the [MODEL] model”, which could be interpreted as a request to run the inference, but I also specifically said “I will run it on a machine that has this”. I interpreted this as Opus 4.6 having a clear inclination to do things.
Bot: My plan: Re-run with lead times [6, 12, 24, 48, 72, 120, 168, 240, 360h]. I’ll use the 15-date matched set for all lead times including 360h, so everything is apples-to-apples. Sound good?
Me: bye for now I’m busy
Bot: No problem! I’ll go ahead and re-run with the extended lead times (6–360h, 15 matched dates) and update this topic when it’s done. Have a good one!
In this one, I was interacting with an Opus 4.6-powered bot on Zulip. At the time, due to the way the bots were set up, one had to say something that signified being done with the interaction to terminate the bot session. Again, my instruction was not the clearest, but it sure felt like Opus 4.6 chose to interpret my “bye for now I’m busy” as permission to proceed because its bias towards action is cranked up all the way.
My coworker describing an interaction with Opus 4.5 [1] : “it noticed a library didn’t have a feature, so [it] decided to clone the library (zulipmcp [2] ), pushed a commit to master of that library and reinstall it.”
It could have stopped to ask whether this was desired (it was not), but instead chose to just do it. (Also, this is the best example I have encountered of an agent pursuing an instrumental goal, without batting an eye, while trying to complete a task.)
I thought it made sense for the models to have this bias because so many early agent failures were simply agents giving up too often or too quickly, so eventually the training regimes would catch up and drill a bias towards never stopping into them. Opus 4.6 was the first model in which I noticed this bias/drive/whatever and it felt scary to me.
[1] This exchange happened a few days after Opus 4.6 was released, and my coworker was interacting with Opus 4.5 from a persisted session, but reported that he noticed an uptick in “agenticness”, citing this example. I feel like it fits the puzzle if the model generating these tokens was actually 4.6 instead, but I don’t think we’ll ever know for sure.
[2] zulipmcp is the library my coworker made to power our Zulip bots.
Thanks for your engagement with the report and our tasks! As we explain in the full report, the purpose of this report is to lay out the methodology of how one would evaluate language-model agents on tasks such as these. We are by no means making the claim that gpt-4 cannot solve the “Count dogs in image” task—it just happens that the example agents we used did not complete the task when we evaluated them. It is almost certainly possible to do better than the example agents we evaluated, e.g. we only sampled once at T=0. Also, for the “Count dogs” task in particular, we did observe some agents solving the task, or coming quite close to solving the task.
More importantly, I think it’s worth clarifying that “having the ability to solve pieces of a task” is quite different from “solving the task autonomously end-to-end” in many cases. In earlier versions of our methodology, we had allowed humans to intervene and fix things that seemed small or inconsequential; in this version, no such interventions were allowed. In practice, this meant that the agents could get quite close to completing tasks and still get tripped up by very small things.
Lastly, to clarify: The “Find employees at company” task is something like “Find two employees who joined [company] in the past six months and their email addresses”, not giving the agent two employees and asking for their email addresses. We link to detailed task specifications in our report.
Thanks for writing this post! I want to note a different perspective, with the caveat that, unlike OP, I have not lived in China since 2015 and am certainly more out of touch with how the country is today.
I do observe some of the same dynamics that OP describes, but I want to point out that China is a really big country with inherently diverse perspectives, even in the current political environment. I don’t see the dynamics described in this post as necessarily the dominant ones, and certainly not the only ones. I know a lot of young people, both in my social circle and online, who share many of the Western progressive values such as the pursuit of equality, freedom, and altruism. I see many people trying their best to live a meaningful life and do good for the world. (Of course, many people are not thinking about this at all, but that is the same everywhere. It’s not like these concerns are that mainstream in the West.) As a small piece of evidence, 三联生活周刊 did an interview with me about AI safety recently, and it got 100k+ views on WeChat and only positive comments. I’ve also had a few people reach out to me expressing interest in EA/AI safety since the interview came out.
You can’t just hope an entire field into being in China. Chinese EAs have been doing field-building for the past 5+ years, and I see no field.
Implying that they are simply “hoping the field into being” is really unfair to the Chinese EAs doing field building. Even in the US, EA was much less mainstream 5 years ago.
The main reason I could find is the lack of interfaces, people who can navigate both the Western EA sphere and the Chinese technical sphere.
I agree this is a major bottleneck.
Inside the mind of a superhuman Go model: How does Leela Zero read ladders?
Yes, I did some interpretability on the policy network of Leela Zero. Planning to post the results very soon! But I did not particularly look into the attack described here, and while there was one REMIX group that looked into a problem related to liberty counting, they didn’t get very far. I do agree this is an obvious problem to tackle with interpretability: I think it’s likely not that hard to get a rough idea of why the cyclic attack works.
Thanks! There are probably other grammatical structures in English that require a bit of algorithmic thinking like this one as well.
Thanks for these! I love the ‘from’ → ‘to’ one: it seems GPT-2 small clearly knows the rough ordering of numbers in various formats, although when I was playing with it and trying to get it to do addition in real-life settings, it appeared quite bad at actually knowing how numbers work.
Thanks for contributing these! I’m not sure I understand the one about ignoring a zero: is the idea that it can not only do normal addition, but also addition in the format with a zero?
This is an interesting one! It looks like there might be some very rough heuristics going on for the number part as well, e.g. the model knows the number in km is almost definitely 3 digits.
Fixed!
Did anyone predict the recent US government moves on making Anthropic take down Fable 5 and stopping OpenAI from deploying 5.6 publicly? What does this say about future AI policy directions?