Thanks for writing this post! I want to offer a different perspective, with the caveat that, unlike OP, I have not lived in China since 2015 and am certainly more out of touch with how the country is today.
I do observe some of the same dynamics that OP describes, but I want to point out that China is a really big country with inherently diverse perspectives, even in the current political environment. I don’t see the dynamics described in this post as necessarily the dominant ones, and certainly not the only ones. I know a lot of young people, both in my social circle and online, who share many Western progressive values such as the pursuit of equality, freedom, and altruism. I see many people trying their best to live a meaningful life and do good for the world. (Of course, many people are not thinking about this at all, but that is the same everywhere. It’s not like these concerns are that mainstream in the West.) As a small piece of evidence, 三联生活周刊 (Sanlian Lifeweek) did an interview with me about AI safety recently, and it got 100k+ views on WeChat and only positive comments. I’ve also had a few people reach out to me expressing interest in EA/AI safety since the interview came out.
“You can’t just hope an entire field into being in China. Chinese EAs have been doing field-building for the past 5+ years, and I see no field.”
Implying that they are simply “hoping the field into being” is really unfair to the Chinese EAs doing field-building. Even in the US, EA was much less mainstream 5 years ago.
“The main reason I could find is the lack of interfaces, people who can navigate both the Western EA sphere and the Chinese technical sphere.”
I agree this is a major bottleneck.
Thanks for your engagement with the report and our tasks! As we explain in the full report, the purpose of this report is to lay out a methodology for evaluating language-model agents on tasks such as these. We are by no means claiming that GPT-4 cannot solve the “Count dogs in image” task; it just happens that the example agents we used did not complete the task when we evaluated them. It is almost certainly possible to do better than the example agents we evaluated; for example, we only sampled once at T=0. Also, for the “Count dogs” task in particular, we did observe some agents solving the task, or coming quite close to solving it.
More importantly, I think it’s worth clarifying that “having the ability to solve pieces of a task” is quite different from “solving the task autonomously end-to-end” in many cases. In earlier versions of our methodology, we allowed humans to intervene and fix things that seemed small or inconsequential; in this version, no such interventions were allowed. In practice, this means that agents can get quite close to completing a task and still get tripped up by very small things.
Lastly, to clarify: the “Find employees at company” task is something like “Find two employees who joined [company] in the past six months and their email addresses”, not giving the agent two employees and asking for their email addresses. We link to detailed task specifications in our report.