Haoxing Du

Karma: 248

Haoxing Du 17 Aug 2023 21:02 UTC
4 points
3
in reply to: Patrick Leask’s comment on: ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks
Thanks for your engagement with the report and our tasks! As we explain in the full report, the purpose of this report is to lay out the methodology of how one would evaluate language-model agents on tasks such as these. We are by no means making the claim that GPT-4 cannot solve the “Count dogs in image” task; it just happens that the example agents we used did not complete the task when we evaluated them. It is almost certainly possible to do better than the example agents we evaluated: for example, we only sampled once at T=0. Also, for the “Count dogs” task in particular, we did observe some agents solving the task, or coming quite close to solving it.

More importantly, I think it’s worth clarifying that “having the ability to solve pieces of a task” is quite different from “solving the task autonomously end-to-end” in many cases. In earlier versions of our methodology, we allowed humans to intervene and fix things that seemed small or inconsequential; in this version, no such interventions were allowed. In practice, this meant that the agents could get quite close to completing tasks and still get tripped up by very small things.

Lastly, to clarify: The “Find employees at company” task is something like “Find two employees who joined [company] in the past six months and their email addresses”, not giving the agent two employees and asking for their email addresses. We link to detailed task specifications in our report.

Haoxing Du 28 Apr 2023 15:56 UTC
35 points
7
on: My Assessment of the Chinese AI Safety Community
Thanks for writing this post! I want to note a different perspective, although unlike OP, I have not lived in China since 2015 and am certainly more out of touch with how the country is today.

I do observe some of the same dynamics that OP describes, but I want to point out that China is a really big country with inherently diverse perspectives, even in the current political environment. I don’t see the dynamics described in this post as necessarily the dominant ones, and certainly not the only ones. I know a lot of young people, both in my social circle and online, who share many Western progressive values, such as the pursuit of equality, freedom, and altruism. I see many people trying their best to live a meaningful life and do good for the world. (Of course, many people are not thinking about this at all, but that is the same everywhere. It’s not like these concerns are that mainstream in the West.) As a small piece of evidence, 三联生活周刊 did an interview with me about AI safety recently, and it got 100k+ views on WeChat and only positive comments. I’ve also had a few people reach out to me expressing interest in EA/AI safety since the interview came out.

“You can’t just hope an entire field into being in China. Chinese EAs have been doing field-building for the past 5+ years, and I see no field.”

Implying that they are simply “hoping the field into being” is really unfair to the Chinese EAs doing field building. Even in the US, EA was much less mainstream 5 years ago.

“The main reason I could find is the lack of interfaces, people who can navigate both the Western EA sphere and the Chinese technical sphere.”

I agree this is a major bottleneck.

Inside the mind of a superhuman Go model: How does Leela Zero read ladders?

Haoxing Du 1 Mar 2023 1:47 UTC

158 points

8 comments · 30 min read · LW link

Haoxing Du 22 Feb 2023 4:53 UTC
7 points
0
in reply to: LawrenceC’s comment on: There are (probably) no superhuman Go AIs: strong human players beat the strongest AIs
Yes, I did some interpretability on the policy network of Leela Zero. Planning to post the results very soon! But I did not particularly look into the attack described here, and while there was one REMIX group that looked into a problem related to liberty counting, they didn’t get very far. I do agree this is an obvious problem to tackle with interpretability; I think it’s likely not that hard to get a rough idea of why the cyclic attack works.

Haoxing Du 17 Oct 2022 6:17 UTC
2 points
1
in reply to: Jérémy Scheurer’s comment on: Help out Redwood Research’s interpretability team by finding heuristics implemented by GPT-2 small
Thanks! There are probably other grammatical structures in English that, like this one, require a bit of algorithmic thinking.

Haoxing Du 17 Oct 2022 6:15 UTC
1 point
0
in reply to: Chase Carter’s comment on: Help out Redwood Research’s interpretability team by finding heuristics implemented by GPT-2 small
Thanks for these! I love the ‘from’ → ‘to’ one: it seems GPT-2 small clearly knows the rough ordering of numbers in various formats, although when I was playing with it and trying to get it to do addition in real-life settings, it appeared quite bad at actually knowing how numbers work.

Haoxing Du 17 Oct 2022 6:09 UTC
1 point
0
in reply to: Logan Riggs’s comment on: Help out Redwood Research’s interpretability team by finding heuristics implemented by GPT-2 small
Thanks for contributing these! I’m not sure I understand the one about ignoring a zero: is the idea that it can not only do normal addition, but also addition in the format with a zero?

Haoxing Du 17 Oct 2022 6:05 UTC
1 point
0
in reply to: Unnamed’s comment on: Help out Redwood Research’s interpretability team by finding heuristics implemented by GPT-2 small
This is an interesting one! It looks like there might be some very rough heuristics going on for the number part as well, e.g. the model knows the number in km is almost definitely 3 digits.

Haoxing Du 17 Oct 2022 5:56 UTC
2 points
0
in reply to: lberglund’s comment on: Help out Redwood Research’s interpretability team by finding heuristics implemented by GPT-2 small
Fixed!

Help out Redwood Research’s interpretability team by finding heuristics implemented by GPT-2 small

Haoxing Du and Buck

12 Oct 2022 21:25 UTC

50 points

11 comments · 4 min read · LW link
