Zac Hatfield-Dodds

Karma: 3,265

Technical staff at Anthropic (views my own), previously #3ainstitute; interdisciplinary, interested in everything, ongoing PhD in CS, bets tax bullshit, open sourcerer, more at zhd.dev

Zac Hatfield-Dodds 23 Sep 2025 21:11 UTC
8 points
−2
on: In Defence of False Beliefs
See ‘valley of bad rationality’; of course an incremental move towards rationality is not always ideal (and some moves are not actually towards rationality). But see also generalizing from fictional evidence; empirically it tends to be a good idea.

Zac Hatfield-Dodds 15 Sep 2025 23:42 UTC
10 points
5
in reply to: Shankar Sivarajan’s comment on: Monthly Roundup #34: September 2025
That “mysterious aspect” might be “due process of law”, traditionally considered an essential constraint on state power, and notably absent from this strike.

Zac Hatfield-Dodds 4 Sep 2025 18:22 UTC
4 points
−5
in reply to: ryan_greenblatt’s comment on: Anthropic’s leading researchers acted as moderate accelerationists
The Time article is materially wrong about a bunch of stuff—for example, there is a large difference between incentives and duties; all board members have the same duties but LTBT appointees are likely to have a very different equity stake to whoever is in the CEO board seat.

I really don’t want to get into pedantic details, but there’s no “supposed to” time for LTBT board appointments; I think you’re counting from the first day they were legally able to appoint someone. Also https://www.anthropic.com/company lists five board members out of five seats, and four Trustees out of a maximum five. IMO it’s fine to take a few months to make sure you’ve found the right person!

More broadly, the corporate governance discussions (not just about Anthropic) I see on LessWrong and in the EA community are very deeply frustrating, because almost nobody seems to understand how these structures normally function, why they’re designed that way, or the failure modes that occur in practice. Personally, I spent about a decade serving on nonprofit boards and on oversight committees which appointed nonprofit boards, and I set up the governance for a for-profit company I founded.

I know we love first-principles thinking around here, but this is a domain with an enormous depth of practice, crystalized from long experience of (often) very smart people in sometimes-adversarial situations.

In any case, I think I’m done with this thread.

Zac Hatfield-Dodds 4 Sep 2025 7:56 UTC
9 points
−17
in reply to: ryan_greenblatt’s comment on: Anthropic’s leading researchers acted as moderate accelerationists
I think it is simply false that Anthropic leadership (excluding the LTB Trustees) have control over board appointments. You may argue they have influence, to the extent that the Trustees defer to their impressions or trust their advice, but formal control of the board is a very different thing. The class T shares held by the LTBT are entitled to appoint a majority of the board, and that cannot change without the approval of the LTBT.^[1]

Delaware law gives the board of a PBC substantial discretion in how they should balance shareholder profits, impacts on the public, and the mission of the organization. Again, I trust current leadership, but think it is extremely important that there is a legally and practically binding mechanism to avoid that balance being set increasingly towards shareholders rather than the long-term benefit of humanity—even as the years go by, financial stakes rise, and new people take leadership roles.

In addition to appointing a majority of the board, the LTBT is consulted on RSP policy changes (ultimately approved by the LTBT-controlled board), and they receive Capability Reports and Safeguards Reports before the company moves forward with a model release. IMO it’s pretty reasonable to call this meaningful oversight—the LTBT is a backstop to ensure that the company continues to prioritize the mission rather than a day-to-day management group, and I haven’t seen any problems with that.
[1] Or making some extremely difficult amendments to the Trust arrangements; you can read Anthropic’s certificate of incorporation for details. I’m not linking to it here though, because the commentary I’ve seen here previously has misunderstood basic parts like “who has what kind of shares” pretty badly.

Zac Hatfield-Dodds 3 Sep 2025 20:38 UTC
41 points
0
in reply to: habryka’s comment on: Anthropic’s leading researchers acted as moderate accelerationists
These are personal commitments which I wrote down before I joined, or when the topic (e.g. RSP and LTBT) arose later. Some are ‘hard’ lines (if $event happens); others are ‘soft’ (if in my best judgement …) and may say something about the basis for that judgement; most obviously, that I won’t count my pay or pledged donations as a reason to avoid leaving or speaking out.

I’m not comfortable giving a full or exact list (cf), but a sample of things that would lead me to quit:
- If I thought that Anthropic was on net bad for the world.
- If the LTBT was abolished without a good replacement.
- Severe or willful violation of our RSP, or misleading the public about it.
- Losing trust in the integrity of leadership.

Zac Hatfield-Dodds 2 Sep 2025 5:56 UTC
105 points
9
on: Anthropic’s leading researchers acted as moderate accelerationists
I joined Anthropic in 2021 because I thought it was an extraordinarily good way to help make AI go well for humanity, and I have continued to think so. If that changed, or if any of my written lines were crossed, I’d quit.

I think many of the factual claims in this essay are wrong (for example, neither Karen Hao nor Max Tegmark is, in my experience, a reliable source on Anthropic); we also seem to disagree on more basic questions like “has Anthropic published any important safety and interpretability research”, and whether commercial success could be part of a good AI Safety strategy. Overall this essay feels sufficiently one-sided and uncharitable that I don’t really have much to say beyond “I strongly disagree, and would have quit and spoken out years ago otherwise”.

I regret that I don’t have the time or energy for a more detailed response, but thought it was worth noting the bare fact that I have detailed views on these issues (including a lot of non-public information) and still strongly disagree.

Zac Hatfield-Dodds 28 Aug 2025 1:29 UTC
4 points
0
on: Against “Model Welfare” in 2025
I recommend carefully reading Taking AI Welfare Seriously; it seems to me that you’re arguing against a position which I haven’t seen anyone arguing for.

Zac Hatfield-Dodds 7 Aug 2025 16:45 UTC
11 points
8
on: Open weights == Closed source
No, the preferred form for modifying a model is a copy of the weights, plus open source code for training and inference. “Training a similar model from scratch” is wildly more expensive and less convenient, and not even modification!

If the model weights are available under an OSI-approved open source license, and so is code suitable for fine-tuning, I consider the model to be open source. Llama models definitely aren’t; most Chinese models are.

Zac Hatfield-Dodds 27 Jul 2025 17:48 UTC
7 points
0
in reply to: leogao’s comment on: leogao’s Shortform

like imagine if “pter” were a single character in words like helicopter and pterodactyl both contain “pter”, but you’d probably think of “helicopter” as an atomic unit with its own unique identity

I often do chunk them, but if you’ve picked up a bit of taxonomic Greek pter means ‘wing’, so we have helico-pter ‘spiral/rotating wing’ and ptero-dactyl ‘wing fingers’ - both cases where breaking down the name tells you something about what the things are!

Zac Hatfield-Dodds 10 Jul 2025 16:58 UTC
5 points
1
on: AI #124: Grokless Interlude

it would be very good if the main chat services like ChatGPT, Claude and Gemini offered branching (or cloning) and undoing within chats, so you can experiment with different continuations.

claude.ai does! If you edit one of your messages, it creates a branch and you can go back and forth between them, or even continue in parallel using multiple browser tabs.

Zac Hatfield-Dodds 28 Jun 2025 19:41 UTC
2 points
0
on: The next wave of model improvements will be due to data quality

Of course Google and Anthropic have their own version of these features that will provide them with data as well.

I think this is substantially wrong about Anthropic; see e.g. here.

Zac Hatfield-Dodds 6 Jun 2025 0:19 UTC
7 points
2
on: Histograms are to CDFs as calibration plots are to...
- I like the idea, but with n>100 points a histogram seems better, and for few points it’s hard to draw conclusions. e.g., I can’t work out an interpretation of the stdev lines that I find helpful.
- I’d make the starting point p=0.5, and use logits for the x-axis; that’s a more natural representation of probability to me. Optionally reflect p<0.5 about the y-axis to represent the symmetry of predicting likely things will happen vs unlikely things won’t.
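A minimal sketch of the suggested transformation, assuming predictions arrive as (probability, outcome) pairs. The helper names `logit`, `fold_prediction`, and `calibration_points` are hypothetical; the code only maps each prediction onto a log-odds x-axis that starts at p=0.5, reflecting p<0.5 so that "unlikely thing didn't happen" is scored the same as "likely thing did happen". How the resulting points are then plotted or accumulated is left open.

```python
import math


def logit(p: float) -> float:
    """Log-odds of p; equals 0 at p = 0.5, the proposed starting point."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1 - p))


def fold_prediction(p: float, happened: bool) -> tuple[float, bool]:
    """Reflect a prediction below 0.5 about p = 0.5 (hypothetical helper).

    Predicting an event at p < 0.5 is treated as predicting its negation
    at 1 - p, so the recorded outcome is flipped as well.
    """
    if p < 0.5:
        return 1 - p, not happened
    return p, happened


def calibration_points(preds: list[tuple[float, bool]]) -> list[tuple[float, bool]]:
    """Return (logit of folded probability, outcome) pairs, sorted by x."""
    points = []
    for p, happened in preds:
        q, outcome = fold_prediction(p, happened)
        points.append((logit(q), outcome))
    return sorted(points)
```

For example, `calibration_points([(0.9, True), (0.2, False)])` places both predictions as hits, at x = log(4) and x = log(9) respectively.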

Zac Hatfield-Dodds 28 May 2025 17:20 UTC
28 points
−62
on: Zac Hatfield Dodds’s Shortform
LTBT appoints Reed Hastings to Anthropic’s board of directors.

Today we announced that Reed Hastings, Chairman and co-founder of Netflix who served as its CEO for over 25 years, has been appointed to Anthropic’s board of directors by our Long Term Benefit Trust. Hastings brings extensive experience from founding and scaling Netflix into a global entertainment powerhouse, along with his service on the boards of Facebook, Microsoft, and Bloomberg.

“The Long Term Benefit Trust appointed Reed because his impressive leadership experience, deep philanthropic work, and commitment to addressing AI’s societal challenges make him uniquely qualified to guide Anthropic at this critical juncture in AI development,” said Buddy Shah, Chair of Anthropic’s Long Term Benefit Trust. [...]

Hastings said: “Anthropic is very optimistic about the AI benefits for humanity, but is also very aware of the economic, social, and safety challenges. I’m joining Anthropic’s board because I believe in their approach to AI development, and to help humanity progress.”

Personally, I’m excited to add Reed’s depth of business and philanthropic experience to the board, and that more of the LTBT’s work is now public.

Zac Hatfield-Dodds 26 May 2025 16:36 UTC
12 points
4
on: New scorecard evaluating AI companies on safety
If you don’t feel great about the numbers, why are there so many of them on the website? The presentation seems much more focused on the scores than a collection of information.

Zac Hatfield-Dodds 15 Apr 2025 7:08 UTC
5 points
2
in reply to: niplav’s comment on: Stupid Question: Why am I getting consistently downvoted?
This feels pretty harsh, for someone who’s already disengaged and where you don’t know their circumstances. If I see them around I’ll ask if they can afford to pay at least part or set up some kind of plan, but at 2000:1 odds playing hardball would feel like hurting someone rather than collecting on a friendly bet. (see e.g. my ask for a smaller size, above)

I think this does give me a principled basis to ask for some kind of escrow in any similar situations in future though; e.g. counterparty donates at time of bet, and I pay back CPI-adjusted donation plus my loss if I lose. (and I think I’m credible for that, e.g.).
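A minimal sketch of the settlement arithmetic for that escrow arrangement. It assumes that "pay back" means paying the counterparty, that CPI adjustment is a simple ratio of index values at settlement and at the time of the bet, and that "my loss" is a fixed agreed amount; the function name and parameters are hypothetical.

```python
def settle_escrowed_bet(
    donation: float,
    cpi_at_bet: float,
    cpi_at_settlement: float,
    my_loss: float,
    i_lost: bool,
) -> float:
    """Amount I pay the counterparty when the bet resolves (hypothetical sketch).

    The counterparty's stake was already donated when the bet was made, so
    if they lose there is nothing left to collect. If I lose, I repay their
    donation scaled by inflation (CPI ratio) plus my agreed losing amount.
    """
    if not i_lost:
        return 0.0
    inflation_factor = cpi_at_settlement / cpi_at_bet
    return donation * inflation_factor + my_loss
```

With a 100-unit donation, CPI rising from 300 to 330, and a 2000-unit losing amount, losing would cost 110 + 2000 = 2110; winning costs nothing and requires no collection.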

Zac Hatfield-Dodds 4 Apr 2025 15:43 UTC
11 points
0
in reply to: Zac Hatfield-Dodds’s comment on: Stupid Question: Why am I getting consistently downvoted?
Unfortunately MadHatter hasn’t responded to messages sent in March, and I haven’t heard anything from GiveWell to suggest that the donation has been made.

Zac Hatfield-Dodds 1 Mar 2025 22:25 UTC
3 points
−5
in reply to: Mikhail Samin’s comment on: Mikhail Samin’s Shortform
I strongly disagree that OpenAI’s and Anthropic’s efforts were similar (maybe there’s a bet there?). OpenAI formally opposed the bill without offering useful feedback; Anthropic offered consistent feedback to improve the bill, pledged to support it if amended, and despite your description of the second letter Senator Wiener describes himself as having Anthropic’s support.

I also disagree that a responsible company would have behaved differently. You say “The question is, was Anthropic supportive of SB-1047 specifically?”—but I think this is the wrong question, implying that lack of support is irresponsible rather than e.g. due to disagreements about the factual question of whether passing the bill in a particular state would be net-helpful for mitigating catastrophic risks. The Support if Amended letter, for example, is very clear:

Anthropic does not support SB 1047 in its current form. However, we believe the bill’s core aims to ensure the safe development of AI technologies are worthy, and that it is possible to achieve these aims while eliminating most of the current bill’s substantial drawbacks, as we will propose here. … We are committed to supporting the bill if all of our proposed amendments are made.

I don’t expect further discussion to be productive though; much of the additional information I have is nonpublic, and we seem to have different views on what constitutes responsible input into a policy process, as well as on basic questions like “is Anthropic’s engagement in the SB-1047 process well described as ‘support’ when the letter to Governor Newsom did not have the word ‘support’ in the subject line”. This isn’t actually a crux for me, but Senator Wiener and I seem to agree that it is, while you seem to think not.

Zac Hatfield-Dodds 28 Feb 2025 7:36 UTC
5 points
1
on: January-February 2025 Progress in Guaranteed Safe AI
But also, I’d like to ask Zac how it’s “overrated” when the reception from funders is not even lukewarm. … Does Zac mean the current level of funding is already too high, or is he just worried about that number increasing?
My lightning talk was pitched primarily to academic researchers considering a transition into AI safety research, so I meant “overrated in that proponents’ claims, e.g. in papers such as Towards GSAI or Tegmark & Omohundro, often seem unrealistic”.
I don’t have a strong opinion on overall funding levels, though I’d generally favor GSAI projects conceptualized as strengthening or extending existing practice (vs building a new paradigm) because I think they’re much more likely to pay off.

Zac Hatfield-Dodds 28 Feb 2025 6:55 UTC
15 points
1
in reply to: Mikhail Samin’s comment on: Mikhail Samin’s Shortform
Sorry, I’m not sure what proposition this would be a crux for?

More generally, “what fraction good vs bad” seems to me a very strange way to summarize Anthropic’s Support if Amended letter or letter to Governor Newsom. It seems clear to me that both are supportive in principle of new regulation to manage emerging risks, and offer Anthropic’s perspective on how best to achieve that goal. I expect most people who carefully read either letter would agree with the preceding sentence and would be open to bets on such a proposition.

Personally, I’m also concerned about the downside risks discussed in these letters, because I expect they both would have imposed very real costs, and reduced the odds of the bill passing and of similar regulations passing and enduring in other jurisdictions. I nonetheless concluded that the core of the bill was sufficiently important and urgent, and the downsides manageable, that I supported passing it.

Zac Hatfield-Dodds 10 Feb 2025 4:46 UTC
2 points
0
on: Don’t go bankrupt, don’t go rogue

How did the greens get here?

Largely via opposition to nuclear weapons, and some cost-benefit analysis which assumes nuclear proponents are too optimistic about both costs and risks of nuclear power (further reading). Personally I think this was pretty reasonable in the 70s and 80s. At this point I’d personally prefer to keep existing nuclear running and build solar panels instead of new reactors, though if SMRs worked in a sane regulatory regime that’d be nice too.