Zac Hatfield-Dodds

Karma: 3,665

Technical staff at Anthropic (views my own), previously #3ainstitute; interdisciplinary, interested in everything, ongoing PhD in CS, bets tax bullshit, open sourcerer, more at zhd.dev

Zac Hatfield-Dodds 29 Jan 2026 7:58 UTC
7 points
0
in reply to: koanchuk’s comment on: koanchuk’s Shortform
Note also: the last US-Russia nuclear arms-control treaty expires next week; far from neatly containing the problem we’re watching an ongoing breakdown of decades-old norms. I’m worried.

Zac Hatfield-Dodds 29 Jan 2026 0:19 UTC
32 points
17
on: How Articulate Are the Whales?
It seems to me that there’s a strong case for audibly distinguishable sounds, and that they can’t be formed by high-speed muscle movements, but I’m not convinced that they couldn’t be volitional via another mechanism such as beamforming or other control of interference patterns. Humans can do overtone/throat singing with some practice, for example, and I can imagine something similar plus a click...

Zac Hatfield-Dodds 24 Jan 2026 3:21 UTC
2 points
0
in reply to: TsviBT’s comment on: Daniel Birnbaum’s Shortform
I’ve seen this message, but per our confidentiality policy, no comment. (and don’t take the below as indicative either way, glomar response etc)

I will note though that Dario did not actually talk about recursive self-improvement, nor about superintelligence; commentary on LessWrong often assumes a shared ontology that just doesn’t exist.

I also continue to endorse the confidentiality policy, and per my red lines I still trust Anthropic’s leadership and think the company is on net good for the world.

Zac Hatfield-Dodds 23 Jan 2026 22:20 UTC
5 points
0
in reply to: JohnWittle’s comment on: Claude’s new constitution
Claude’s Constitution just isn’t designed or optimized as a “public relations” document. We do revise it in response to external reviews or criticism, but because that makes it better for the core purpose of shaping Claude’s character and behavior as an alignment technique.

Zac Hatfield-Dodds 23 Jan 2026 5:05 UTC
36 points
2
in reply to: peterbarnett’s comment on: Claude’s new constitution
Claude’s constitution is a living document! Opus 4.5 was trained on an earlier iteration, and we expect future models will be trained on the then-current version of this constitution.

Claude’s new constitution

Zac Hatfield-Dodds and Drake Thomas

21 Jan 2026 19:37 UTC

174 points

47 comments · 6 min read · LW link

(www.anthropic.com)

Zac Hatfield-Dodds 20 Jan 2026 9:59 UTC
2 points
0
in reply to: Raemon’s comment on: Superbabies: Putting The Pieces Together
I standard-upvoted this at the time, and +1-review-voted more recently. I like it, as a “yes, the work continues” kind of log entry, and it’s nice to be reminded that induced meiosis and superSOX were very recent very big deals. …OK, you got me, two hours later here’s a review

Zac Hatfield-Dodds 20 Jan 2026 9:56 UTC
30 points
0
on: Superbabies: Putting The Pieces Together
A neat progress update, though largely obsoleted by more recent posts (see below).

LessWrong is famously obsessed with the trainable skills and social practice of rationality, and with the prospect of very strong computer intelligence. The biological pathway to improved cognition doesn’t get as much discussion, and the timelines to impact are (probably!) somewhat longer, but I remain convinced of its importance. I also strongly agree with the ethical position that

nobody’s civil rights should be in any way violated on account of their genetic code, and that reasonable precautions should be taken to make sure novel human reproductive tech is safe.

and from early 2026, it’s nice to be reminded that induced meiosis and super-SOX for naive pluripotency were substantial breakthroughs! My impression is that we’re still basically on track to start primate testing in the late 2020s, rather than having basically no idea if or when such things might be possible.

On the other hand, I’m substantially more optimistic than Sarah about the causal predictive power of polygenic scores—at least in principle; don’t take me as endorsing any particular predictor or even current practice in general. This is supported by two key stylized facts:
1. sibling genomes are independent-random when conditioned on the shared parent genomes. ^[1] This means that observing a phenotype-genotype correlation among siblings with a shared environment ^[2] really does tell us about causality.
2. most genes have approximately additive effects, and interaction effects are usually small and almost always positive rather than being tradeoffs! See e.g. the heatmap here. An intuition: this is because we’re only considering traits which vary between currently-living humans, which are recent enough and weakly-selected-enough that they haven’t reached fixation. ^[3] If that didn’t help, ask me in person sometime because I have more thoughts this timebox is too narrow to contain.
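To make the first point concrete, here is a toy simulation, not a real analysis. It assumes a single additive genetic score, a family environment that is correlated with the parents' genotype, and made-up parameter values (`beta`, `confounding`, and the variances are all hypothetical). Under those assumptions, a naive population-wide regression overstates the causal effect, while regressing sibling differences on sibling differences recovers it, because the shared environment cancels and the segregation noise is independent given the parents.

```python
import random


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def simulate(n_families=20000, beta=0.5, confounding=1.0, seed=0):
    """Return (naive_slope, within_sibling_slope) for a toy additive model.

    beta is the true causal effect of the genetic score on the phenotype.
    All numbers here are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    pop_g, pop_y, diff_g, diff_y = [], [], [], []
    for _ in range(n_families):
        parental_mid = rng.gauss(0, 1)  # expected score given the parents
        # Shared environment, deliberately correlated with parental genotype.
        family_env = confounding * parental_mid + rng.gauss(0, 1)
        sibs = []
        for _ in range(2):
            # Segregation: independent draw for each sibling, given the parents.
            g = parental_mid + rng.gauss(0, 0.7)
            y = beta * g + family_env + rng.gauss(0, 1)
            sibs.append((g, y))
            pop_g.append(g)
            pop_y.append(y)
        diff_g.append(sibs[0][0] - sibs[1][0])
        diff_y.append(sibs[0][1] - sibs[1][1])
    return slope(pop_g, pop_y), slope(diff_g, diff_y)


if __name__ == "__main__":
    naive, within = simulate()
    print(f"naive population slope: {naive:.2f}")
    print(f"within-sibling slope:   {within:.2f} (true beta = 0.5)")
```

The footnotes list ways real data departs from this picture; the sketch only shows why the sibling comparison is the right starting point.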
Finally, what’s the state of the art in early 2026?
- many of my friends with young children used polygenic screening aka embryo selection. At time of writing, Herasight is a clear leader, though choice of IVF clinic etc. also matters.
- you can pretty easily do a full-genome sequence for less than $1000; at scale you could likely get the cost down to the $100-500 range, and it keeps falling over time.
- every other method involves doing something new, not just ‘getting artificially lucky’, and so there are a bunch of as-yet-unsolved challenges like dealing with DNA methylation and making sure all the cell markers are right—not just right-enough-to-mostly-work. This probably adds a couple of years of fiddling after the core tech works, and then extensive trials, to all the timelines below.
- iterated embryo selection would require deriving gametes from embryonic cells, which seems at least as difficult as iterated meiosis with a donor egg, and may take a few months rather than less than an hour per iteration. I think we’re 3-5 years away from deriving eggs from stem cells, longer for sperm, and probably 3-5 years away from induced meiosis.
- editing, especially multiplex editing, combines really nicely with iterative techniques: you can try some edits, then sequence the cell lines and keep those which worked well without off-targets. For selection-only it’s harder, because you have to do the edits before the fertilized egg first divides in order to avoid mosaicism. It’s on a promising trajectory for the longer term, at least.
- [another possibility omitted for reasons]
So concretely: if you want to have kids very soon talk to Herasight; if you want to have them later after-you’re-30 freeze sperm or eggs (not embryos, for flexibility); if you want to get involved as a researcher or a funder send me an email. The kids are gonna be alright :-)
1. ↩︎
  OK, it’s actually a lot more complicated than this, but we can do fancier statistics which compensate for things like variation in the fraction of genome inherited from each parent (+/- ~4%), various constraints at particular loci (e.g. homozygous AA/BB parents will have uniformly AB, violating independence), etc. On the upside, we can also do fancier analyses where relatives outside the direct line of descent like cousins also provide (weaker) evidence about genetic effects.
2. ↩︎
  oh man this one is really tough, even if we account for birth order effects. At some level we’re always conditioning on a particular environment; many of the genes we today (accurately) describe as causally contributing to autoimmune disorders were strongly advantageous during the Black Death. Some relevant parts of the environment are changing pretty fast these days though—as an example, measuring “educational attainment” by degree status will tell you something quite different about my grandparents and their grandchildren. But we will correct for what we can, and downweight what we can’t; in any case this seems more challenging for cognitive traits than health and medical outcomes.
3. ↩︎
  incidentally I think it’s great that modern civilization has things like “vaccines” and “enough food” and “radically lower infant mortality”. ‘Selection pressure’ is an abstract way to describe people dying, or their children dying, etc., and I believe we can do better. As a bonus, polygenic scores are a far more sensitive and powerful optimization procedure than variation-and-selection.

Zac Hatfield-Dodds 3 Jan 2026 10:28 UTC
5 points
6
on: Re: Anthropic Chinese Cyber-Attack. How Do We Protect Open-source Models?
Bluntly, this cannot possibly work.

Open-weights models will remain useful for general-purpose tasks, including in the common case where earlier context on the situation was not produced by the same model. Breaking the evidence chain is therefore sufficient, and is also easy for the attacker.

Do not confuse desirability for possibility.

Zac Hatfield-Dodds 17 Dec 2025 0:31 UTC
4 points
0
on: What Goes Without Saying
I would be very happy to have this essay introduce the best-of collection, as a worked example of (6).
What links here?
- Deeper Reviews for the top 15 (of the 2024 Review) by Raemon (14 Jan 2026 23:59 UTC; 45 points)

Zac Hatfield-Dodds 8 Dec 2025 9:15 UTC
2 points
1
on: Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)
“Yes, obviously!”

...except that this is apparently not obvious, for example to those who recommend taking a “safety role” but not a “capabilities role” rather than an all-things-considered analysis. That’s harder and often aversive, but solving a different easier problem doesn’t actually help.
What links here?
- Deeper Reviews for the top 15 (of the 2024 Review) by Raemon (14 Jan 2026 23:59 UTC; 45 points)

Zac Hatfield-Dodds 8 Dec 2025 9:09 UTC
4 points
0
on: On attunement
I still think pretty regularly on “green-according-to-blue”. It’s become my concept handle to explain-away the appeal of the common mistake (‘naive green’?), and simultaneously warn against dismissing green on the basis of a straw man.

I read Otherness & Control as they were published, and this was something like the core tension to me:
1. Flourishing. Elua. First, to thine own self be true. The utility function is not up for grabs. We need not understand CEV; nor may we desist from pursuing it. (I can’t point to this in more than poetry, but this is The Thing.)
2. Centrally, this is Black, and instrumentally Blue-Black. Yin, sure, but because of green-according-to-blue—a warning against the classic mistakes of the think-only-of-cutting school. (the virtues of white and red seem less neglected, less dismissed, here)
3. But… nor may we desist from pursuing it. A niggling feeling that (maybe) I’m missing something; that “nevertheless, green-qua-green” is a tension worth sitting with, if you can avoid making the usual mistakes.
Musing on attunement has, I think, untangled some subtle confusions. I haven’t exactly changed my mind, but I do think I’m a little wiser than I would have been otherwise.
What links here?
- Deeper Reviews for the top 15 (of the 2024 Review) by Raemon (14 Jan 2026 23:59 UTC; 45 points)

Zac Hatfield-Dodds 3 Dec 2025 6:26 UTC
2 points
0
in reply to: Screwtape’s comment on: Change My Mind: The Rationalist Community is a Gift Economy
There just aren’t that many rivalrous goods in OSS: website hosting etc. tends to be covered by large tech companies as a goodwill/marketing/recruiting/supply-chain expense, cf. the Python Software Foundation. Major conferences usually have some kind of scholarship program for students and either routinely pay speakers or cover costs for those who couldn’t attend otherwise; community-organized conferences like PyCon tend to be more generous with those. Honor-system “individual ticket” vs “corporate ticket” prices are pretty common, often with a cheaper student price too.

A key mechanism I think is that lots of people are aware that they have lucrative jobs and/or successful companies because of open source, that being basically the only way to make money off OSS, and therefore are willing to give back either directly or for the brand benefits.

Zac Hatfield-Dodds 3 Dec 2025 6:18 UTC
6 points
4
on: GiveCalc: Open-source tool to calculate the true cost of charitable giving
I’d strongly encourage most donors to investigate donating appreciated assets; especially when subject to LTCG tax treatment you can get a very neat double effect: donate the whole value to charity without subtracting CGT, and then claim a deduction for the full value.
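A rough worked example of that double effect, as a sketch only: the asset value, cost basis, and both tax rates below are hypothetical, and it ignores deduction limits, holding-period rules, state taxes, and everything else a real plan would need to check.

```python
def compare_donation(value, cost_basis, ltcg_rate, income_rate):
    """Compare selling an asset and donating the cash with donating it directly.

    All inputs are illustrative assumptions, not tax advice:
    value and cost_basis in dollars, rates as fractions (0.20 = 20%).
    """
    gain = value - cost_basis

    # Option A: sell, pay capital gains tax, donate what is left.
    cgt = ltcg_rate * gain
    cash_donated = value - cgt
    sell_then_donate = {
        "charity_receives": cash_donated,
        "deduction_value": income_rate * cash_donated,
    }

    # Option B: donate the asset itself; no gain is realized by the donor,
    # and (under the assumptions above) the deduction is for the full value.
    donate_directly = {
        "charity_receives": value,
        "deduction_value": income_rate * value,
    }
    return {"sell_then_donate": sell_then_donate,
            "donate_directly": donate_directly}


if __name__ == "__main__":
    # Hypothetical: $10,000 asset bought for $2,000, 20% LTCG, 35% marginal rate.
    result = compare_donation(10_000, 2_000, 0.20, 0.35)
    for option, numbers in result.items():
        print(option, numbers)
```

With these made-up numbers the charity receives $10,000 rather than $8,400, and the donor's deduction is worth $3,500 rather than $2,940.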

Zac Hatfield-Dodds 1 Dec 2025 19:51 UTC
10 points
0
on: Which planet is closest to the Earth, and why is it Mercury?

We’ll suppose each planet has an independent uniform probability of being at each point in its orbit

This happens to be true of Earth, and a very useful assumption, but I think it’s pretty neat that some systems have a mean-motion resonance (eg Neptune-Pluto, or the Galilean moons, or almost Jupiter-Saturn) which constrains the relative position away from a uniform distribution.
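For the non-resonant case, the quoted assumption is easy to check numerically. The sketch below assumes circular, coplanar orbits with rounded radii in AU (a simplification of mine, not necessarily the post's exact setup). With independent uniform phases, the angle between two planets is itself uniform, so averaging the law-of-cosines distance over one angle is enough.

```python
import math

# Rounded orbital radii in AU; circular coplanar orbits are an assumption.
ORBIT_RADIUS_AU = {"Mercury": 0.387, "Venus": 0.723, "Mars": 1.524}
EARTH_RADIUS_AU = 1.0


def mean_distance(r1, r2, steps=100_000):
    """Time-averaged distance between two bodies on circular coplanar orbits.

    Assumes the phase difference is uniform on [0, 2*pi), which is what
    independent uniform orbital phases imply. Uses a midpoint-rule average.
    """
    total = 0.0
    for k in range(steps):
        theta = 2 * math.pi * (k + 0.5) / steps
        total += math.sqrt(r1 * r1 + r2 * r2 - 2 * r1 * r2 * math.cos(theta))
    return total / steps


if __name__ == "__main__":
    for name, radius in ORBIT_RADIUS_AU.items():
        d = mean_distance(EARTH_RADIUS_AU, radius)
        print(f"{name}: mean distance from Earth = {d:.3f} AU")
```

A resonant pair would need a non-uniform weight over the phase difference, which is exactly where this shortcut stops applying.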

Zac Hatfield-Dodds 30 Nov 2025 10:59 UTC
16 points
22
on: A Blogger’s Guide To The 21st Century
I would love this list (even) more with hyperlinks.

Zac Hatfield-Dodds 29 Nov 2025 9:24 UTC
15 points
4
on: Change My Mind: The Rationalist Community is a Gift Economy
“yes obviously there’s a lot of gift-economy around”; some further observations:
- It’s more embedded in the larger social setting and economy than typically-studied gift economies.
- Open-source software might be an interesting comparison with both gifts and lots of money around.
- You need some mechanism to prevent extractive behavior; typically that’s either ticketing or private invites
- ‘pass it forward’ often looks similar to ‘making the world and community I want to live in’; both are noble
- The incredibly wide range of incomes and wealth is really unusual and nobody really knows what to do with it, alas.

Zac Hatfield-Dodds 29 Nov 2025 9:07 UTC
49 points
7
on: Unless its governance changes, Anthropic is untrustworthy
I am so very tired of these threads, but I’ll chime in at least for this comment. Here’s last time, for reference.
- I continue to think that working at Anthropic is a great way to contribute to AI safety, even in non-safety roles (I’m currently on the Alignment team but have worked on others too). Most people I talk to, including MIRI employees, agree that the situation would be worse if Anthropic had not been founded or didn’t exist.
- I’m not interested in litigating an is-ought gap about whether “we” (human civilization?) “should” be facing such high risks from AI; obviously we’re not in such an ideal world, and so discussions from that implicit starting point are imo useless.
- I have a lot of non-public information too, which points in very different directions to the citations here. Several are from people who I know to have lied about Anthropic in the past; and many more are adversarially construed. For some I agree on an underlying fact and strongly disagree with the framing and implication.
- I continue to have written redlines which would cause me to quit in protest.

Zac Hatfield-Dodds 29 Nov 2025 8:59 UTC
−3 points
9
in reply to: TsviBT’s comment on: Unless its governance changes, Anthropic is untrustworthy
Yes, politics at this level is definitely an area where you need both non-layman expertise, and a lot of specific context.

Zac Hatfield-Dodds 29 Nov 2025 8:49 UTC
6 points
5
in reply to: Greg C’s comment on: Leaving Open Philanthropy, going to Anthropic

I’m disappointed that no one (EA-ish or otherwise) seems do have done anything interesting with that liquidation opportunity.

I’ve spent a lot of time this year on tax-and-donation planning, and helping colleagues with their plans. Some very substantial, largely still confidential, things have indeed been done, and I think they will pay off very nicely starting probably-next-year and scaling up over time.