Technical staff at Anthropic (views my own), previously #3ainstitute; interdisciplinary, interested in everything, ongoing PhD in CS, bets tax bullshit, open sourcerer, more at zhd.dev
Zac Hatfield-Dodds
I still think pretty regularly about “green-according-to-blue”. It’s become my concept handle for explaining away the appeal of the common mistake (‘naive green’?), while simultaneously warning against dismissing green on the basis of a straw man.
I read Otherness & Control as they were published, and this was something like the core tension to me:
Flourishing. Elua. First, to thine own self be true. The utility function is not up for grabs. We need not understand CEV; nor may we desist from pursuing it. (I can’t point to this in more than poetry, but this is The Thing.)
Centrally, this is Black, and instrumentally Blue-Black. Yin, sure, but because of green-according-to-blue—a warning against the classic mistakes of the think-only-of-cutting school. (the virtues of white and red seem less neglected, less dismissed, here)
But… nor may we desist from pursuing it. A niggling feeling that (maybe) I’m missing something; that “nevertheless, green-qua-green” is a tension worth sitting with, if you can avoid making the usual mistakes.
Musing on attunement has, I think, untangled some subtle confusions. I haven’t exactly changed my mind, but I do think I’m a little wiser than I would have been otherwise.
There just aren’t that many rivalrous goods in OSS—website hosting etc. tends to be covered by large tech companies as a goodwill/marketing/recruiting/supply-chain expense, cf. the Python Software Foundation. Major conferences usually have some kind of scholarship program for students, and either routinely pay speakers or cover costs for those who couldn’t attend otherwise; community-organized conferences like PyCon tend to be more generous with those. Honor-system “individual ticket” vs “corporate ticket” prices are pretty common, often with a cheaper student price too.
A key mechanism, I think, is that lots of people are aware that they have lucrative jobs and/or successful companies because of open source (that being basically the only way to make money off OSS), and are therefore willing to give back, whether directly or for the brand benefits.
I’d strongly encourage most donors to investigate donating appreciated assets; especially for assets eligible for long-term capital gains treatment you can get a very neat double benefit: the charity receives the full value without any capital gains tax being paid, and you claim a deduction for that full value.
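A rough illustration with made-up numbers (exact rates, deduction limits, and holding-period rules depend on your jurisdiction and bracket, so treat this as a sketch rather than tax advice):

```python
# Toy comparison: donating appreciated stock directly vs. selling it first
# and donating the after-tax proceeds. All rates below are assumptions for
# illustration only.

basis = 10_000        # what the shares originally cost
value = 50_000        # current fair market value
ltcg_rate = 0.15      # assumed long-term capital gains rate
marginal_rate = 0.35  # assumed marginal income tax rate

# Option A: sell, pay capital gains tax on the appreciation, donate the rest.
after_tax = value - (value - basis) * ltcg_rate
deduction_a = after_tax * marginal_rate

# Option B: donate the shares directly; the charity receives the full value,
# no capital gains tax is paid, and the deduction is on the full value.
deduction_b = value * marginal_rate

print(f"Sell then donate: charity gets ${after_tax:,.0f}, deduction worth ${deduction_a:,.0f}")
print(f"Donate shares:    charity gets ${value:,.0f}, deduction worth ${deduction_b:,.0f}")
```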
We’ll suppose each planet has an independent uniform probability of being at each point in its orbit
This happens to be true of Earth, and a very useful assumption, but I think it’s pretty neat that some systems have a mean-motion resonance (e.g. Neptune-Pluto, the Galilean moons, or almost Jupiter-Saturn) which constrains the relative positions away from a uniform distribution.
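A toy sketch of that constraint (my own illustration, treating the resonance as an exact 3:2 period ratio and ignoring libration): sampling a uniformly random time gives a pair of orbital phases that lies on a closed curve in the unit square, rather than filling it the way two independent uniform phases would.

```python
# With exactly commensurate periods, the joint distribution of the two
# orbital phases is supported on a closed 1-D curve, not the full square.
# (Real resonances also lock the resonant angle, which is what actually
# keeps Neptune and Pluto away from each other.)
import random

P1, P2 = 2.0, 3.0  # orbital periods in a 3:2 ratio, arbitrary units

def phases_at_random_time():
    t = random.uniform(0, P1 * P2)  # one full repeat of the joint pattern
    return (t / P1) % 1.0, (t / P2) % 1.0

resonant = [phases_at_random_time() for _ in range(5)]
independent = [(random.random(), random.random()) for _ in range(5)]

print("exactly commensurate:", resonant)
print("independent uniform: ", independent)
```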
I would love this list (even) more with hyperlinks.
“Yes, obviously there’s a lot of gift economy around”; some further observations:

- It’s more embedded in the larger social setting and economy than the typically-studied gift economies.
- Open-source software might be an interesting comparison, with both gifts and lots of money around.
- You need some mechanism to prevent extractive behavior; typically that’s either ticketing or private invites.
- ‘Pass it forward’ often looks similar to ‘making the world and community I want to live in’; both are noble.
- The incredibly wide range of incomes and wealth is really unusual, and nobody really knows what to do with it, alas.
I am so very tired of these threads, but I’ll chime in at least for this comment. Here’s last time, for reference.
- I continue to think that working at Anthropic—even in non-safety roles; I’m currently on the Alignment team but have worked on others too—is a great way to contribute to AI safety. Most people I talk to, including MIRI employees, agree that the situation would be worse if Anthropic had not been founded or didn’t exist.
- I’m not interested in litigating an is-ought gap about whether “we” (human civilization?) “should” be facing such high risks from AI; obviously we’re not in such an ideal world, and so discussions from that implicit starting point are imo useless.
- I have a lot of non-public information too, which points in very different directions to the citations here. Several are from people who I know to have lied about Anthropic in the past, and many more are adversarially construed. For some I agree on an underlying fact and strongly disagree with the framing and implication.
- I continue to have written redlines which would cause me to quit in protest.
Yes, politics at this level is definitely an area where you need both non-layman expertise, and a lot of specific context.
I’m disappointed that no one (EA-ish or otherwise) seems to have done anything interesting with that liquidation opportunity.
I’ve spent a lot of time this year on tax-and-donation planning, and helping colleagues with their plans. Some very substantial, largely still confidential, things have indeed been done, and I think they will pay off very nicely starting probably-next-year and scaling up over time.
I’d score these subclaims as complicated, false, and false—complicated because I think Anthropic’s proposals were to move from a strong but non-viable bill towards a weaker but viable approach, which was vetoed anyway.
My goal with Hypothesis is to undermine this whole approach, by making “use PBT” the max-productivity path and setting as difficult a baseline as possible. So far it’s going pretty well!
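For anyone who hasn’t used it, a minimal Hypothesis property-based test looks something like this (the JSON round-trip property is just an illustrative choice, not anything specific to the comment above):

```python
# Property-based test with Hypothesis: describe the space of inputs and
# assert a property that should hold for all of them; Hypothesis generates
# examples, shrinks any failure, and replays it on the next run.
import json

from hypothesis import given, strategies as st

# JSON-representable values: None/bool/float/str, nested in lists and dicts.
json_values = st.recursive(
    st.none() | st.booleans() | st.floats(allow_nan=False) | st.text(),
    lambda children: st.lists(children) | st.dictionaries(st.text(), children),
)

@given(json_values)
def test_json_round_trip(value):
    # Serialising and re-parsing should give back an equal value.
    assert json.loads(json.dumps(value)) == value

if __name__ == "__main__":
    test_json_round_trip()  # @given-wrapped tests can also be called directly
```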
Why should we have such high credence that systems capable of self-improvement, or of building improved successors, will be well modelled as having a utility function which results in strong value-preservation?
See ‘valley of bad rationality’; of course an incremental move towards rationality is not always ideal (and some moves are not actually towards rationality). But see also generalizing from fictional evidence; empirically it tends to be a good idea.
That “mysterious aspect” might be “due process of law”, traditionally considered an essential constraint on state power, and notably absent from this strike.
The Time article is materially wrong about a bunch of stuff—for example, there is a large difference between incentives and duties; all board members have the same duties but LTBT appointees are likely to have a very different equity stake to whoever is in the CEO board seat.
I really don’t want to get into pedantic details, but there’s no “supposed to” time for LTBT board appointments, I think you’re counting from the first day they were legally able to appoint someone. Also https://www.anthropic.com/company lists five board members out of five seats, and four Trustees out of a maximum five. IMO it’s fine to take a few months to make sure you’ve found the right person!
More broadly, the corporate governance discussions (not just about Anthropic) I see on LessWrong and in the EA community are very deeply frustrating, because almost nobody seems to understand how these structures normally function, why they’re designed that way, or the failure modes that occur in practice. Personally, I spent about a decade serving on nonprofit boards and on oversight committees which appointed nonprofit boards, and I set up the governance for a for-profit company I founded.
I know we love first-principles thinking around here, but this is a domain with an enormous depth of practice, crystallised from the long experience of (often) very smart people in sometimes-adversarial situations.
In any case, I think I’m done with this thread.
I think it is simply false that Anthropic leadership (excluding the LTB Trustees) have control over board appointments. You may argue they have influence, to the extent that the Trustees defer to their impressions or trust their advice, but formal control of the board is a very different thing. The class T shares held by the LTBT entitle it to appoint a majority of the board, and that cannot change without the approval of the LTBT.[1]
Delaware law gives the board of a PBC substantial discretion in how they should balance shareholder profits, impacts on the public, and the mission of the organization. Again, I trust current leadership, but think it is extremely important that there is a legally and practically binding mechanism to avoid that balance being set increasingly towards shareholders rather than the long-term benefit of humanity—even as the years go by, financial stakes rise, and new people take leadership roles.
In addition to appointing a majority of the board, the LTBT is consulted on RSP policy changes (ultimately approved by the LTBT-controlled board), and they receive Capability Reports and Safeguards Reports before the company moves forward with a model release. IMO it’s pretty reasonable to call this meaningful oversight—the LTBT is a backstop to ensure that the company continues to prioritize the mission rather than a day-to-day management group, and I haven’t seen any problems with that.
[1] or making some extremely difficult amendments to the Trust arrangements; you can read Anthropic’s certificate of incorporation for details. I’m not linking to it here though, because the commentary I’ve seen here previously has misunderstood basic parts like “who has what kind of shares” pretty badly.
These are personal commitments which I wrote down before I joined, or when the topic (e.g. RSP and LTBT) arose later. Some are ‘hard’ lines (if $event happens); others are ‘soft’ (if in my best judgement …) and may say something about the basis for that judgement—most obviously that I won’t count my pay or pledged donations as a reason to avoid leaving or speaking out.
I’m not comfortable giving a full or exact list (cf), but a sample of things that would lead me to quit:
- If I thought that Anthropic was on net bad for the world.
- If the LTBT was abolished without a good replacement.
- Severe or willful violation of our RSP, or misleading the public about it.
- Losing trust in the integrity of leadership.
I joined Anthropic in 2021 because I thought it was an extraordinarily good way to help make AI go well for humanity, and I have continued to think so. If that changed, or if any of my written lines were crossed, I’d quit.
I think many of the factual claims in this essay are wrong (for example, neither Karen Hao nor Max Tegmark is, in my experience, a reliable source on Anthropic); we also seem to disagree on more basic questions like “has Anthropic published any important safety and interpretability research”, and whether commercial success could be part of a good AI Safety strategy. Overall this essay feels sufficiently one-sided and uncharitable that I don’t really have much to say beyond “I strongly disagree, and would have quit and spoken out years ago otherwise”.
I regret that I don’t have the time or energy for a more detailed response, but thought it was worth noting the bare fact that I have detailed views on these issues (including a lot of non-public information) and still strongly disagree.
I recommend carefully reading Taking AI Welfare Seriously; it seems to me that you’re arguing against a position which I haven’t seen anyone arguing for.
“Yes, obviously!”
...except that this is apparently not obvious, for example to those who recommend taking a “safety role” but not a “capabilities role”, rather than doing an all-things-considered analysis. That’s harder and often aversive, but solving a different, easier problem doesn’t actually help.