Hey Adam — thanks for this. I wrote about this kind of COI in the post, but your comment was a good nudge to think more seriously about my take here.
Basically, I care here about protecting two sorts of values. On the one hand, I do think the sort of COI you’re talking about is real. That is, insofar as people at AI companies who have influence over trade-offs the company makes between safety and commercial success hold equity, deciding in favor of safety will cause them to lose money — and potentially, for high-stakes decisions like dropping out of the race, a lot of money. This is true of people in safety-focused roles, but it’s true of other kinds of employees as well — and of course, especially true of leadership, who have both an outsized amount of equity and an outsized amount of influence. This sort of COI can be a source of epistemic bias (e.g. in safety evaluations of the type you’re focused on), but it can also just be a more straightforward misalignment where e.g. what’s best by the lights of an equity-holder might not be best for the world. I really don’t want my decision-making as an Anthropic employee to end up increasing existential risk from AI because of factors like this. And indeed, given that Anthropic’s stated mission is (roughly) to do what’s best for the world re: AI, in some sense it’s in the job description of every employee to make sure this doesn’t happen.[1] And just refusing to hold equity would indeed go far on this front (though: you can also get similar biases without equity — e.g., maybe you don’t want to put your cash salary at risk by making waves, pissing people off, etc). And even setting aside the reality of a given level of bias/misalignment, there can be additional benefits to it being legible to the world that this kind of bias/misalignment isn’t present (though I am currently much more concerned about the reality of the bias/misalignment at stake).
On the other hand: the amount of money at stake is enough that I don’t turn it down casually. This is partly due to donation potential. Indeed, my current guess is that (depending, of course, on values and other views) many EA-ish folks should be glad on net that various employees at Anthropic (including some in leadership, and some who work on safety) didn’t refuse to take any equity in the company, despite the COIs at stake — though it will indeed depend on how much they actually end up donating, and to where. But beyond donation potential, I’m also giving weight to factors like freedom, security, flexibility in future career choices, ability to self-fund my own projects, trading money for time/energy/attention, helping my family, maybe having/raising kids, option value in an uncertain world, etc. Some of these mix in impartially altruistic considerations in important ways, but just to be clear: I care about both altruistic and non-altruistic values; I give weight to both in my decision-making in general; and I am giving both weight here.
I’ll also note a different source of uncertainty for me — namely, what policy/norm would be best to promote here overall. This is a separate question from what *I* should do personally, but insofar as part of the value of e.g. refusing the equity would be to promote some particular policy/norm, it matters to me how good the relevant policy/norm is — and in some cases here, I’m not sure. I’ve put a few more comments on this in a footnote.[2]
Currently, my best-guess plan for balancing these factors is to accept the equity and the corresponding COI for now (at least assuming that I stay at Anthropic long enough for the equity to vest[3]), but to keep thinking about it, learning more, and talking with colleagues and other friends/advisors as I actually dive into my role at Anthropic — and if I decide later that I should divest/give up the equity (or do something more complicated to mitigate this and other types of COI), to do that. This could be because my understanding of the costs/benefits at stake in the current situation changes, or because the situation itself (e.g., my role/influence, or the AI situation more generally) changes.
[1] Which isn’t to say that people will live up to this.
[2] There’s one question of whether it would be good (and suitably realistic) for *no* employees at Anthropic, or at any frontier AI company, to hold equity, and to be paid in cash instead (thus eliminating this source of COI in general). There’s another question of whether, at the least, safety-focused employees in particular should be paid in cash, as your post here seems to suggest, while making sure that their overall *level* of compensation remains comparable to that of non-safety-focused employees. Then, in the absence of either of these policies, there’s a different question of whether safety-focused employees should be paid substantially less than non-safety-focused employees — a policy that would reduce the attractiveness of these roles relative to e.g. capabilities roles, especially for people who are somewhat interested in safety but who also care a lot about traditional financial incentives (I think many strong AI researchers may be in this category, and increasingly so as safety issues become more prominent). And then there’s a final question of whether, in the absence of any changes to how AI companies currently operate, there should be informal pressure/expectation on safety-focused employees to voluntarily take very large pay cuts (equity is a large fraction of total comp) relative to non-safety-focused employees for the sake of avoiding COI (one could also distribute this pressure/expectation more evenly across all employees at AI companies — but the focus on safety evaluators in your post is narrower).
[3] And I’ll still have a COI in the meantime, due to the equity I’d get if I stayed long enough.