I’m going to try to make sure that my lifestyle and financial commitments continue to make me very financially comfortable both with leaving Anthropic, and with Anthropic’s equity (and also: the AI industry more broadly – I already hold various public AI-correlated stocks) losing value, but I recognize some ongoing risk of distorting incentives, here.
Why do you feel comfortable taking equity? It seems to me that one of the most basic precautions one ought ideally take when accepting a job like this (e.g. evaluating Claude’s character/constitution/spec), is to ensure you won’t personally stand to lose huge sums of money should your evaluation suggest further training or deployment is unsafe.
(You mention already holding AI-correlated stocks—I do also think it would be ideal if folks with influence over risk assessment at AGI companies divested from these generally, though I realize this is difficult given how entangled they are with the market as a whole. But I’d expect AGI company staff typically have much more influence over their own company’s value than that of others, so the COI seems much more extreme).
Speaking for myself as someone who works at Anthropic and holds equity: I think I just bite the bullet that this doesn’t affect my decisionmaking that much and the benefits of directing the resources from that equity to good ends are worth it.
(I did think somewhat seriously about finding a way to irrevocably commit all of my equity to donations, or to fully avoid taking possession of it, but mainly for the signaling benefits of there being an employee who was legibly not biased in this particular way in case that was useful when things got crazy; I don’t think it would have done much on the object level.)
Some reasons I think this is basically not a concern for me personally:
Deciding to pledge half my equity to 501(c)3 charities felt like a pretty easy decision; I now think it’s possible this was a mistake because the value of political giving may outweigh the tax advantages and donation match, but I don’t really remember my personal wealth being a driving factor there. And effects on Anthropic-as-a-whole have a way higher ratio of altruistic value to personal wealth than that!
Of course having donation-pledged dollars riding on Anthropic’s success is still a source of bias, but my own equity changes that very little, because my donation preferences are extremely correlated with vastly larger pools of equity from other employees; I already had 99% as much of an altruistic incentive for Anthropic to succeed commercially, and I think most people reading this comment are in a similar boat.
Empirically when I advocate internally for things that would be commercially costly to Anthropic I don’t notice this weighing on my decisionmaking basically at all, like I’m not sure I’ve literally ever thought about it in that setting?
If I were well-modeled as an actor whose equity value steered their actions in significant ways, I think I would be putting much more effort into tax optimization than I do now.
The epistemic distortions from one’s social and professional environment seem vastly larger to me. This isn’t directly an argument that the equity thing isn’t useful on the margin, but it just seems like a weird area of intervention when there’s so much lower-hanging fruit. I think decisions like “live in Berkeley or SF” have easily an order of magnitude more impact on a person’s orientation to these questions.
Others might vary a lot in how they orient to such things, though; I don’t claim this is universal.
“Empirically when I advocate internally for things that would be commercially costly to Anthropic I don’t notice this weighing on my decisionmaking basically at all, like I’m not sure I’ve literally ever thought about it in that setting?”
With respect, one of the dangers of being a flawed human is the fact that you aren’t aware of every factor that influences your decision making.
I’m not sure that a lack of consciously thinking about financial loss/gain is good empirical evidence that it isn’t affecting your choices.
Yep, I agree that’s a risk, and one that should seem fairly plausible to external readers. (This is why I included other bullet points besides that one.) I’m not sure I can offer something compelling over text that other readers will find convincing, but I do think I’m in a pretty epistemically justified state here even if I don’t think you should think that based on what you know of me.
And TBC, I’m not saying I’m unbiased! I think I am biased in a ton of ways—my social environment, possession of a stable high-status job, not wanting to say something accidentally wrong or hurting people’s feelings, inner ring dynamics of being in the know about things, etc are all ways I think my epistemics face pressure here—but I feel quite sure that “the value of my equity goes down if Anthropic is less commercially successful” contributes a tiny tiny fraction to that state of affairs. You’re well within your rights to not believe me, though.
This is a bit of a random-ass take, but, I think I care more about Joe not taking equity than you not taking equity, because I think Joe is more likely to be a person where it ends up important that he legibly have as little COI as possible (this is maybe making up a bunch of stuff about Joe’s future role in the world, but, it’s where my Joe headcanon is at).
From a pure signaling perspective (the “legibly” part of “legibly have as little COI as possible”) there’s also a counter consideration: if someone says that there’s danger, and calls for prioritizing safety, that might be even more credible if that’s going against their financial motivations.
I don’t think this matters much for company-external comms. There, I think it’s better to just be as legibly free of COIs as possible, because listeners struggle to tell what’s actually in the company’s best interests. (I might once have thought differently, but empirically “they just say that superintelligence might cause extinction because that’s good for business” is a very common take.)
But for company-internal comms, I can imagine that someone would be more persuasive if they could say “look, I know this isn’t good for your equity, it’s not good for mine either. we’re in the same boat. but we gotta do what’s right”.
Agreed—I do think the case for doing this for signaling reasons is stronger for Joe and I think it’s plausible he should have avoided this for that reason. I just don’t think it’s clear that it would be particularly helpful on the object level for his epistemics, which is what I took the parent comment to be saying.
Have you donated any of your equity yet? If not, why not?
I’ve made a legally binding pledge to allocate half of it to 501(c)3 charities, the maximum that my employer’s donation match covers; I expect to donate the majority of the remainder but have had no opportunities to liquidate any of it yet.
Thanks, that’s good to hear. What form does the pledge take? Do you have a DAF that contains half your shares? When do you think the next liquidation opportunity might be? (I guess you weren’t eligible for the one in May[1]?)
I’m disappointed that no one (EA-ish or otherwise) seems to have done anything interesting with that liquidation opportunity.
I’ve spent a lot of time this year on tax-and-donation planning, and helping colleagues with their plans. Some very substantial, largely still confidential, things have indeed been done, and I think they will pay off very nicely starting probably-next-year and scaling up over time.
Good to hear. Look forward to seeing the results!
The details are complicated, vary a lot person-to-person, and I’m not sure which are OK to share publicly; the TLDR is that relatively early employees have a 3:1 match on up to 50% of their equity, and later employees a 1:1 match on up to 25%.
I believe that many people eligible for earlier liquidation opportunities used the proceeds from said liquidation to exercise additional stock options, because various tax considerations mean that doing so ends up being extremely leveraged for one’s donation potential in the future (at least if one expects the value of said options to increase over time); I expect that most people into doing interesting impact-maximizing things with their money took this route, which doesn’t produce much in the way of observable consequences right now.
Interesting. I really hope that some of them do something, soon. Time is fast running out. There’s no point being a rich philanthropist (or rich, or a philanthropist) if the world gets destroyed before you deploy your resources.
(I say this as someone who has already put a lot of their money where their mouth is.)
Feels like something has gone wrong well before that point when one cares more about money than the survival of the human race.
If a man’s judgement is really swayable by equity, one can’t help but wonder whether he is the right man for the job in the first place.
Sure, but humanity currently has so little ability to measure or mitigate AI risk that I doubt it will be obvious in any given case that the survival of the human race is at stake, or that any given action would help. And I think even honorable humans tend to be vulnerable to rationalization amidst such ambiguity, which (as I model it) is why society generally prefers that people in positions of substantial power not have extreme conflicts of interest.
In a previous discussion about this, an argument mentioned was “having all your friends and colleagues believe in a thing is probably more epistemically compromising than the equity.”
Which seems maybe true. But, I update in the other direction of “you shouldn’t take equity, and, also, you should have some explicit plan for dealing with the biases of ’the people I spend the most time with think this,
(This also applies to AI pessimists to be clear, but I think it’s reasonable to hold people extra accountable about it when they’re working at a company whose product has double-digit odds of destroying the world)
Yeah, certainly there are other possible forms of bias besides financial conflicts of interest; as you say, I think it’s worth trying to avoid those too.
Hey Adam — thanks for this. I wrote about this kind of COI in the post, but your comment was a good nudge to think more seriously about my take here.
Basically, I care here about protecting two sorts of values. On the one hand, I do think the sort of COI you’re talking about is real. That is, insofar as people at AI companies who have influence over trade-offs the company makes between safety and commercial success hold equity, deciding in favor of safety will cause them to lose money — and potentially, for high-stakes decisions like dropping out of the race, a lot of money. This is true of people in safety-focused roles, but it’s true of other kinds of employees as well — and of course, especially true of leadership, who have both an outsized amount of equity and an outsized amount of influence. This sort of COI can be a source of epistemic bias (e.g. in safety evaluations of the type you’re focused on), but it can also just be a more straightforward misalignment where e.g. what’s best by the lights of an equity-holder might not be best for the world. I really don’t want my decision-making as an Anthropic employee to end up increasing existential risk from AI because of factors like this. And indeed, given that Anthropic’s stated mission is (roughly) to do what’s best for the world re: AI, in some sense it’s in the job description of every employee to make sure this doesn’t happen.[1] And just refusing to hold equity would indeed go far on this front (though: you can also get similar biases without equity — e.g., maybe you don’t want to put your cash salary at risk by making waves, pissing people off, etc). And even setting aside the reality of a given level of bias/misalignment, there can be additional benefits to it being legible to the world that this kind of bias/misalignment isn’t present (though I am currently much more concerned about the reality of the bias/misalignment at stake).
On the other hand: the amount of money at stake is enough that I don’t turn it down casually. This is partly due to donation potential. Indeed, my current guess is that (depending ofc on values and other views) many EA-ish folks should be glad on net that various employees at Anthropic (including some in leadership, and some who work on safety) didn’t refuse to take any equity in the company, despite the COIs at stake — though it will indeed depend on how much they actually end up donating, and to where. But beyond donation potential, I’m also giving weight to factors like freedom, security, flexibility in future career choices, ability to self-fund my own projects, trading-money-for-time/energy/attention, helping my family, maybe having/raising kids, option value in an uncertain world, etc. Some of these mix in impartially altruistic considerations in important ways, but just to be clear: I care about both altruistic and non-altruistic values; I give weight to both in my decision-making in general; and I am giving both weight here.
I’ll also note a different source of uncertainty for me: namely, what policy/norm would be best to promote here overall. This is a separate question from what *I* should do personally, but insofar as part of the value of e.g. refusing the equity would be to promote some particular policy/norm, it matters to me how good the relevant policy/norm is, and in some cases here, I’m not sure. I’ve put a few more comments on this in a footnote.[2]
Currently, my best-guess plan for balancing these factors is to accept the equity and the corresponding COI for now (at least assuming that I stay at Anthropic long enough for the equity to vest[3]), but to keep thinking about it, learning more, and talking with colleagues and other friends/advisors as I actually dive into my role at Anthropic — and if I decide later that I should divest/give up the equity (or do something more complicated to mitigate this and other types of COI), to do that. This could be because my understanding of costs/benefits at stake in the current situation changes, or because the situation itself (e.g., my role/influence, or the AI situation more generally) changes.
[1] Which isn’t to say that people will live up to this.
[2] There’s one question whether it would be good (and suitably realistic) for *no* employees at Anthropic, or at any frontier AI company, to hold equity, and to be paid in cash instead (thus eliminating this source of COI in general). There’s another question whether, at the least, safety-focused employees in particular should be paid in cash, as your post here seems to suggest, while making sure that their overall *level* of compensation remains comparable to that of non-safety-focused employees. Then, in the absence of either of these policies, there’s a different question whether safety-focused employees should be paid substantially less than non-safety-focused employees, a policy which would then reduce the attractiveness of these roles relative to e.g. capabilities roles, especially for people who are somewhat interested in safety but who also care a lot about traditional financial incentives (I think many strong AI researchers may be in this category, and increasingly so as safety issues become more prominent). And then there’s a final question of whether, in the absence of any changes to how AI companies currently operate, there should be informal pressure/expectation on safety-focused employees to voluntarily take very large pay cuts (equity is a large fraction of total comp) relative to non-safety-focused employees for the sake of avoiding COI (one could also distribute this pressure/expectation more evenly across all employees at AI companies, but the focus on safety evaluators in your post is more narrow).
[3] And I’ll still have COI in the meantime due to the equity I’d get if I stayed long enough.