I passed up an invitation to invest in Anthropic in the initial round, which valued it at $1B (it’s now planning a round at a $170B valuation), to avoid contributing to x-risk. (I didn’t want to signal that starting another AI lab was a good idea from an x-safety perspective, or that I thought Anthropic’s key people were likely to be careful enough about AI safety. Anthropic had invited a number of rationalist/EA people to invest, apparently to gain such implicit endorsements.)
This idea/plan seems to legitimize giving founders and early investors of AGI companies extra influence on or ownership of the universe (or just extremely high financial returns, if they were to voluntarily sell some shares to the public as envisioned here), which is hard for me to stomach from a fairness or incentives perspective, given that I think such people made negative contributions to our civilizational trajectory by increasing x-risk.
I suspect that others will have other reasons (from other political or ethical perspectives) to object to granting or legitimizing a huge windfall to this small group of people, and it seems amiss that the post/paper is silent on the topic.
A few more related thoughts:
My dilemma is also a real-world counter-example/analogy for Strategy Stealing: if a misaligned AI does something that’s unethical from my perspective to benefit itself, how am I supposed to copy its strategy?
An alternative hypothesis I considered was that Anthropic was looking for oversight from or accountability to x-safety-conscious people, but IIRC the investment was structured through an SPV and people like me would have no voting rights, which would instead be held by the SPV’s manager (who was not known as someone very concerned about AI x-safety). Their explanation was that this is common in tech startups, which I believe is technically correct, but obviously did nothing to make me less worried about their safety/governance views given that alternatives like pass-through voting are also available and sometimes used.
I always thought it was totally crazy for people to lump Nick Bostrom and Marc Andreessen together into TESCREAL and criticize them in the same breath, but this post plays right into such criticism. (This is one of the “other political or ethical perspectives” I alluded to.) Maybe it is still wrong or unfair, but given the apparent alignment between the OP’s position and Andreessen’s interests, I would have upgraded such criticism from “totally crazy” to “worth addressing”. (I’m also forced to mentally assign some credit to such critics for apparently recognizing or predicting such alignment, that I’m personally surprised by, and which now undeniably exists at least at a surface level.)
I always thought it was totally crazy for people to lump Nick Bostrom and Marc Andreessen together into TESCREAL and criticize them in the same breath, but this post plays right into such criticism.
I’m also bald...
it seems fine to invest and then publicly state your views, including that it should not be interpreted as an endorsement. your investment (and that of other people who decide similarly) is trivial in size compared to the other sources of funding, such that it’s not counterfactual. you’re not going to cause the founders of anthropic to get any less of a windfall. the decision process for the vast majority of possible investors does not take into account whether or not you invested.
i think you’ve already sufficiently signaled your genuineness, for all practical purposes. i don’t think it’s healthy to have a purity spiral.
There are like 4 reasons why I think this logic doesn’t check out:
Now there are a lot of investors interested, but early investors are much more counterfactual and make a substantial difference
Most of Anthropic’s early talent worked there because it seemed to be endorsed by safety people, and so that endorsement is the basis of a very large fraction of Anthropic’s valuation, and marginally more investment from safety people would have caused more of this
I don’t think you could have just publicly stated that Anthropic was bad for the world and then invest anyways. My model of how these situations work is that saying bad things about an organization as an investor does just cause you to be excluded at the very least in future funding rounds, and you are generally asked implicitly to not say anything bad about the organization.
Being an investor in a leading lab like this is a huge moral hazard to yourself. Saying bad things about the organization or lobbying for regulation that would hurt Anthropic’s valuation now comes at huge financial damage to yourself, and you are also exposing yourself to a social context where people will target you specifically with large amounts of pressure and attempts at manipulating you into being on Anthropic’s side.
I don’t think it’s impossible to work these out, and think there is at least one case of an investor in Anthropic and other capability companies where I think it is plausible they made the right choice in doing so, but the vast majority of people didn’t do anything to counteract the issues above and did indeed just end up causing harm this way.
Do you have any older comment indicating proof of this? (That the actual reason you turned it down was x-risk and not, let’s say, because you thought the investment was not rewarding enough.) Seems very important to me if true, and will cause me to take your claims more seriously in general in the future.
I think this 2023 comment is the earliest instance of me talking about turning down investing in Anthropic due to x-risk. If you’re wondering why I didn’t talk about it even earlier, it’s because I formed my impression of Dario Amodei’s safety views from a private Google Doc of his (The Big Blob of Compute, which he has subsequently talked about in various public interviews), and it seemed like bad etiquette to then discuss those views in public. By 2023 I felt like it was ok to talk about since the document had become a historical curiosity and there was plenty of public info available about Anthropic’s safety views from other sources. But IIRC, “The Big Blob of Compute” was one of the main triggers for me writing Why is so much discussion happening in private Google Docs? in 2019.
I have done a lot of thinking about punishment for systemically harmful actors. In general, I have landed on the principle that justice is about prevention of future harm more than exacting vengeance and some kind of “eye for an eye” justice. As satisfying as it seems, most of history is fairly bleak on the prospects of using executions and other forms of violent punishment to deter future people from endangering society. This is quite difficult to stomach, however, in the face of people who are seemingly recklessly leading us in a dance on the edge of a volcano. I also don’t really buy the whole “give the universe to Sam Altman/POTUS and then hope he leaves everyone else some scraps” model of universal governance.
I think, in light of this, that the open investment model could work, on two conditions:
A) Regulatory intervention happens to ensure that most of the investment is reinvested in the company’s safety R&D efforts rather than used to enrich its owners, e.g. with stock buybacks. There is precedent for this: Amazon famously reinvested lots of money into improving its infrastructure, to the point of making a loss for many years.
B) The ownership shares of existing shareholders are massively diluted or redistributed to prevent concentration of voting rights in a few early stakeholders.
If these companies are as critical to humanity’s future as we say they are, we should start acting like it.
This idea/plan seems to legitimize giving founders and early investors of AGI companies extra influence on or ownership of the universe (or just extremely high financial returns, if they were to voluntarily sell some shares to the public as envisioned here), which is hard for me to stomach from a fairness or incentives perspective, given that I think such people made negative contributions to our civilizational trajectory by increasing x-risk.
One question is whether a different standard should be applied in this case than elsewhere in our capitalist economy (where, generally, the link between financial rewards and positive or negative contributions to x-risk reduction is quite tenuous). One could argue that this is the cooperative system we have in place, and that there should be a presumption against retroactively confiscating the gains of people who invested their time or money on the basis of the existing rules. (Adjusting levels of moral praise in light of differing estimations of the nature of somebody’s actions or intentions may be a more appropriate place for this type of consideration to feed in. Though it’s perhaps also worth noting that the prevailing cultural norms at the time, and still today, seem to favor contributing to the development of more advanced AI technologies.)
Furthermore, it would be consistent with the OGI model for governments (particularly the host government) to take some actions to equalize or otherwise adjust outcomes. For example, many countries, including the U.S., have a progressive taxation system, and one could imagine adding some higher tax brackets beyond those that currently exist—such as an extra +10% marginal tax rate for incomes or capital gains exceeding 1 trillion dollars, or exceeding 1% of GDP, or whatever. (In the extreme, if taxation rates began approaching 100%, this would become confiscatory and would be incompatible with the OGI model; but there is plenty of room below that for society to choose some level of redistribution.)
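As a toy illustration of how such an extra bracket would compose with an ordinary progressive schedule, here is a minimal sketch. The base thresholds and rates are invented placeholders rather than actual tax law; only the top entry reflects the hypothetical +10% surcharge on gains above $1 trillion mentioned above.

```python
# Toy marginal-tax calculation with one extra bracket above $1 trillion.
# The base brackets are invented for illustration only; the top entry adds
# the hypothetical +10% surcharge on top of an assumed 37% top rate.
BRACKETS = [                     # (lower threshold, marginal rate above it)
    (0,                 0.10),
    (1_000_000,         0.37),
    (1_000_000_000_000, 0.47),   # hypothetical extra bracket: 37% + 10%
]

def tax_owed(income: float) -> float:
    """Apply each bracket's rate to the slice of income that falls within it."""
    total = 0.0
    for i, (lo, rate) in enumerate(BRACKETS):
        hi = BRACKETS[i + 1][0] if i + 1 < len(BRACKETS) else float("inf")
        if income > lo:
            total += (min(income, hi) - lo) * rate
    return total

print(f"${tax_owed(2_000_000_000_000):,.0f}")  # tax on $2T of capital gains
```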
I’m unsure whether a different standard is needed. Foom Liability, and other such proposals, may be enough.
For those who haven’t read the post, a bit of context: AGI companies may create huge negative externalities, and we fine or sue people for creating such externalities in other cases, so we can set up some sort of liability here too. In this case, we might expect truly huge liability in plausible worlds where we get near misses from doom, which may be more than AGI companies can afford. When entities plausibly need to pay out more than they can afford, as in health care, we may require them to carry insurance.
What liability ahead of time would result in good incentives to avoid foom doom? Hanson suggests:
Thus I suggest that we consider imposing extra liability for certain AI-mediated harms, make that liability strict, and add punitive damages according to the formulas D= (M+H)*F^N. Here D is the damages owed, H is the harm suffered by victims, M>0,F>1 are free parameters of this policy, and N is how many of the following eight conditions contributed to causing harm in this case: self-improving, agentic, wide scope of tasks, intentional deception, negligent owner monitoring, values changing greatly, fighting its owners for self-control, and stealing non-owner property.
If we could agree that some sort of cautious policy like this seems prudent, then we could just argue over the particular values of M,F.
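To make the scaling concrete, here is a minimal sketch of the formula quoted above. The specific values of M, F, and the harm H below are made-up placeholders for illustration, not numbers anyone has proposed.

```python
# Sketch of the punitive-damages formula quoted above: D = (M + H) * F**N.
# H is the harm suffered by victims, M > 0 and F > 1 are policy parameters,
# and N counts how many of the eight aggravating conditions contributed.
# All numeric values below are made up purely for illustration.

def foom_liability_damages(harm: float, n_conditions: int,
                           m: float = 1e6, f: float = 2.0) -> float:
    """Damages owed under D = (M + H) * F**N."""
    if not (m > 0 and f > 1 and 0 <= n_conditions <= 8):
        raise ValueError("parameters outside the ranges the formula assumes")
    return (m + harm) * f ** n_conditions

# Example: $10M of harm, with 0 to 8 aggravating conditions present.
for n in range(9):
    print(n, f"${foom_liability_damages(10_000_000, n):,.0f}")
```

The exponential dependence on N is what makes each additional aggravating condition costly to the owner, which is where the incentive effect is supposed to come from.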
The proposal could work even if countries were to only buy stocks of publicly traded companies in highly efficient secondary markets (and exclude IPOs and secondary public offerings), so that we do not affect the stock price or how much capital the company has at hand, and thus do not speed up AI progress.
Microsoft, Google, Amazon, and Nvidia have quite a bit of exposure to Anthropic, DeepMind, OpenAI, and xAI.
Appreciate your integrity in doing that!
At the same time, the unfairness of early frontier lab founders getting rich seems to me like a very acceptable downside, given that open investment could solve a lot of issues and given the bleakness of many other paths forward.
Couldn’t we just… set up a financial agreement where the first N employees don’t own stock and have a set salary?
My main concern is that they’ll have enough power to be functionally wealthy all-the-same, or be able to get it via other means (e.g. Altman with his side hardware investment / company).
Couldn’t we just… set up a financial agreement where the first N employees don’t own stock and have a set salary?
Maybe, could be nice… But since the first N employees usually get to sign off on major decisions, why would they go along with such an agreement? Or are you suggesting governments should convene to force this sort of arrangement on them?
My main concern is that they’ll have enough power to be functionally wealthy all-the-same, or be able to get it via other means (e.g. Altman with his side hardware investment / company).
I’m not sure I understand this part actually, could you elaborate? Is this your concern with the OGI model or with your salary-only for first-N employees idea?
But since the first N employees usually get to sign off on major decisions, why would they go along with such an agreement?
I’m imagining a world where a group of people step forward to take a lot of responsibility for navigating humanity through this treacherous transition, and do not want themselves to be corrupted by financial incentives (and wish to accurately signal this to the external world). I’ll point out that this is not unheard of, Altman literally took no equity in OpenAI (though IMO was eventually corrupted by the power nonetheless).
To help with the incentives and coordination, instead of having the first frontier AI megaowner step forward and unconditionally relinquish some of their power, they could sign on to a conditional contract to do so. It would only activate if other megaowners did the same.
Ok yes, that would be great.
I’ll point out that this is not unheard of, Altman literally took no equity in OpenAI (though IMO was eventually corrupted by the power nonetheless).
He may have been corrupted by power later. Alternatively, he may have been playing the long game, knowing that he would have that power eventually even if he took no equity.
I’m not sure I understand this part actually, could you elaborate? Is this your concern with the OGI model or with your salary-only for first-N employees idea?
This is a concern I am raising with my own idea.
I think that if you have a knack for ordinary software development, one application of that is to work at a tech company whose product already has or eventually obtains widespread adoption. This provides you with a platform where there is a straightforward path towards helping improve the lives of hundreds of millions of people worldwide by a small amount. Claude has roughly 20-50 million monthly active users, and for most users it appears to be beneficial overall, so I believe that this criterion is met by Anthropic.
If you capture a small fraction of the value that you generate as a competent member of a reasonably effective team, then that often leads to substantial financial returns, and I think this is fair, since the skillset and focus required to successfully plan and execute on such projects is quite rare. The bar for technical hires at a frontier lab is very high, and clearing it commands correspondingly competitive compensation in a market economy. You almost certainly had to clear a relatively higher bar (though one with less legible criteria) to be invited as an early investor. Capital appreciation is the standard reward for backing the production of a reliable and valuable service that others depend upon.
If you buy into the opportunity in AGI deployment, even the lower bounds of mundane utility can be one of the most leveraged ways to do good in the world. Given the dangers of ASI development, improvements to the safety and alignment of AGI systems can prevent profound harm, and the importance of this cannot be overstated. Even in the counterfactual scenario where Anthropic was never founded, the urgency of such work would still be critical. There is some established precedent for handling a profitable industry with negative externalities (tobacco, petroleum, advertising), and it would be consistent to include the semiconductor industry in this category. I agree that existing frameworks are insufficient for making reasonable decisions about catastrophic risks. These worries have shaped my career working in AI safety, and a majority of the people here share your concerns.
However, I’m uncertain whether vilifying any small group of people would be the right move to achieve the strategic goals of the AI safety community. For example, Igor Babuschkin’s recent transition from xAI to Babuschkin Ventures could have been complicated by an attitude of hostility towards the founders and early investors of AGI companies. Since nuanced communication doesn’t work at scale, adopting this as our public position might inadvertently increase the likelihood of pivotal acts being committed by rogue threat actors, with inevitable media backlash identifying rationalist/EA people as culpable for “publishing radicalizing material”. But taken seriously, that would be a fully general argument against distributing any online material warning of existential risks from advanced AI, and being dumb enough to make that sort of error tends to exclude you from positions where your failures can cause real damage. So I think my real contention with such objections is not about strategy, but about principle.
I’d be much more comfortable with accountability falling on the faceless corporate entity rather than on individual members of the organization, because even senior employees with a lot of influence on paper might have limited agency in carrying out the demands of their role. I think it would be best to follow the convention set by criticism such as Anthropic is Quietly Backpedalling on its Safety Commitments and ryan_greenblatt’s Shortform, which don’t single out executives or researchers as responsible for the behavior of the system as a whole.
I have made exceptions to this rule in the past, but it’s almost always degraded the quality of the discussion. When asked at an AI Safety Social for my opinion on the essay Dario Amodei — The Urgency of Interpretability, I said that I thought it was hypocritical, since a layoff at Anthropic UK had affected the three staff comprising their entire London interpretability team, which seemed to contradict the essay’s top-level takeaway that labs should invest in interpretability efforts: if that were what was happening, you’d ideally be growing headcount on those teams rather than letting people go. But it’s entirely possible Dario had no knowledge of this when writing the article, or that the hiring budget was reallocated to the U.S. branch of the interp team, or even that offering relocation to other positions at the company wasn’t practical for boring-and-complex HR/accounting reasons. It doesn’t seem like the pace of great interpretability research coming out of Anthropic has slowed down, so they’re clearly still invested in it as a company. My hypothesis is that the extremely high financial returns are more a side effect of operating at that caliber of performance than a primary motivator for talent. If they didn’t get rich from Anthropic, they’d get rich at a hedge fund or startup. The stacks of cash are not the issue here. The ambiguous future of the lightcone is.
It’s possible that investors might be more driven by money, but I have less experience talking to them or watching how they work behind the scenes so I can’t claim to know much about what makes them tick.