I would like Anthropic to prepare for a world where the core business model of scaling to higher AI capabilities is no longer viable because pausing is needed. This looks like having a comprehensive plan to Pause (actually stop pushing the capabilities frontier for an extended period of time, if this is needed). I would like many parts of this plan to be public. The plan would ideally cover many aspects: institutional/governance (who makes this decision, and on what basis, e.g., the RSP), operational (what actually happens during a pause), and business (how this works financially).
To speak to the business side: Currently, the AI industry is relying on large expected future profits to generate investment. This is not a business model which is amenable to pausing for a significant period of time. I would like there to be minimal friction to pausing. One way to solve this problem is to invest heavily (and have a plan to invest more if a pause is imminent or ongoing) in revenue streams which are orthogonal to catastrophic risk, or at least not strongly positively correlated. As an initial brainstorm, these streams might include:
Making really cheap weak models.
AI integration in low-stakes domains or narrow AI systems (ideally combined with other security measures such as unlearning).
Selling AI safety solutions to other AI companies.
A plan for the business side of things should also address “what do we do about all the expected equity that employees lose if we pause, and how do we align incentives despite this?” It should probably also include a commitment to ensure that all investors and business partners understand that a long-term pause may be necessary for safety and are okay with that risk (maybe this is sufficiently covered under the current corporate structure, I’m not sure, but corporate structures sure can change).
It’s all well and good to have an RSP that says “if X, we will pause”, but the actual situation is probably going to be very messy: ambiguous evidence, crazy race pressures, crazy business pressures from external investors, etc. Investing in other revenue streams could reduce some of this pressure, and (if shared) it could potentially enable a wider pause, e.g., if all AI companies see a viable path to profit by just serving early AGIs for cheap, nobody has intense business pressure to push on to superintelligence.
Second, I would like Anthropic to invest in its ability to make credible commitments about internal activities and model properties. There is more about this in Miles Brundage’s blog post and my paper, as well as FlexHEGs. This might include things like:
cryptographically secured audit trails (version control for models). I find it kinda crazy that AI companies sometimes use external pre-deployment testers and then change the model in completely unverifiable ways before releasing it to users. Wouldn’t it be so cool if OpenAI couldn’t do that, and instead when their system card comes out there are certificates verifying which model was evaluated and how the model was changed between evaluation and deployment? That would be awesome! (A rough sketch of what such a certificate could look like is below, after this list.)
whistleblower programs, declaring and allowing external auditing of what compute is used for (e.g., differentiating training vs. inference clusters in a clear and relatively unspoofable way),
using TEEs and certificates to attest that the model being evaluated is the same one being deployed to users, and more.
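To make the certificate idea slightly more concrete, here is a minimal sketch in Python (my own illustration, not an existing scheme; the `cryptography` package, file paths, and function names are all assumptions): the evaluator hashes the exact weights it tested and signs that hash together with an evaluation report ID, so anyone holding the evaluator’s public key can later check whether a deployed artifact matches what was actually evaluated. A real system would need much more (key management, hashing the whole serving stack, TEEs for runtime attestation), but the core primitive is just a signed content hash.

```python
# Minimal sketch of a signed "evaluation certificate" for a model artifact.
# Illustrative only; names and workflow are hypothetical. Requires the
# `cryptography` package.
import hashlib
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def hash_model(path: str) -> str:
    """Content-address the model artifact (e.g., a weights file) with SHA-256."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def issue_certificate(signing_key: Ed25519PrivateKey, model_path: str, eval_report_id: str) -> dict:
    """Evaluator signs (model hash, eval report ID) so the pairing can't be silently changed later."""
    payload = {"model_sha256": hash_model(model_path), "eval_report": eval_report_id}
    blob = json.dumps(payload, sort_keys=True).encode()
    return {"payload": payload, "signature": signing_key.sign(blob).hex()}


def verify_deployment(cert: dict, evaluator_key: Ed25519PublicKey, deployed_model_path: str) -> bool:
    """Check the certificate's signature and that the deployed weights match the evaluated ones."""
    blob = json.dumps(cert["payload"], sort_keys=True).encode()
    try:
        evaluator_key.verify(bytes.fromhex(cert["signature"]), blob)
    except InvalidSignature:
        return False
    return cert["payload"]["model_sha256"] == hash_model(deployed_model_path)


# Hypothetical usage:
#   key = Ed25519PrivateKey.generate()
#   cert = issue_certificate(key, "model-v1.safetensors", "external-eval-2025-01")
#   verify_deployment(cert, key.public_key(), "model-v1.safetensors")  # -> True
```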
I think investment in and adoption of these measures by a major AI company could be a significant counterfactual shift in the likelihood of national or international regulation that includes verification. Many of these are also good for being-a-nice-company reasons: e.g., I think it would be pretty cool if claims like Zero Data Retention were backed by actual technical guarantees rather than just trust (which seems to be the status quo).