Suppose that (A) alignment risks do not become compelling-to-almost-all-lab-people and (B) with 10-30x AIs, solving alignment takes like 1-3 years of work with lots of resources.
Claim: Safety work during takeoff is crucial. (There will be so much high-quality AI labor happening!)
Corollary: Crucial factors are (1) efficiency-of-converting-time-and-compute-into-safety-work-during-takeoff (research directions, training and eliciting AIs to be good at safety work, etc.) and (2) time-and-compute-for-safety-work-during-takeoff.
Corollary: A crucial factor (for (2)) is whether US labs are racing against each other, or whether the US is internally coordinated, treats slowing others as a national security priority, and spends lead time largely on reducing misalignment risk. And so a crucial factor is US government buy-in for nonproliferation. (And labs can increase US willingness to pay for nonproliferation [e.g. by demonstrating the importance of a US lead and of mitigating alignment risk] and decrease the cost of enforcing nonproliferation [e.g. by boosting the US military].) With good domestic coordination, you get a surprisingly good story. With no domestic coordination, you get a bad story where the leading lab probably spends ~no lead time focused on alignment.
(There are other prioritization implications: security during takeoff is crucial (probably, depending on how exactly the nonproliferation works), getting useful safety work out of your models—and preventing disasters while running them—during takeoff is crucial, idk.)
I feel like this is important and underappreciated. I also feel like I’m probably somewhat confused about this. I might write a post on this but I’m shipping it as a shortform because (a) I might not and (b) this might elicit feedback.
with 10-30x AIs, solving alignment takes like 1-3 years of work … so a crucial factor is US government buy-in for nonproliferation
Those AIs might be able to lobby for nonproliferation or do things like write a better IABIED, making coordination interventions that oppose myopic racing. Directing AIs to pursue such projects could be a priority comparable to direct alignment work. Unclear how visibly asymmetric such interventions will prove to be, but then alignment vs. capabilities work might be in a similar situation.
Yep, this is what I meant by “labs can increase US willingness to pay for nonproliferation.” Edited to clarify.

My point is that the 10-30x AIs might be able to be more effective at coordination around AI risk than humans alone, in particular more effective than currently seems feasible in the relevant timeframe (when not taking into account the use of those 10-30x AIs). Saying “labs” doesn’t make this distinction explicit.
Those AIs might be able to lobby for nonproliferation or do things like write a better IABIED, making coordination interventions that oppose myopic racing.
Indeed, the book might be more effective by virtue of having an AI as its author!
security during takeoff is crucial (probably, depending on how exactly the nonproliferation works)
I think you’re already tracking this but to spell out a dynamic here a bit more: if the US maintains control over what runs on its datacenters and has substantially more compute on one project than any other actor, then it might still be OK for adversaries to have total visibility into your model weights and everything else you do. You just work on a mix of AI R&D and defensive security research with your compute (at a faster rate than they can work on RSI+offense with theirs) until you become protected against spying; then your greater compute budget means you can do takeoff faster, and they only reap the benefits of your models up to a relatively early point. Obviously this is super risky and contingent on offense/defense balance and takeoff speeds, and it’s a terrible position to be in, but I think there’s a good chance it’s kinda viable (see the toy sketch after this comment).
(Also there are some things you can do to differentially advantage yourself even during the regime in which adversaries can see everything you do and steal all your results. E.g. your AI does research into a bunch of optimization tricks that are specific to a chip model the US has almost all of, or studies techniques for making a model that you can’t finetune to pursue different goals without wrecking its capabilities and implements them on the next generation.)
You still care enormously about security over things like “the datacenters are not destroyed” and “the datacenters are running what you think they’re running” and “the human AI researchers are not secretly saboteurs” and so on, of course.
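A minimal toy sketch of the race dynamic in the comment above, in Python, with entirely made-up numbers: the compute budgets, the security-research threshold, and the rule that the adversary free-rides on the leader’s progress until the leader locks down security are all illustrative assumptions, not claims from the thread.

```python
# Toy model of "out-compute them until you can out-secure them".
# All parameters are made up for illustration; this is not a claim about real values.

LEADER_COMPUTE = 1.0        # leader's compute per month (arbitrary units)
ADVERSARY_COMPUTE = 0.3     # adversary has substantially less compute on one project
SECURITY_FRACTION = 0.4     # fraction of leader compute spent on defensive security pre-lockdown
SECURITY_WORK_NEEDED = 2.0  # security research required to become protected against spying
MONTHS = 36

leader = 0.0      # leader's cumulative AI R&D progress
adversary = 0.0   # adversary's cumulative AI R&D progress
security = 0.0    # leader's cumulative defensive security work
secured_at = None

for month in range(1, MONTHS + 1):
    if secured_at is None:
        # Pre-lockdown: leader splits compute between R&D and security;
        # the adversary spies, so its progress is at least as good as the leader's.
        leader += LEADER_COMPUTE * (1 - SECURITY_FRACTION)
        security += LEADER_COMPUTE * SECURITY_FRACTION
        adversary = max(adversary + ADVERSARY_COMPUTE, leader)
        if security >= SECURITY_WORK_NEEDED:
            secured_at = month
    else:
        # Post-lockdown: adversary keeps only what it stole before lockdown
        # plus whatever its own (smaller) compute budget buys.
        leader += LEADER_COMPUTE
        adversary += ADVERSARY_COMPUTE

print(f"protected against spying at month {secured_at}")
print(f"after {MONTHS} months: leader progress = {leader:.1f}, adversary progress = {adversary:.1f}")
```

With these particular numbers the leader becomes protected against spying after a few months and ends up far ahead, but modest changes to the adversary’s compute share or the security threshold change the outcome a lot, which is one way to read the “contingent on offense/defense balance and takeoff speeds” caveat.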