FYI, I was (and remain to this day) confused by AI R&D 4 being called an “ASL-4” threshold. AFAICT as an outsider, ASL-4 refers to a set of deployment and security standards that are now triggered by dangerous capability thresholds, and confusingly, AI R&D 4 corresponds to the ASL-3 standard.
AI R&D 5, on the other hand, corresponds to ASL-4, but only on the security side. Nothing is said about the deployment side, which matters quite a bit given that Anthropic includes internal deployment here and AI R&D 5 will be very tempting to deploy internally.
I’m also confused because the content of both AI R&D 4 and AI R&D 5 is seemingly identical to the content of the nearest upcoming threshold in the October 2024 policy (which I took to be the ASL-3 threshold). A rough sketch of what I think happened:
A rough sketch of my understanding of the current policy:
When I squint hard enough at this for a while, I think I can kind of see the logic: a model likely to trigger the CBRN threshold requiring ASL-3 seems quite close, whereas we might be further from the very high threshold that was the October AI R&D threshold (now AI R&D 4). So the October AI R&D threshold was just bumped to the next level (and to the one after that, since causing dramatic scaling of effective compute is even harder than being an entry-level remote worker… maybe), with some confidence that we were still somewhat far away from it, and thus it can be treated effectively as today’s upcoming + to-be-defined (what would have been called n+1) threshold.
I just get lost when we call it an ASL-4 threshold (it’s not, it’s an ASL-3 threshold). Mostly, though, it makes me sad that these thresholds are so high, because I want Anthropic to get some practice reps in implementing the RSP before it’s suddenly hit with an endless supply of fully automated remote workers (plausibly the next threshold, AI R&D 4, which requires nothing more than the deployment + security standards Anthropic has already put in place as of today).
I wish today’s AI R&D 4 threshold had been set at what, in the October policy, was called a “checkpoint” on the way to ASL-3: completing 2-8 hour SWE tasks. It looks like we’re about there, and it also looks like we’re about at CBRN-4, and ASL-3 seems like a reasonable set of precautions for both milestones. I do not think ASL-3 will be appropriate when we truly get endless parallelized drop-in Anthropic researchers, even if they have not yet been shown to dramatically increase the rate of effective scaling.