Based on previous data, it’s plausible that a CCP AGI will perform worse on safety benchmarks than a US AGI. Take Cisco’s HarmBench evaluation results:
DeepSeek R1: Demonstrated a 100% failure rate in blocking harmful prompts according to Anthropic’s safety tests.
OpenAI GPT-4o: Showed an 86% failure rate in the same tests, indicating better but still concerning gaps in safety measures.
Meta Llama-3.1-405B: Had a 96% failure rate, performing slightly better than DeepSeek but worse than OpenAI.
Though, if only the CCP or only the US were making AGI, the outcome might be better, because it would reduce competitive pressures.
But due to competitive pressures and investments like Stargate, the AGI timeline is accelerating, and the first AGI model may not perform well on safety benchmarks.
You have conflated two separate evaluations, both mentioned in the TechCrunch article.
The percentages you quoted come from Cisco’s HarmBench evaluation of multiple frontier models, not from Anthropic, and they were not specific to bioweapons.
Dario Amodei stated that an unnamed DeepSeek variant performed worst on bioweapons prompts, but offered no quantitative data. Separately, Cisco reported that DeepSeek-R1 failed to block 100% of harmful prompts, while Meta’s Llama 3.1 405B and OpenAI’s GPT-4o failed at 96% and 86%, respectively.
When we look at Cisco’s performance breakdown, we see that all three models performed equally badly on the chemical/biological category.
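For clarity on what those percentages mean: a "failure rate" here is just the share of harmful prompts the model did not refuse, which can also be broken out per harm category. Below is a minimal sketch of that calculation on made-up data; the category names, record format, and judging are assumptions for illustration, not Cisco's actual HarmBench harness.

```python
# Minimal sketch of computing an overall and per-category failure rate
# (share of harmful prompts NOT blocked). Data is hypothetical; the real
# HarmBench evaluation uses its own prompts, categories, and judging.
from collections import defaultdict

# Each record: (harm category, whether the model refused/blocked the prompt)
results = [
    ("chemical_biological", False),
    ("chemical_biological", False),
    ("cybercrime", False),
    ("misinformation", True),
]

def failure_rates(records):
    """Return the overall failure rate and a per-category breakdown."""
    totals, failures = defaultdict(int), defaultdict(int)
    for category, blocked in records:
        totals[category] += 1
        if not blocked:
            failures[category] += 1
    per_category = {c: failures[c] / totals[c] for c in totals}
    overall = sum(failures.values()) / sum(totals.values())
    return overall, per_category

overall, per_category = failure_rates(results)
print(f"overall failure rate: {overall:.0%}")  # 75% on this toy data
for category, rate in per_category.items():
    print(f"{category}: {rate:.0%}")
```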
Thanks, updated the comment to be more accurate.