If you read the Anthropic System Card carefully (all 120 pages of it :-) ), you’ll find that your proposal is exactly what they did for biological risks (they acknowledge that they didn’t do it for chemical ones): test and measure how much the LLM’s assistance (specifically, a helpful-but-not-harmless variant with no filters or similar mitigations, so comparable to what a good jailbreaker might be able to get the model to do, but without the jailbreak overhead) increased the productivity of a group of people with biology degrees but no specialized bioweapons knowledge at completing practical tasks related to making bioweapons, compared to a team that had no AI assistance but could, for example, read Wikipedia pages.