Whether training compute is a sufficient proxy for generality or risk, and what better metrics might exist (a rough compute-threshold sketch follows this list).
How to define and detect emergent capabilities that warrant reclassification or new evaluations.
What kinds of model evaluations or interpretability audits should qualify as systemic risk mitigation.
How downstream fine-tuning workflows (RLHF, scaffolding, etc.) may create latent alignment risk even at moderate compute scales.
How the Code of Practice could embed meaningful safety standards beyond documentation, e.g. through commitments to mechanistic transparency research, continuous monitoring, and post-deployment control mechanisms.
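To make the compute-proxy question concrete, here is a minimal sketch, assuming the common ~6 × parameters × tokens FLOP heuristic and the AI Act's 10^25 FLOP presumption threshold for systemic risk; the function names and the example model are illustrative, not taken from the consultation documents. A threshold like this is easy to compute, but it says nothing about capabilities elicited later through fine-tuning, scaffolding, or tool use.

```python
# Illustrative sketch: estimating training compute against the AI Act's
# 10^25 FLOP presumption threshold for systemic-risk classification.
# The ~6 * parameters * tokens rule of thumb is a rough dense-transformer
# estimate, not an official measurement method.

AI_ACT_SYSTEMIC_RISK_FLOP = 1e25  # presumption threshold under the AI Act


def estimated_training_flop(n_parameters: float, n_training_tokens: float) -> float:
    """Rough estimate: ~6 FLOP per parameter per training token."""
    return 6.0 * n_parameters * n_training_tokens


def presumed_systemic_risk(n_parameters: float, n_training_tokens: float) -> bool:
    """True if the compute estimate alone would trigger the presumption."""
    return estimated_training_flop(n_parameters, n_training_tokens) >= AI_ACT_SYSTEMIC_RISK_FLOP


# Hypothetical 70B-parameter model trained on 15T tokens: ~6.3e24 FLOP,
# below the threshold, regardless of how it is later fine-tuned or scaffolded.
print(presumed_systemic_risk(70e9, 15e12))  # False
```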
This is where I’d personally hope that anyone giving feedback focuses their attention, even if just to bring the right strategies to the attention of policy spheres.
Please be thorough. I’ve provided a breakdown of the main points in the Targeted consultation document, but I’d recommend looking at the Safety and Security section of the Third draft of the General-Purpose AI Code of Practice.
It can be downloaded here: Third Draft of the General-Purpose AI Code of Practice published, written by independent experts | Shaping Europe’s digital future