I’d suggest updating the language in the post to clarify things and not overstate :)
Regarding the 3rd draft—opinions varied among the people I work with, but we are generally happy. Loss of Control is included in the selected systemic risks, as well as CBRN. Appendix 1.2 also has useful things, though some valid concerns were raised there about compatibility with the AI Act language that still need tweaking (possibly merging parts of 1.2 into the selected systemic risks). As for interpretability—the code is meant to be outcome-based, and the main reason evals are mentioned is that they are in the Act. Prescribing interpretability isn’t something the code can do, and it probably shouldn’t, as these techniques aren’t good enough yet to be prescribed as mandatory for mitigating systemic risks.
I was reading “The Urgency of Interpretability” by Dario Amodei, and the following part made me think about our discussion:
“Second, governments can use light-touch rules to encourage the development of interpretability research and its application to addressing problems with frontier AI models. Given how nascent and undeveloped the practice of “AI MRI” is, it should be clear why it doesn’t make sense to regulate or mandate that companies conduct them, at least at this stage: it’s not even clear what a prospective law should ask companies to do. But a requirement for companies to transparently disclose their safety and security practices (their Responsible Scaling Policy, or RSP, and its execution), including how they’re using interpretability to test models before release, would allow companies to learn from each other while also making clear who is behaving more responsibly, fostering a “race to the top”.”
I agree with you that, at this stage, a regulatory requirement to disclose the interpretability techniques used to test models before release would not be very useful for outcome-based CoPs.
But I hope that there is a path forward for this approach in the near future.
Thanks a lot for your follow-up. I’d love to connect on LinkedIn if that’s okay; I’m very grateful for your feedback!
I’d say: “I believe that more feedback from alignment and interpretability researchers is needed” instead. Thoughts?
Sure! And yeah, regarding edits—I haven’t gone through the full request for feedback yet; I expect to have a better sense late next week of which contributions are most needed and how to prioritize. I mainly wanted to comment first on obvious things that stood out to me from the post.
There is also an Evals workshop in Brussels on Monday where we might learn more. I know of some non-EU-based technical safety researchers who are attending, which is great to see.