Musings from a Lawyer turned AI Safety researcher (ShortForm)
I will use this Shortform to link my posts and Quick Takes.
LongForm post for debate: The EU Is Asking for Feedback on Frontier AI Regulation (Open to Global Experts)—This Post Breaks Down What’s at Stake for AI Safety
The European AI Office is currently finalizing Codes of Practice that will determine how general-purpose AI (GPAI) models are governed under the EU AI Act. They are explicitly asking for global expert input, and feedback is open to anyone, not just EU citizens.
The guidelines in development will shape:
The definition of “systemic risk”
How training compute triggers obligations
When fine-tuners or downstream actors become legally responsible
What counts as sufficient transparency, evaluation, and risk mitigation
Major labs (OpenAI, Anthropic, Google DeepMind) have already expressed willingness to sign the upcoming Codes of Practice. These codes will likely become the default enforcement standard across the EU and possibly beyond.
So far, AI safety perspectives are seriously underrepresented.
Without strong input from AI Safety researchers and technical AI Governance experts, these rules could lock in shallow compliance norms (mostly centered on copyright or reputational risk) while missing core challenges around interpretability, loss of control, and emergent capabilities.
I’ve written a detailed Longform post breaking down exactly what’s being proposed, where input is most needed, and how you can engage.
Even if you don’t have policy experience, your technical insight could shape how safety is operationalized at scale.
📅 Feedback is open until 22 May 2025, 12:00 CET
🗳️ Submit your response here
Happy to connect with anyone individually for help drafting meaningful feedback.
Quick Take: “Alignment with human intent” explicitly mentioned in European law
The AI alignment community had a major victory in the regulatory landscape, and it went unnoticed by many.
The EU AI Act explicitly mentions “alignment with human intent” as a key focus area in relation to regulation of systemic risks.
As far as I know, this is the first time “alignment” has been mentioned in a law or other major regulatory text.
It’s buried in Recital 110, but it’s there.
And it also makes research on AI Control relevant:
“International approaches have so far identified the need to pay attention to risks from potential intentional misuse or unintended issues of control relating to alignment with human intent”.
The EU AI Act also mentions alignment as part of the Technical documentation that AI developers must make publicly available.
This means that alignment is now part of the EU’s regulatory vocabulary.
LongForm post for debate: For Policy’s Sake: Why We Must Distinguish AI Safety from AI Security in Regulatory Governance
TL;DR
I understand that Safety and Security are two sides of the same coin.
But if we don’t clearly articulate the intent behind AI safety evaluations, we risk misallocating stakeholder responsibilities when defining best practices or regulatory standards.
For instance, a provider might point to adversarial robustness testing as evidence of “safety” compliance, when in fact the measure only hardens the model against external threats (security) without addressing the internal model behaviors that could still cause harm to users.
If regulators conflate these, high-capability labs might “meet the letter of the law” while bypassing the spirit of safety altogether.
Opinion Post: Scaling AI Regulation: Realistically, What Can (and Can’t) Be Regulated?
Should we even expect regulation to be useful for AI safety?
Is there a version of AI regulation that wouldn’t be performative?
How do you see the “Brussels effect” playing out for AI Safety?
Are regulatory sandboxes a step in the right direction?
The UN General Assembly just passed a resolution to set up an Independent Scientific Panel on Artificial Intelligence and an annual Global Dialogue on AI Governance.
40 experts will be chosen by an independent appointment committee, half nominated by UN states and half appointed by the Secretary-General.
As of 27 Aug 2025, the UN says it will run an open call for nominations and then the Secretary-General will recommend 40 names to the General Assembly. No names have been announced yet. [1]
Two caveats jump out:
The mandate is explicitly limited to non-military applications, so the domain where AI systems are most likely to be deployed recklessly is left out...
The outputs are to be “policy-relevant but non-prescriptive”, i.e. climate-style summaries rather than hard mandates.
Still, the commitment to “issue evidence-based scientific assessments synthesizing and analysing existing research” leaves a narrow window of hope.
If serious AI-safety experts are appointed, this panel can be a real venue for risk awareness and cross-border coordination on x-risk mitigation.
Conversely, without clear guardrails on composition and scope, it risks becoming a “safety”-branded accelerator for capabilities.
I’d expect Yoshua Bengio to be a top suggestion already, as (among other reasons) he recently led the Safety & Security chapter of the EU General Purpose AI Code of Practice.
Item 3 has some constraints on members (emphasis mine):
This means only two each from the US, UK, China, etc. I wonder what the geographic and gender balance will actually look like; these will significantly shape the average type of expertise and the influence of members.
My guess is that x-risk mitigation will not be the primary focus at first, simply because over half of the experts are American and British men and there are so many other interests to represent. Nor would industry be heavily represented, because it skews too American (and the document mentioned conflicts of interest, and the 7 goals of the Dialogue are mostly not about frontier capabilities). But in the long term, unless takeoff is fast, developing countries will realize the US is marching towards a decisive strategic advantage (DSA), and interesting things could happen.
Edit: my guess for the composition would be similar to the existing High-level Advisory Body on Artificial Intelligence, with 39 members, of which
19 men, 20 women
15 academia/research (including 10 professors), 10 government, 4 from big tech and scaling labs, 10 other
17 Responsibility / Safety / Policy oversight positions, 22 other positions
Nationalities:
9 Americas (incl. 4 US), 11 Europe, 11 Asia (incl. 2 China), 1 Oceania, 5 Africa. They heavily overweighted Europe, which has 28% of the seats but only 9% of world population
19 high income countries, 24 LMIC
GPT-5 isn’t sure about some of the dual-nationality cases, though.
1 big name in x-risk (Jaan Tallinn)
On the flip side, this means that if we do know AI experts who are not from the US/EU/China, it might be valuable to forward this information to them so that they can apply, since they would have a higher chance of being accepted.
Agree, especially for candidates from developing countries without a strong preexisting stance on AI, where choices could be less biased towards experts who already have lots of prestige and more weighted on merit + lobbying.
Huh that is a really good point. There are way too many people with US/UK backgrounds to easily differentiate between the expert pretenders and the really substantial experts. It’s even getting harder to do so on LW for many topics as karma becomes less and less meaningful.
And I can’t imagine the secretary general’s office will have that much time to scrutinize each proposed candidate, so it might even be a positive thing overall.
Exactly! Thank you for highlighting this.
Yes, these are the usual selection-criteria constraints for policy panels. And I agree that the vast majority of big names are US (some UK) based and male. But hey, there are lesser-known voices in EU policy that care about AI Safety. I do share your concern, though. I’ll have the opportunity to ask about this soon at CAIDP (Centre for AI and Digital Policy). I think many people would agree that it’s a good opportunity to talk about AIS awareness in less involved member states…
The European AI Office is finalizing Codes of Practice that will define how general-purpose AI (GPAI) models are governed under the EU AI Act.
They are explicitly asking for global expert input, and feedback is open to anyone, not just EU citizens.
The guidelines in development will shape:
The definition of “systemic risk”
How training compute triggers obligations
When fine-tuners or downstream actors become legally responsible
What counts as sufficient transparency, evaluation, and risk mitigation
I believe that more feedback from alignment and interpretability researchers is needed.
Without strong input from AI Safety researchers and technical AI Governance experts, these rules could lock in shallow compliance norms (mostly centered on copyright or reputational risk) while missing core challenges around interpretability, loss of control, and emergent capabilities.
I’ve written a detailed Longform post breaking down exactly what’s being proposed, where input is most needed, and how you can engage.
Even if you don’t have policy experience, your technical insight could shape how safety is operationalized at scale.
📅 Feedback is open until 22 May 2025, 12:00 CET
🗳️ Submit your response here
Happy to connect with anyone individually for help drafting meaningful feedback.
Thanks for posting and bringing attention to this! I have forwarded to my friend who works in AI safety.
Thank you!!
The European Commission Seeks Experts for Its AI Scientific Panel
The European Commission is now accepting applications for the Scientific Panel of Independent Experts, focusing on general-purpose AI (GPAI). This panel will support the enforcement of the AI Act, and forms part of the institutional scaffolding designed to ensure that GPAI oversight is anchored in technical and scientific legitimacy.
The panel will advise the EU AI Office and national authorities on:
Systemic risks associated with GPAI models
Classification of GPAI models (including systemic-risk designation)
Evaluation methods (benchmarks, red-teaming, risk analysis)
Cross-border market surveillance
Enforcement tools and alerts for emerging risks
This is the institutional embodiment of what many in this community have been asking for: real technical expertise informing regulatory decision-making.
Who Should Apply?
The Commission is selecting 60 experts for a renewable 24-month term.
Members are appointed in a personal capacity and must be fully independent of any GPAI provider (i.e. no employment, consulting, or financial interest). Names and declarations of interest will be made public.
Relevant expertise must include at least one of the following:
Model evaluation and risk analysis: Red-teaming, capability forecasting, adversarial testing, human uplift studies, deployment evaluations
Risk methodologies: Risk taxonomies, modeling, safety cases
Mitigations and governance: Fine-tuning protocols, watermarking, guardrails, safety processes, incident response, internal audits
Misuse and systemic deployment risks: CBRN risk, manipulation of populations, discrimination, safety/security threats, societal harms
Cyber offense risks: Autonomous cyber operations, zero-day exploitation, social engineering, threat scaling
Provider-side security: Prevention of leakage, theft, circumvention of safety controls
Emergent model risks: Deception, self-replication, miscoordination, alignment drift, long-horizon planning
Compute and infrastructure: Compute auditing, resource thresholds, verification protocols
Eligibility:
PhD in a relevant field OR equivalent track record
Proven scientific impact in AI/GPAI research or systemic-risk analysis
Must demonstrate full independence from GPAI providers
Citizenship: At least 80% of panel members must be from EU/EEA countries. The other 20% can be from anywhere, so international researchers are eligible.
Deadline: 14 September 2025
Application link: EU Survey – GPAI Expert Panel
Contact: EU-AI-SCIENTIFIC-PANEL@ec.europa.eu
Why It’s Worth Considering
Even if you’re selected for the Scientific Panel, you are still allowed to carry out your own independent research as long as it is not for GPAI providers and you comply with confidentiality obligations.
This isn’t a full-time, 40-hour-per-week commitment. Members are assigned specific tasks on a periodic basis, with deadlines. If you’d like to know more before applying, I can direct you to the Commission’s contacts for clarification.
Serving on the panel enhances your visibility, sharpens your policy-relevant credentials, and helps fund your own work without forcing you to “sell your soul” or abandon independent projects.
Note that the documentation says they’ll aim to recruit 1-3 nationals from each EU country (plus Norway, Iceland and Liechtenstein). As far as I understood, it does not require them to be living in their home country at the time of applying. Therefore, people from small European countries have especially good odds.
Note also that the gender balance goal would increase the chances for any women applying.
Thanks for sharing! I’ve been sharing it with people working in AI Safety across Europe.
Do you know if anyone is doing a coordinated project to reach out to promising candidates from all EU countries and encourage them to apply? I’m worried that great candidates may not hear about it in time.
Hi, Lucie! Not to my knowledge. I have only seen this advertised by people like Risto Uuk or Jonas Schuett in their newsletters, and informally mentioned in events by people who currently work in the AI Office. But I am not aware of efforts to reach out to specific candidates.
The EU AI Act explicitly mentions “alignment with human intent” as a key focus area in relation to regulation of systemic risks.
As far as I know, this is the first time “alignment” has been mentioned in a law or other major regulatory text.
It’s buried in Recital 110, but it’s there. And it also makes research on AI Control relevant:
“International approaches have so far identified the need to pay attention to risks from potential intentional misuse or unintended issues of control relating to alignment with human intent”.
The EU AI Act also mentions alignment as part of the Technical documentation that AI developers must make publicly available.
This means that alignment is now part of the EU’s regulatory vocabulary.
But here’s the issue: most AI governance professionals and policymakers still don’t know what it really means, or how your research connects to it.
I’m trying to build a space where AI Safety and AI Governance communities can actually talk to each other.
If you’re curious, I wrote an article about this, aimed at corporate decision-makers who lack literacy in your field.
Would love any feedback, especially from folks thinking about how alignment ideas can scale into the policy domain.
Here is the Substack link (I also posted it on LinkedIn):
https://open.substack.com/pub/katalinahernandez/p/why-should-ai-governance-professionals?utm_source=share&utm_medium=android&r=1j2joa
My intuition says that this was a push from Future of Life Institute.
Thoughts? Did you know about this already?
I did not know about this already.
I don’t think it’s been widely discussed within AI Safety forums. Do you have any other comments, though? Epistemic pessimism is welcome XD. But I did think this was at least update-worthy.
I did not know about this either. Do you know whether the EAs in the EU Commission know about it?
Hi Lucie, thanks so much for your comment!
I’m not very involved with the Effective Altruism community myself. I did post the same Quick Take on the EA Forum today, but I haven’t received any responses there yet, so I can’t really say for sure how widely known this is.
For context: I’m a lawyer working in AI governance and data protection, and I’ve also been doing independent AI safety research from a policy angle. That’s how I came across this, just by going through the full text of the AI Act as part of my research.
My guess is that some of the EAs working closely on policy probably do know about it, and influenced this text too! But it doesn’t seem to have been broadly highlighted or discussed in alignment forums so far, which is why I thought it might be worth flagging.
Happy to share more if helpful, or to connect further on this.
Would a safety-focused breakdown of the EU AI Act be useful to you?
The Future of Life Institute published a great high-level summary of the EU AI Act here: https://artificialintelligenceact.eu/high-level-summary/
What I’m proposing is a complementary, safety-oriented summary that extracts the parts of the AI Act that are most relevant to AI alignment researchers, interpretability work, and long-term governance thinkers.
It would include:
Provisions related to transparency, human oversight, and systemic risks.
Notes on how technical safety tools (e.g. interpretability, scalable oversight, evals) might interface with conformity assessments, or the compliance exemptions available for research work.
Commentary on loopholes or compliance dynamics that could shape industry behavior.
What the Act doesn’t currently address from a frontier risk or misalignment perspective.
Target length: 3–5 pages, written for technical researchers and governance folks who want signal without wading through dense regulation.
If this sounds useful, I’d love to hear what you’d want to see included, or what use cases would make it most actionable.
And if you think this is a bad idea, no worries. Just please don’t downvote me into oblivion, I just got to decent karma :).
Thanks in advance for the feedback!
I guess this should wait for the final draft of the GPAI Code of Practice to be released?
I’ll try to make it as helpful as possible, so yes. But I thought I’d start gathering feedback now ☺️.
Two days ago, I published a Substack article called “The Epistemics of Being a Mudblood: Stress Testing intellectual isolation”. I wasn’t sure whether to cross-post it here, but a few people encouraged me to at least share the link.
By background I’m a lawyer (hybrid Legal-AI Safety researcher), and I usually write about AI Safety to spread awareness among tech lawyers and others who might not otherwise engage with the field.
This post, though, is more personal: a reflection on how “deep thinking” and rationalist habits have shaped my best professional and personal outputs, even through long phases of intellectual isolation. Hence the “mudblood” analogy, which (to my surprise) resonated with more people than I expected.
Sharing here in case it’s useful. Obviously very open to criticism and feedback (that’s why I’m here!), but also hoping it’s of some help. :)