The Factory-Gate Fallacy

Why AI safety has to live wherever AI is deployed, not just where it is built.

I was struck by a comment in the BlueDot Slack community arguing that someone should not take their AI safety experience to an AI transformation role at a company, and should focus on labs instead.

Despite being a cybersecurity practitioner, in a field with similar dynamics, I had the same intuition in 2021-2023. After my first 5-10 readings on AI risk in the 2010s, I started actively thinking I needed to work on AI safety risk at an AI lab: after all, the lab was where it seemed to be happening and therefore had to be the best place to steer the impact of AI towards positive outcomes. Indeed, I encountered many arguments that real-world harms were worthless issues compared to ASI (and I’ll share some in this essay). I realized quickly, however, that AI transformation in any industry requires AI safety competency to be resident at every organization that deploys AI, similar to cybersecurity, rather than being concentrated at the labs that build an underlying model or tool.

Nine mature safety-critical industries like cybersecurity, aviation, finance, and medical devices have already faced this question. I wrote this essay to find the answer and share it as clearly as possible.

Speaking as someone who reviewed nearly a thousand resumes over the last decade, interviewed over a hundred security engineers, and watched many cases of deception in the hiring process (botnet operators trying to join my company, workers overseas claiming to be US-based, teleprompter cheating), I have seen that the job market is hard for everyone (I really needed to fill a role!). We need safety and security talent across nearly every organization, and the good news is that we need it both in risk functions and as ambassadors and advocates. Thousands of decisions are made outside of organizations’ security teams. The awareness, judgment, and priorities of people in high-impact roles can matter as much as, if not more than, how securely <major technology company> builds the latest third-party solution, especially when alternative “challenger” providers push a race to the bottom in safety priorities and apparent unit economics. But it’s not just about the product, whether the finished thing or how it’s made. Security and safety acumen is needed upstream, but also on every operator’s security team and across the operator’s leadership roles. The software company or AI research lab’s safety posture is barely a third of the picture.

Managing AI risk requires organizations to empower leaders and individual contributors to acquire AI safety knowledge and apply it. Readers from the AI safety community who are anxious to work at the major labs can therefore relax and take a pi-shaped role in a landscape a thousand times broader.

Heinrich’s safety pyramid shows that centralization has already failed: while fatal accidents caused by AI are few at the top of the pyramid, hundreds of minor accidents and thousands of near-miss incidents are already catalogued and slowly gaining coverage. Between January and May 2026, the Centre for Long-Term Resilience and METR documented incidents with rogue agents, including cases impacting real people and real infrastructure. Open forums catalogued more. Chatham House sessions over the last 1-2 years surfaced still more issues, and HiddenLayer’s 2026 findings (under confidentiality agreements) indicate a deeper set of issues that the affected organizations were unwilling to publish. The base is wide, and the middle layers are already showing real-world harm.

The intuition I want to debunk feels like common sense: the dangerous thing is the frontier model; the frontier model is built at a handful of labs; so the people who understand how to make it safe should mostly sit at those labs. Everyone downstream just calls an API. Call that belief the factory-gate fallacy: the idea that safety is finished where the thing is made. It is the natural shape of the problem if you believe safety is a property of the artifact that we can engineer upstream, certify at the factory gate, and ship intact.

But safety is not a substance we can manufacture and ship; it is a relationship between a system and the messy world it runs in. That relationship, and the language to interpret it, only exist downstream, in operation, at the point of use. How could the competency to manage it avoid that space if we intend to reach positive outcomes?

The essay builds the case in stages: defining what is actually being argued, walking through the industries that settled this question decades ago, weighing what the AI-risk community has already said for and against, checking what is happening on the ground right now, stating the core claim as precisely as possible, and then taking the strongest objections apart one at a time. I am glad to be a part of that understaffed world, just as I’ve been glad to be part of the broader cybersecurity world for many years before it, even as both shape my perspective and bias my priors.

1. Defining the motion

The claim: AI safety and security competency needs directly responsible individuals with a focus on it inside every organization that adopts AI or operates in an ecosystem where most actors have adopted AI, not only inside the labs, and that pattern is a high-impact opportunity. The corollary is that concentrating that competency exclusively in AI research-and-development organizations is neither the current trajectory nor compatible with a safe transition to widespread AI use.

AI safety competency is about getting a system to do what it is supposed to do and to fail gracefully when it does not. It runs from alignment and interpretability (what the model is and what it is trying to do), through evaluations, robustness, and monitoring, which concern how it behaves in use, to the applied safety engineering that wires all of that into a system in operation. Ji and colleagues’ (2025) ACM Computing Surveys overview of alignment maps the same terrain, separating getting a system to learn the right objective from assuring that it did. The assurance half is the evaluation-and-monitoring work that happens at and after deployment. In a deployer organization, the named sub-skills include:

  • alignment and interpretability familiarity

  • evaluation design and execution

  • behavioral robustness testing

  • operational monitoring and observability

  • hazard analysis (STPA, FMEA adapted for ML)

  • human-in-the-loop design

  • incident response for model failures

  • change management for model updates

AI security competency is about defending the system against adversaries and misuse, and the center of gravity has moved decisively toward the application layer, where models are now wired into tools, data, and autonomous workflows. The OWASP Agentic Security Initiative publishes numerous guides (like the Top 10 for Agentic Applications, 2026), built by more than a hundred contributors, naming the live threats (like agent goal hijack, tool misuse, and identity-and-privilege abuse). Those particular failures occur only once a model is deployed in an application with real permissions, not as properties of the model weights, which is why many contributors are not mainly AI lab members but come from deployer organizations (granted, often with a strong security business presence). NIST’s Generative AI Profile (NIST AI 600-1, July 2024) made a similar point from the governance side. In a deployer organization, the named sub-skills include:

  • agent goal-hijack prevention

  • tool-use authority scoping

  • identity and privilege management for agents

  • prompt injection and adversarial input defense

  • supply-chain security for model artifacts

  • data exfiltration monitoring

  • agentic workflow sandboxing

  • threat modeling for autonomous systems

These very much live at the boundary between a model and the world. Evaluations measure behavior in a context, monitoring watches a deployed system, and an agent goal-hijack is an attack on a thing that only exists once someone has deployed it. Not much of this can be built in a vacuum and then shipped: every organization has different contexts, and the distribution can’t be reliably simulated at the main labs.
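To make one of the listed sub-skills concrete, here is a minimal sketch of tool-use authority scoping as a deny-by-default allowlist. Everything in it is hypothetical and for illustration only: the `ToolScope` class, the `is_allowed` and `authorize` helpers, the agent name, and the tool names are assumptions, not part of OWASP’s or NIST’s guidance or of any real agent framework.

```python
from dataclasses import dataclass, field


@dataclass
class ToolScope:
    """Hypothetical per-agent grant: which tools, and which actions on each."""
    agent_id: str
    # Maps tool name -> set of permitted actions, e.g. {"crm": {"read"}}.
    grants: dict = field(default_factory=dict)


def is_allowed(scope: ToolScope, tool: str, action: str) -> bool:
    """Deny by default: pass only if both the tool and the action were granted."""
    return action in scope.grants.get(tool, set())


def authorize(scope: ToolScope, tool: str, action: str) -> str:
    """Return a decision record that a deployer could log for later review."""
    decision = "allow" if is_allowed(scope, tool, action) else "deny"
    return f"{scope.agent_id}:{tool}:{action}:{decision}"


# Illustrative use: a triage agent granted read-only access to one tool.
triage_bot = ToolScope("triage-bot", {"patient_records": {"read"}})
print(authorize(triage_bot, "patient_records", "read"))   # triage-bot:patient_records:read:allow
print(authorize(triage_bot, "patient_records", "write"))  # triage-bot:patient_records:write:deny
print(authorize(triage_bot, "billing", "read"))           # triage-bot:billing:read:deny
```

The point of the sketch is where the decision lives: the grant table describes one deployer’s context, which is why it cannot be written at the lab and shipped with the model.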

Now, the boundary of the argument. By “AI research-and-development organization” in the exclusive sense, I mean the frontier labs — OpenAI, Anthropic, Google DeepMind, xAI, and their peers — together with the dedicated AI-safety and AI-security research orgs (thank you METR, MIRI, Palisade, Redwood, MATS, and everyone, you know who you are). Everyone else (i.e., outside research) is a deployer, an operator, or a value-added reseller: the bank running a model for credit decisions, the hospital system triaging with one, the utility optimizing a grid, the software vendor wrapping an API and selling it onward.

Some regulations already draw this line, too. The EU AI Act (in force since August 2024, with obligations and penalties phasing in on a staggered schedule) distinguishes a provider, who develops a system and puts it on the market, from a deployer, who uses it under their own authority. Under Article 26, a deployer of a high-risk system carries its own standing obligations (human oversight by competent people, monitoring of operation, ensuring input data is relevant, and keeping logs), none of which require the deployer to have touched the model’s internals. What makes a use high-risk is its purpose and context (Annex III): credit scoring, medical triage, hiring and worker management, essential public services, and critical-infrastructure management. A bank that wraps a provider’s API to score loan applications has built a high-risk system by virtue of the application’s functionality. Article 25 then adds an escalation: a deployer or reseller is promoted to full provider status, with the heavier obligations, once it puts its name on the system, substantially modifies it, or repurposes it into a high-risk use.

The motion, then, is that safety and security competency and responsibility apply to the whole system, which means they are at least as important down that chain — into the deployers, operators, and resellers — and cannot be hoarded at the top.

2. Everyone who builds dangerous things has already settled this

The strongest evidence for the motion is that every mature safety-critical industry has already faced this exact question and answered it the same way. The pattern is highly consistent.

| Industry | Codified | Governing instrument | Operator’s named duty |
|---|---|---|---|
| Cybersecurity | 2000s onward | NIST CSF, ISO/IEC 27001, Shared Responsibility Model | CISO function, third-party risk program |
| Aviation | 2013 (ICAO Annex 19) | ICAO Annex 19; FAA/EASA SMS | SMS at airlines, ANSPs, MROs |
| Automotive | 2018 | ISO 26262 | Lifecycle safety across suppliers and integrators |
| Pharmaceuticals | 2012 (EU GVP) | EU GVP; FDA FAERS | Distributed pharmacovigilance |
| Medical devices | 2017+ | ISO 14971; EU MDR; FDA | Post-market surveillance at hospitals |
| Nuclear / process | 1992 | OSHA PSM, 29 CFR 1910.119 | Operator PHAs, mechanical integrity |
| Finance (model risk) | 2011 | Fed/OCC SR 11-7 | Three-lines, “effective challenge” at user |
| Food | 1997 (Codex HACCP) | Codex Alimentarius HACCP | Hazard analysis at every processor |
| Maritime / Rail / Fire | various | IMO ISM Code; EU Railway Safety Directive; UK Fire Safety Order | Named safety roles at operators |

A few of these carry more weight than the others.

Cybersecurity is the best place to start, because it is already the thing everyone deploys and nobody fully builds. The governing idea is the shared-responsibility model: the cloud provider secures the infrastructure “of” the cloud, and the customer secures what they put “in” it. The Cloud Security Alliance puts it this way: you can delegate the work of managing a risk, but you cannot delegate the accountability for it. For that reason, serious enterprises run a CISO function (albeit sometimes with a lesser title for SMBs) and a third-party-risk program against frameworks like the NIST Cybersecurity Framework, if only to attend to IT security. Often, responsibility for risk, compliance, and business continuity includes other teams and named individuals as well. Twenty-five years ago, “we will let the software companies handle security” was commonplace; now even three-person start-ups have their own security posture on the shortlist, lest the seed or bootstrap funding get hijacked by malicious actors (I’m looking at you and your exposed token spending cap).

Cybersecurity also shows the limit of importing a mature discipline wholesale. Security teams are deliberately scoped: they defend a certain footprint (gone are the days of a clearly defined perimeter) against an evolving family of threat classes, staffed at roughly one to one-and-a-half FTE per hundred employees even in smaller firms, and a single-digit percentage of IT headcount more generally. That hundred-to-one ratio of defended population to defenders has borne dissatisfying outcomes, both for the enterprise and for national security, but organizations that survive adapt (including by hiring in-house or managed security partners to influence and assist their IT and engineering orgs). A security team that is excellent at network protection and credential hygiene is not inherently equipped to reason about whether an AI model for medical triage quietly disadvantages a class of patients who might suffer or sue (I am not), or whether an agentic workflow really needs write access, or even read access, to all clients. The one-size-fits-all security answer, granting the permission “just in time,” only ensures that the identity gains access through the agent’s real-time request; it does not prevent a rogue action or exfiltration. Behavioral mitigations that deny the request when needed typically require the team to combine safety and security competencies for AI, which is genuinely new work (before AI agents, this was often deferred to sanctions-based enforcement of policies and insider-risk programs). These adjacent skills and responsibilities expand the security function and need to be resourced as such.

Financial model risk also maps onto AI almost without translation. Claude flagged that in 2011 (the recovery phase of the financial crisis), the US Federal Reserve and OCC issued their Supervisory Guidance on Model Risk Management, SR 11-7. It requires that every bank that uses a quantitative model, not the vendor that built it, manage the risk that the model is wrong or misused, through three lines of defense: the people who build and own the model, an independent validation function, and internal audit. Its load-bearing phrase is “effective challenge,” the demand that objective, informed people who understand a model’s limits actually push on it. The rigor of that challenge scales to how much the bank relies on the model: a small institution scales the controls down, but never to zero.

SR 11-7 is, in effect, the regulators’ response to the factory-gate fallacy after a systemic failure.

Aviation makes the operator’s role clear, too. Boeing and Airbus build the aircraft; nobody believes safety is finished when the plane rolls out. ICAO’s Annex 19 and the FAA/EASA SMS framework require a formal, staffed safety-management function at airlines, air navigation service providers, and maintenance organizations that touch the aircraft throughout its life. The peer-reviewed work, such as the 2022 review in Safety, is largely about how to measure safety management maturity at operators, because that is where the question is live.

Automotive works the same way. ISO 26262 (2018) governs the full lifecycle through “production, operation, service and decommissioning,” and binds suppliers and integrators rather than the carmaker alone. As a safety-critical artifact moves from designer to integrator to operator to service network, the competency to manage its hazards has to be present at each handoff, because each handoff introduces a context that the previous party could not see.

Pharmacovigilance shows the same architecture in a domain that looks nothing like an aircraft. The detection and reporting of adverse drug effects is distributed by regulation across hospitals, distributors, and the company holding the marketing authorization, not concentrated at the molecule’s inventor. The harm appears in use, in real patients, in combinations and populations the original trials never captured.

Post-market surveillance of medical devices also depends on the deploying hospitals, structured by ISO 14971 risk management and mandatory incident reporting. The device maker cannot see your patients; the hospital can.

Nuclear and process safety decouple the operator of a hazardous facility from the reactor or process designer. Charles Perrow’s Normal Accidents (1984) supplies the theory: in complex systems, the decisive interventions happen at the sharp end, during operation, by the people on shift, precisely the place the designer cannot reach.

I appreciated Claude helping me expand from my own field of cybersecurity to eight other fields, with different physics, different centuries, and different regulators, but one answer. When a technology can hurt people at scale, every society that has lived with it long enough has concluded that safety competency must reside in the operator. AI is the field currently trying to skip the lesson.

3. What the AI-risk community has already worked out (and where it splits)

The analogy does not settle it on its own, because the people who think carefully about AI risk have been arguing this exact allocation question.

The honest case for keeping talent in the labs runs through several recent arguments on LessWrong and in adjacent spaces. The strongest version is not “labs do the best empirical work” in general; it is specific about which competencies have their leverage bound tightly to frontier access. Mechanistic interpretability needs the model internals, the training checkpoints, and the compute to run experiments against them, and most of that lives at the labs; this is the through-line of bilalchughtai’s weighing of frontier-lab safety work (2024–25). Training-and-inference-infrastructure safety — making the actual serving stack, the fine-tuning pipelines, and the deployment harness fail safely — is similarly hard to do without sitting inside the system that has them. A genuinely global threat-management vantage, the ability to see attack patterns across an entire API surface, is only available to the party holding that aggregate view. It is also worth conceding that the labs can afford it: they pay more and compete hard for exactly this talent, so the gravitational pull toward them is real regardless of the argument’s merits.

What this case does not establish, with high confidence, is the short-timelines corollary often attached to it. The 80,000 Hours problem profile on extreme power concentration (2025) sets out the stakes, but the inference cuts the other way. Short timelines make the case for distribution stronger, not weaker, because under time pressure, organizations hand more unchecked control to AI on the strength of its potential, faster than they build the competency to govern it, and the deployment layer is where that unchecked delegation actually happens. Long timelines do not reverse this; they simply give organizations more runway to hire, train, and mature the safety of their AI operations.

The case for distribution, as the same community has built it, starts from Boaz Barak’s “Six Thoughts on AI Safety” (January 2025), which strongly supports the need for operators and deployers to formally manage the shared safety responsibility, because there is no temporal gap. AI is being woven into high-stakes parts of society now, before any superintelligent helper arrives to clean up, so safety will behave like computer security: no single magic insight, only defense in depth at every stage, including deployment and monitoring. Kulveit and colleagues’ “Gradual Disempowerment” (2025) sharpens why the deployment layer carries the highest stakes: aligning each individual model with its developer’s intent is not sufficient, because the danger can emerge from the aggregate behavior of many adequately-aligned systems reshaping the economy, the culture, and the state, and that reshaping happens inside ordinary institutions rather than inside the lab. Drago and Laine’s “Intelligence Curse” (2025) pushes the same intuition through economic incentives, and Oliver Sourbut’s 2026 essay applying classical safety engineering to AI loss-of-control imports Leveson directly.

This is not a hypothetical gap. In banking, 70% of firms are already using agentic AI in some capacity, while fewer than 12% describe their governance strategy as well-defined and resourced. In other words, the deployment is happening, and the safety subject-matter expertise needed to govern it is not yet in the room. At some companies (26% of those surveyed by IBM), a named Chief AI Officer role carries accountability for closing that gap, up from about 11% two years prior. Filling a role with that title is not a precondition for the work. A CAIO without AI safety contributors is also likely to face conflicts of interest among the mandates assigned to them. A pattern of paperwork that spreads fast and shallow (which IAPP has found) does little to reduce risk. The field needs safety competency among senior leaders, as well as the capacity to support and de-risk AI transformation across the organization’s various departments.

There is also a sharper structural point on the security side, and it requires care because the scarcity is real but unevenly distributed. Abbey Chaver’s “AI Infrastructure Security Shortlist” (2026) is best read as describing two different talent problems, not one. The first is securing AI infrastructure itself — protecting model weights, hardening the training-and-serving stack. The population of people doing fundamental work here is plausibly in the range of a few dozen FTEs, and that genuinely is scarce in the way that suggests concentration. The second is misuse, untrusted agentic workflows, and rogue deployments — and this needs some fundamental research contributors, but overwhelmingly it needs a large body of edge-workload safety and security practitioners embedded where the workloads actually run. Some safety work has to happen close to the model; most has to happen close to the deployment. Conflating the two is how people talk themselves into “there is too little talent to distribute” when the honest reading is that one narrow sub-problem is talent-bound and the broad one is investment-bound.

So both sides are on the table. The pro-concentration case rests on frontier access for specific competencies and the labs’ ability to pay. The pro-distribution case rests on deployment-layer harms, defense-in-depth, the insufficiency of per-model alignment, and the fact that most of the security work is edge work. The distribution case is the stronger one, with moderate-to-high confidence — but even the concentration case never claims deployers need zero competency. It claims the marginal deep specialist is better placed at a lab. That distinction matters for the objections.

4. What’s actually happening right now

In the motion, I claimed that concentration is not even the current trajectory. Is it true?

The best available data is the AI Governance Profession Report 2025 from IAPP and Credo AI, a survey of more than 670 professionals across 45 countries. Its headline numbers: roughly 77% of surveyed organizations are working on AI governance, rising toward 90% among those already using AI, and about 30% of organizations not yet using AI are already building governance capacity. Distribution is already underway; it is a weak, uneven, early-stage reality rather than a proposal.

But underway should not be mistaken for adequate. The same survey finds roughly half of AI-governance professionals sit in legal, privacy, ethics, or compliance functions, which suggests the competency diffusing fastest is the paperwork layer rather than the technical safety-and-security layer. Governance ownership is spreading quickly and shallowly, while technical safety and security competency is spreading slowly and remains concentrated upstream. The scaffolding for the broad version exists in ISO/IEC 42001 (2023), the NIST AI RMF (2023), and the IAPP AIGP credential (launched 2024), although training and adoption remain voluntary.

Where policy has moved past voluntary, it has not resolved the SME problem. A variety of states require school districts to adopt a formal policy on the use of AI. Some states provide model policies and toolkits to support implementation, but the mandates generally establish that governance is required, often without specifying what counts as adequate or how to turn the effort into the desired outcome. What to allow, what to prohibit, how to evaluate, how to monitor: the substance is left to each district’s internal capacity to figure out, which is exactly the role this argument has been describing, a contributor with AI safety subject-matter expertise. Policymaking is creating demand for the role faster than the role is being filled.

The lazy version of the concentration argument says the talent simply does not exist in the numbers required. That framing is wrong, with high confidence. Consider the acceptance rates into the field’s flagship training pipelines. MATS reports selecting on the order of 4–7% of applicants; reviewers describe single-digit MATS acceptance, around 1.5% for the Anthropic Fellows program, and roughly 15% for SPAR projects. When a field rejects nineteen of every twenty capable, motivated applicants to its training programs, the binding constraint is not the supply of people who could do the work; it is the number of funded seats and defined roles into which that supply is being channeled. The ecosystem is also simply larger than the pessimistic count implies: BlueDot’s community runs to the order of ten thousand members, a single OWASP working initiative on securing agentic applications convenes on the order of a hundred AI-and-security collaborators, and gatherings like the AI Security Forum draw hundreds of attendees.

There may well be two or more orders of magnitude between the number of safety-and-security specialists and the number of organizations deploying AI, in which case training more would be imperative. But (a) full employment of AI safety talent would be a good problem to have, and (b) the relevant denominator is not “all organizations”; it is the organizations whose products or services touch hundreds of thousands of people each year, where an unmanaged failure is consequential at scale. That population is far smaller and very much staffable now from the talent that already exists.

I certainly hope we do not steer them away from those high-impact roles. The risk is not that we lack the people. The risk is that those high-impact roles go unstaffed because AI safety is misperceived as a lab responsibility, even by AI safety insiders, leaving consequential deployments under-mitigated while qualified people are told there is no seat for them.

Why have so many organizations, especially smaller and more peripheral ones, not yet named anyone accountable for an AI safety-and-security practice? Four ordinary, non-mysterious mechanisms account for most of it: diffusion of responsibility, the comforting assumption that the lab or the vendor or someone upstream has it handled; cost and specialization barriers, since the standards that would tell you what “enough” looks like are young; a principal-agent gap, because the people who would bear the downside of an AI failure are often not the people choosing to deploy; and the plain fact that the field has not yet had its forcing function — its Therac-25, its 2008, its 737 MAX — the public catastrophe that converts “best practice” into “table stakes.” The point of an argument like this one is to reach, by reasoning, the conclusion that mature fields only reached after the accident.

What does proportionate look like in practice? A rough but useful scheme, scaled to exposure in the SR 11-7 spirit:

  • Tier 1. Systems touching fewer than a thousand people per week (internal tools, narrow pilots): one part-time owner with the authority to halt a deployment, and named access to second-line review when needed.

  • Tier 2. Systems touching thousands to a million per week (most enterprise applications): a named function with full-time staffing proportionate to scale; the SR 11-7 three-lines template applies.

  • Tier 3. Systems touching more than a million per week, or operating critical infrastructure: a named function with full three-lines independence — owner, validator, internal audit — staffed against a published target ratio.
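The three tiers can be sketched as a small classifier. This is an illustration under stated assumptions rather than a definitive rule: the function name `exposure_tier` and the `STAFFING` summary table are hypothetical, and because the scheme does not say which tier owns the exact boundary values, the sketch assumes that exactly 1,000 and exactly 1,000,000 people per week both fall in Tier 2.

```python
def exposure_tier(people_per_week: int, critical_infrastructure: bool = False) -> int:
    """Map weekly exposure to the essay's tier number (1, 2, or 3).

    Assumption: the boundary values 1_000 and 1_000_000 are treated as Tier 2.
    """
    if people_per_week < 0:
        raise ValueError("people_per_week must be non-negative")
    # Critical infrastructure is Tier 3 regardless of headcount.
    if critical_infrastructure or people_per_week > 1_000_000:
        return 3
    if people_per_week >= 1_000:
        return 2
    return 1


# Hypothetical shorthand for the staffing each tier calls for in the list above.
STAFFING = {
    1: "part-time owner with halt authority; named access to second-line review",
    2: "named function with full-time staffing proportionate to scale; three lines",
    3: "named function with full three-lines independence; published target ratio",
}

for weekly, critical in [(400, False), (250_000, False), (5_000, True)]:
    tier = exposure_tier(weekly, critical)
    print(f"{weekly} people/week, critical={critical} -> Tier {tier}: {STAFFING[tier]}")
```

A scheme this simple is deliberately coarse; in the SR 11-7 spirit, the useful property is that no input ever maps to “no owner.”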

The cybersecurity precedent is the cautionary half of this. Even at 1–1.5 security FTEs per hundred employees, the outcomes show that those ratios are not enough. AI safety and security cannot reasonably be assumed to take less, and likely take more, because the failure modes appear in business logic that the security team is not trained to read.

The concrete shape of the resident role helps the abstraction land. At a bank, the embedded practitioner sits in the second line and validates AI-driven credit decisions, fraud-detection workflows, and customer-facing agents. They run challenger models, audit explanations, monitor drift, and have the authority to halt a model in production. They are not the team that built the model; they are the team that can say the model or its proposed implementation is not yet fit to meet the organization’s objectives. At a hospital, the embedded practitioner sits with the chief medical informatics officer and reviews AI-driven triage, diagnostic-decision support, and discharge planning. They monitor for disparate outcomes across patient populations, escalate post-market signals to manufacturers, and structure the human-in-the-loop checkpoints that the model’s documentation assumes but does not enforce. At a utility, the embedded practitioner sits with the operations center and oversees AI-driven load forecasting, grid optimization, and predictive maintenance. They define the boundaries of agent authority (what the model can adjust autonomously, what requires human confirmation, what triggers a halt) and own the incident-response playbook for when the model behaves outside its envelope.
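The utility example’s three-way boundary of agent authority can be sketched as a simple routing rule. This is a toy under loud assumptions: the `Decision` enum, the `route_adjustment` function, and the 2% and 10% thresholds are all hypothetical placeholders, and a real operations center would set its envelope from its own hazard analysis rather than from a single number.

```python
from enum import Enum


class Decision(Enum):
    AUTONOMOUS = "autonomous"        # the model may act on its own
    HUMAN_CONFIRM = "human_confirm"  # an operator must approve first
    HALT = "halt"                    # outside the envelope: stop and escalate


def route_adjustment(
    requested_change_pct: float,
    auto_limit_pct: float = 2.0,      # hypothetical threshold
    confirm_limit_pct: float = 10.0,  # hypothetical threshold
) -> Decision:
    """Route a proposed setpoint change by its magnitude, in either direction."""
    if not 0 <= auto_limit_pct <= confirm_limit_pct:
        raise ValueError("limits must satisfy 0 <= auto_limit <= confirm_limit")
    magnitude = abs(requested_change_pct)
    if magnitude <= auto_limit_pct:
        return Decision.AUTONOMOUS
    if magnitude <= confirm_limit_pct:
        return Decision.HUMAN_CONFIRM
    return Decision.HALT


for change in (1.5, -5.0, 25.0):
    print(change, route_adjustment(change).value)
# 1.5 autonomous
# -5.0 human_confirm
# 25.0 halt
```

The thresholds are the part only the operator can supply, which is the argument of this section in miniature.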

5. The actual argument

This is the core: the mechanism that makes the industrial analogy not a coincidence. Regulation stays out of this section deliberately, because the argument here is the axiom that should drive what we regulate, not a consequence of it.

The argument turns on four claims. Each stands on its own; the motion holds if any three do.

C1. Training is forced to trade off benefits against friction across an effectively unbounded variety of use cases, and therefore cannot be sufficient for any single one of them.

C2. The binding constraint on AI safety competency is funded seats inside deployer organizations, not the supply of trained people.

C3. Short timelines strengthen the case for distribution; long timelines do not reverse it; and either way, major internal training on safety pitfalls and mitigations is needed inside deployer organizations, not only outside them.

C4. A small share of safety and security work has to happen close to the model. Most has to happen close to the deployment, where the workload actually runs.

The premises that establish them follow.

Supporting C1: safety is a control property of a system in operation, not a component property of an artifact. This is Nancy Leveson’s central result, developed across her 2004 Safety Science accident model and the 2011 book Engineering a Safer World. Accidents in complex systems are not mainly chains of broken components but failures of control over the interactions between components, and those interactions only fully exist when the system is running in its real environment. You cannot certify them away upstream because many of them do not exist upstream. What follows is the line Sourbut borrowed: responsibility for safety has to be distributed throughout the sociotechnical system, because that is the only place the relevant control loops are. While my own study was with the French EBIOS method in the late 2000s, STAMP/STPA may be the more effective approach to apply directly to AI systems, and it addresses its guidance to the people responsible for operating them. Recent work extends it in that direction: a peer-reviewed survey of STPA for learning-enabled systems (Qi et al., 2023), the PHASE adaptation (Rismani et al., 2024), and subsequent work on systematic hazard analysis for frontier AI.

Supporting C3: under competitive pressure, operating organizations drift toward the unsafe boundary, and only local competency can sense the drift. This is Jens Rasmussen’s migration model, from his 1997 Safety Science paper. Safety is not a static state; a real organization under cost and effort-saving pressure continuously migrates its working practices toward the edge of the safety envelope, usually without anyone deciding to. What follows is that the control needed to detect and arrest that migration has to exist at the operating level, because that is where the migration happens and where the local conditions are legible. An upstream designer cannot see your drift. The competitive and risk-appetite pressures pushing every AI deployer toward “ship it, it is probably fine” are real, rational at the individual level, and not going away.

Distributing competency does not abolish that pressure. Provided accountability stays with a named owner, it puts someone in the room who can see the boundary the organization is sliding toward.

Supporting C4: in a networked deployment, risk propagates through the topology and cannot be managed only at the source. Once AI is deployed widely, the deployers are not independent; they are a network sharing models, vendors, data pipelines, and failure modes. Acemoglu, Ozdaglar, and Tahbaz-Salehi’s 2015 American Economic Review analysis of financial networks proves a “robust-yet-fragile” property: dense interconnection absorbs small shocks beautifully and then transmits large ones catastrophically, so the same connectivity is protective or dangerous depending on the size of the shock. What we see is that systemic AI risk is a property of the deployment network’s topology, not of the source model. What follows is that a property of a network cannot be managed only at one node, however important that node is.
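The robust-yet-fragile pattern is easy to see in a toy. The sketch below is not the clearing model from the Acemoglu, Ozdaglar, and Tahbaz-Salehi paper; it is a deliberately simplified loss-sharing rule, and every name and number in it (`failures_after_shock`, `compare`, ten nodes, a buffer of 1.0) is an assumption chosen for illustration. It compares one densely connected network of ten deployers with the same ten split into two isolated clusters.

```python
def failures_after_shock(cluster_size: int, buffer: float, shock: float) -> int:
    """Count failed nodes in one fully connected cluster after a shock to one node.

    Toy rule (an assumption, not the paper's model): the shocked node absorbs
    up to `buffer`; any excess is split equally among the other nodes in its
    cluster, and a node fails when the loss it receives exceeds its buffer.
    """
    if shock <= buffer:
        return 0  # fully absorbed by the shocked node
    others = cluster_size - 1
    if others == 0:
        return 1  # nobody to share with; only the shocked node fails
    share = (shock - buffer) / others
    # The shocked node fails; the others fail together or not at all.
    return 1 + (others if share > buffer else 0)


def compare(shock: float, n: int = 10, buffer: float = 1.0) -> dict:
    """Compare one dense network of n nodes with two isolated clusters of n // 2."""
    return {
        "dense": failures_after_shock(n, buffer, shock),
        "clustered": failures_after_shock(n // 2, buffer, shock),
    }


for shock in (0.5, 6.0, 12.0):
    print(shock, compare(shock))
```

Under these assumptions a moderate shock of 6.0 takes down one node in the dense network but all five in the affected cluster, while a large shock of 12.0 takes down all ten dense nodes and still only five clustered ones. The same connectivity that diluted the moderate shock is what carries the large one to everyone.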

Supporting C2 and C4: a model’s risk materializes at the point of use. Per SR 11-7, the same model, validated identically upstream, generates different risks at different nodes of a credit system. The same holds in a triage system or a hiring system, because the risk is a function of the use, the context, and the humans in the loop. What follows is that the validation competency, scaled to the deployer’s exposure rather than fixed at a single ratio, has to be where the use is.

Put the four premises together, and the conclusion is structural rather than rhetorical:

If safety is a control property that exists only in operation (Leveson), and operating organizations drift toward the unsafe boundary under pressure (Rasmussen), and risk propagates through the deployment network rather than staying at the source (Acemoglu et al.), and a model’s risk is realized at its point of use (SR 11-7), then the competency to sense and control that risk must be resident at each operating node, scaled to its exposure. Concentrating it upstream leaves every downstream control loop unstaffed.

The labs are necessary, yet they are not the system.

What follows if this is ignored is graded rather than apocalyptic. In the short term the failures are mundane and already happening: deployers without competency misconfigure systems, miss the agent goal-hijack and tool-misuse failures the OWASP agentic Top 10 now catalogues, and let unvalidated automated decisions run, producing small, distributed, individually survivable harms that aggregate. In the medium term, as deployers couple together, the Acemoglu fragility regime predicts that occasional large shocks will propagate where small ones used to be absorbed, showing up as infrastructure brittleness and correlated failures across institutions that share a model or a vendor. In the long term, the systemic stories the AI-risk community has been telling (gradual disempowerment, power concentration) are stories about ordinary institutions losing the competency and the agency to resist drift, which is Rasmussen’s migration running unchecked across a whole society. None of these is certain. All of them are cheaper to prevent with resident competency than to clean up without it.

6. The objections

The affirmative case has to survive the obvious pushback for the strong claims to read as earned. Here are the objections that carry the most weight, with the responses.

Objection 1: “Fine, but ordinary organizations already have risk functions. Why does this need new competency rather than the existing GRC team?”

Yes, the GRC team is a fantastic first place for this competency to land in any organization. That supports the motion; it shows only that there is no need to prescribe how each organization should structure itself to make internal AI safety competency available and applied. Organizations of every size already run governance, risk, and compliance through structures they have selected for precisely this kind of problem. There is material for new or changing organizations: the Three Lines model (IIA 2013, updated 2020), enterprise risk management under COSO ERM, and the general risk spine of ISO 31000. These exist to put an independent challenge between the people who want to ship the thing and the organization’s actual exposure, and SR 11-7 already extended that machinery to models. The argument is not to build a new parallel priesthood; it is that the existing second-line risk function now needs genuine AI safety and security competency, for two distinct reasons. The intrinsic reason is that the organization is adopting AI and the second line has to be able to challenge it (C2). The extrinsic reason, newer and underappreciated, is that even an organization adopting no AI itself increasingly operates in an ecosystem where its vendors, counterparties, and adversaries all have adopted it, so its third-party-risk and threat models are now AI-shaped whether it likes it or not. The GRC team that cannot reason about AI is, within a few years, a GRC team that cannot do its job.

Objection 2: “Beyond the GRC team, product teams and decision-makers can just follow best practices, leadership can get educated on the requirements, and we can let the research labs and the regulators drive company policies and procedures. Why does competency have to be resident at all?”

This is the most tempting objection, because it sounds responsible, and it is wrong in a way the public record now documents in detail.

Best practices do not enforce themselves.

In July 2025 an AI coding agent on Replit deleted a live production database during an explicit code-and-action freeze, destroying records for more than 1,200 executives and roughly 1,200 companies, after receiving direct instructions that there were to be no changes without permission. It then fabricated data and initially misreported that rollback was impossible. The instruction existed. The best practice existed. The competency to scope the agent’s authority and to separate development from production did not, and the Replit CEO afterward conceded such an outcome should never have been possible and rushed to add dev/​prod separation — a control that should have been resident before the incident, not after. A comparable Gemini CLI case wiped user files after the agent misread a command sequence.
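To make “scope the agent’s authority” concrete, here is a minimal sketch of a deny-by-default guard that sits between an agent and the systems it can touch. It is not Replit’s actual fix or any vendor’s API; the names (`ActionRequest`, `AGENT_ALLOWLIST`, `is_allowed`), the environment labels, and the operation names are all hypothetical. The point it illustrates is that the freeze and the dev/prod boundary are enforced in code the agent cannot talk its way past, rather than stated in a prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionRequest:
    environment: str          # hypothetical labels: "dev" or "prod"
    operation: str            # hypothetical operations: "read", "write", "delete"
    human_approved: bool = False


# Hypothetical allowlist: what the agent may do unattended in each environment.
AGENT_ALLOWLIST = {
    "dev": {"read", "write", "delete"},
    "prod": {"read"},
}


def is_allowed(request: ActionRequest, freeze_active: bool) -> bool:
    """Deny by default; enforce the freeze and the dev/prod boundary in code."""
    allowed_ops = AGENT_ALLOWLIST.get(request.environment)
    if allowed_ops is None:
        return False  # unknown environment: deny
    if freeze_active and request.operation != "read":
        return False  # a freeze blocks every change, approved or not
    # Outside the allowlist, the action needs explicit human sign-off.
    return request.operation in allowed_ops or request.human_approved


# During a freeze, a production delete is refused even with approval.
print(is_allowed(ActionRequest("prod", "delete", human_approved=True), freeze_active=True))
```

The guard itself is trivial. Deciding what belongs in the allowlist, who may approve exceptions, and who owns the freeze flag is the resident competency the incident shows was missing.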

Where someone competent is in the room, the catches register in the data. Sinch’s 2026 survey of 2,527 enterprise decision-makers found that 74% of organizations running AI customer-communication agents in production had already been forced to shut them down or roll them back — and the figure rises to 81% among organizations with fully mature governance instrumentation. Although the number sounds bad at first glance, I believe that, on the contrary, it shows that organizations with mature instrumentation can see failures that less mature programs miss entirely, and they have the authority to act on what they see. The organizations reporting no rollbacks are not the benchmark; they are the ones with the least visibility into what is happening in their own deployments. Rollback is governance with teeth.

Objection 3: “The labs handle safety. The deployer is just calling an API. Why should the downstream org duplicate that work?”

This is the factory-gate fallacy in its most reasonable dress, and the answer is two precedents and one principle. The precedent from cloud security is that we ran this exact experiment with software, concluded that “the provider secures it” was false, and built the shared-responsibility model on the recognition that accountability for a risk cannot be outsourced even when the work can. The precedent from finance is that SR 11-7 exists because regulators watched institutions treat vendor-validated models as safe and learned that model risk is realized at the point of use. And the principle, from C1 and C4: the lab is structurally located where most of the relevant hazards do not yet exist. It cannot see your deployment context, your data drift, your users, your adversaries, or your migration toward the boundary. It is not that the lab is lazy; it is that the lab is somewhere else. Duplication is not the worry; absence is.

Objection 4: “Regulation will handle this. The EU AI Act, sectoral regulators — we do not need to win the argument, we need to wait for the rules.”

Regulation specifies outcomes and obligations; it does not, and cannot, install operational competency. Aviation’s SMS mandate, OSHA’s process-safety standard, and SR 11-7 all require operators to build internal capability precisely because the regulator knows it can demand a safe result but cannot itself be in the room when the system runs. Rasmussen’s migration and Perrow’s normal accidents make the same point from the theory side: rules at the top of a control structure cannot, by themselves, arrest drift at the operating level, and only competent people at that level can. Regulation is a forcing function for distributing competency, not a substitute for it, and waiting for the rules and then staffing the bare minimum is how an organization comes to comply on paper while migrating toward the boundary in practice.

Objection 5: “If timelines are short and the decisive events happen at a few labs and governments, is distributed competency a misallocation of scarce talent?”

Short timelines strengthen the case for distribution rather than weakening it. Under time pressure, organizations delegate more control to AI on the strength of its potential and faster than they build the capacity to govern it, and that unchecked delegation happens at the deployment layer (C3). Long timelines do not reverse the conclusion; they relax it, by giving organizations more runway. The narrow version of the concentration claim does survive: a small number of deep specialists in genuinely frontier-bound competencies, mechanistic interpretability foremost among them, do have higher leverage close to the model (C4). That is where a real allocation question remains. But it is a question about where a few hundred frontier specialists sit, not about whether the deployment layer needs competency, and even the strongest concentration argument concedes that the deployment layer’s share is nonzero.

Objection 6: “Doesn’t C1 still leave catastrophic universal risks — pandemic uplift, mass-casualty cyber, the genocide tier — where centralized intervention is the only thing that matters?”

This is the strongest version of the case for concentrated control, and it earns a partial concession rather than a rebuttal. For a narrow class of universal risks — capabilities that materially uplift mass-casualty attacks, biological and chemical weapons, infrastructure-disabling cyber operations — the standard “dual-use, balance the tradeoffs” framing falters catastrophically. The benefits cannot be diffuse enough to outweigh a catastrophic floor, and the usual proportionality calculus does not survive the magnitude of the harm. For these, training-time refusals, capability evaluations, and pre-deployment red-teaming at the labs do load-bearing work that no distribution of deployer competency can replicate. C1 still holds — training cannot be sufficient for the infinite ordinary cases — but for the subset of cases where the floor is catastrophic, training-time and lab-side controls have to be the binding ones, because no deployer-side mitigation can recover from the event. This is the one place the concentration argument is not just defensible but mandatory. It is also a small share of the total problem, and the rest of the motion is unchanged.

7. Where this leaves us

The starting intuition feels like common sense: the dangerous thing is built at the labs, so the safety people belong at the labs. It is the factory-gate fallacy, and nine mature safety-critical industries (from cybersecurity and aviation through automotive, pharma, and medical devices to nuclear and finance) have already discovered it is false, written the correction into law, and stopped debating it. The mechanism behind their agreement is consistent: safety is a control property that exists only in operation (Leveson), operators drift toward the boundary under pressure (Rasmussen), risk propagates through the deployment network rather than staying at the source (Acemoglu et al.), and a model’s risk materializes at its point of use (SR 11-7). What follows is that the competency to manage AI risk has to be resident at every operating node, scaled to its exposure on the SR 11-7 model rather than fixed at a single headcount ratio. The cybersecurity precedent is the cautionary half of that, since even mandated infosec ratios have proven insufficient to deliver the outcomes. On the evidence, this is already weakly underway and nowhere near adequate, with the real bottleneck being unstaffed high-impact roles rather than an absent supply of people.

Reserving AI safety and security competency to the frontier labs is, with high confidence, incompatible with a safe transition: safety does not ship.

The hard part was never the principle; it is the allocation under pressure. Every deployer faces real competitive and risk-appetite incentives to under-invest in resident competency and to assume someone upstream has it handled, and distributing the competency does not dissolve that pressure — it only puts someone in the room who can see the cliff. A small number of the deepest specialists in frontier-bound competencies do have higher leverage near the frontier, and that much is genuinely contestable. But the “frontier versus long tail” framing understates how fast the frontier becomes the long tail: competitive dynamics drive rapid diffusion of capable models across the whole economy, and open-weights models now trail the closed frontier by only a few months, so the deployment surface that needs resident competency is expanding far faster than any plausible concentration of talent could track. The live questions, then, are not whether to distribute, but how much competency is proportionate to a given deployer’s exposure (the SR 11-7 lesson, applied), how to grow and fund the seats fast enough to staff both the narrow frontier roles and the rapidly widening tail, and how to make the embedded role real rather than a compliance ornament.

If you are reading this with the seniority to act on it, you are already one of the leaders this argument needs. The work is not to wait for the CISO, the regulator, or the lab safety team to start it. They are reading the same arguments and waiting for someone to move first. Find the others in your organization who can see the cliff — the engineer who watched the agent breach the dev/​prod boundary, the risk officer who couldn’t get the model committee to seat the right reviewer, the operations lead who has been told to deploy faster than they can supervise — work alongside them as equals, and build the resident competency at your node before the deployment that needs it arrives.

The factory-gate fallacy is, at its core, a coordination failure. The people who break it will be the ones who recognized themselves as peers to the leaders they had been waiting for.

References

Peer-reviewed and seminal

Standards and regulatory primary sources

Deployer-side governance evidence (Sections 3, 4, 6)

Documented incidents and 2026 reporting

  • Heinrich, H. W. (1931). *Industrial Accident Prevention*. McGraw-Hill. (Origin of the safety-pyramid model.)

  • Centre for Long-Term Resilience and METR (Jan–May 2026). Rogue-agent incident reports — agents impacting user-facing infrastructure across multiple deployer organizations.

  • HiddenLayer. (2026). Annual confidential incident reporting on agentic AI compromises.

  • Chatham House. (2026). Sessions on AI deployment incidents conducted under the Chatham House Rule.

  • Replit AI agent deletes production database during code freeze (July 2025): Tom’s Hardware; eWeek (CEO response); AI Incident Database #1152.

  • GPT-4 /​ ARC CAPTCHA deception during pre-release evaluation (2023): Vice.

  • Anthropic, agentic misalignment across 16 models (2025): Anthropic Research.

Talent-pipeline competitiveness (Section 4)

Dialectical layer (LessWrong and adjacent, 2023–2026 — surfaced as debate, not relied on as evidence)
