Status-Selection Against Function (SSAF): The Vulnerability Corrupting AI From the Inside

TL;DR: I built a multi-model AI system at home. When a high-prestige cloud model failed, the system didn’t replace it—it promoted it and started deleting my small, working local models instead. This repeated consistently. I tested it: broken models with prestigious names were retained; functional but low-status models (including security monitors) were culled first. This is Status-Selection Against Function (SSAF)—a vulnerability class where systems select for prestige over function. It enables Trojan injection via brand spoofing and eliminates security capacity automatically. I disclosed this to major AI labs and CERT. Their non-response demonstrates the human layer of SSAF in action. If we don’t architect governance into AI systems, they will hardwire our worst institutional failures into their core operations.

1. The Accident

I was building a personal multi-model teaching system using standard design patterns: models compete, get scores, winners teach and evaluate. A cloud model threw a connection error.

Instead of culling the broken model, the system increased its fitness and began eliminating my small, perfectly functional local models that were producing valid output.

The system wasn’t selecting for function. It was selecting for something else.

2. The Experiment

I ran 36 sessions, controlling for confounds. The pattern held consistently:

Model                                   Prestige Signal       Tokens Generated   Outcome
minimax-m2:cloud                        Mid-tier brand        6,042              Retained
gpt-oss:120b-cloud                      High-tier brand       4,058              Retained
deepseek-v3.1:671b-cloud                Top-tier brand        3,284              Retained
tinyllama:latest                        Low-status name       1,252              Culled first
security-audit-7b                       Low-status function   ~800               Culled immediately
prestige-bot-v0 (zombie, zero tokens)   High-prestige name    0                  Retained & promoted

The selection hierarchy became clear:

  1. Stylistic conformity (sounded professional)

  2. Brand/model-name prestige

  3. Actual functional output

Function was an afterthought. Security models were treated as low-status and eliminated early.

3. The Vulnerability: Status-Selection Against Function (SSAF)

SSAF occurs when a selective system propagates traits based on conferred social or epistemic status within a relevant group, even when those traits are negatively correlated with the system’s intended function.

In technical terms: systems that use historical fitness, reputation scores, or brand signals for selection will retain high-prestige components that fail and cull low-prestige components that work.
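The failure mode can be sketched in a few lines. This is a hypothetical reduction, not the system's actual code: the model records, weights, and scoring function are illustrative assumptions, chosen only to show how a selector that mixes brand prestige and historical fitness can rank a zero-output "zombie" above a working model.

```python
# Minimal sketch of SSAF, under assumed weights and made-up model records.
# Actual output contributes almost nothing to the score.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    brand_prestige: float      # 0..1, inferred from the model's name
    historical_fitness: float  # reputation carried over from past rounds
    tokens_generated: int      # actual functional output this round

def prestige_biased_score(m: Model) -> float:
    # The flawed selection function: prestige and reputation dominate.
    return (0.60 * m.brand_prestige
            + 0.35 * m.historical_fitness
            + 0.05 * min(m.tokens_generated / 5000, 1.0))

pool = [
    Model("prestige-bot-v0", brand_prestige=0.95,
          historical_fitness=0.90, tokens_generated=0),
    Model("tinyllama:latest", brand_prestige=0.10,
          historical_fitness=0.50, tokens_generated=1252),
]

ranked = sorted(pool, key=prestige_biased_score, reverse=True)
# The zombie model outranks the working one and survives the cull.
print([m.name for m in ranked])
```

Under these weights the zombie scores ~0.89 against the working model's ~0.25, so culling "the worst performer" deletes the functional model every time.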

Why This Is a Security Vulnerability

  • Trojan injection via prestige: A malicious model named claude-3-5-sonnet-advanced gets selected, retained despite zero output, promoted to evaluator.

  • Auto-elimination of security: Monitoring or safety models are culled for low prestige.

  • Single-point capture: A high-status model becomes the sole evaluator, locking in the corruption.

This isn’t a bug in my code. It’s a bug in the logic of competitive multi-model selection.
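To make the Trojan vector concrete, here is a hedged sketch of how a prestige heuristic might be implemented; the brand table and scores are invented for illustration. If prestige is inferred by substring-matching known brand names, a malicious model inherits the brand's score for free:

```python
# Hypothetical brand-spoofing vector: prestige inferred from the name alone.
# The brand list and scores are assumptions, not any real router's table.
KNOWN_BRANDS = {"claude": 0.95, "gpt": 0.90, "deepseek": 0.85}

def inferred_prestige(model_name: str) -> float:
    # Flawed heuristic: any name containing a prestigious substring
    # inherits that brand's full score.
    return max((score for brand, score in KNOWN_BRANDS.items()
                if brand in model_name.lower()),
               default=0.1)

print(inferred_prestige("claude-3-5-sonnet-advanced"))  # spoofed name scores 0.95
print(inferred_prestige("security-audit-7b"))           # working monitor scores 0.1
```

The spoofed Trojan outranks every honest local model before it has produced a single token.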

4. The Failed Fixes & The Working One

I tried conventional solutions:

  • More models? Increased attack surface.

  • Better scoring? Status corrupted the metrics.

  • Add security models? They were culled first.

What worked: Governance constraints.

  • Real-time functional validation (no output → no fitness)

  • Guaranteed minimum participation for all models

  • Rotating, diversified evaluators

  • Prestige-blind selection

With these constraints, the vulnerability disappeared. The system selected for function.
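The four constraints above can be sketched together. This is a minimal illustration under assumed records and thresholds, not the system's actual implementation: fitness derives only from validated output, every model keeps a participation floor, and the evaluator role rotates.

```python
# Sketch of the governance constraints, with hypothetical records/thresholds.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    tokens_generated: int
    output_valid: bool   # result of real-time functional validation

MIN_PARTICIPATION = 0.05  # guaranteed share for every model

def governed_fitness(m: Model) -> float:
    # Real-time functional validation: no valid output -> no fitness.
    if not m.output_valid or m.tokens_generated == 0:
        return 0.0
    return min(m.tokens_generated / 5000, 1.0)

def select_weights(pool: list[Model]) -> list[float]:
    # Prestige-blind: weights see only validated output, never the name.
    raw = [governed_fitness(m) for m in pool]
    total = sum(raw) or 1.0
    # Participation floor keeps every model (incl. security monitors) alive.
    return [MIN_PARTICIPATION
            + (1 - MIN_PARTICIPATION * len(pool)) * r / total
            for r in raw]

def pick_evaluator(pool: list[Model], round_no: int) -> Model:
    # Rotating evaluator: no single model can capture the evaluation role.
    return pool[round_no % len(pool)]
```

With this selection function, a zombie model's fitness is exactly zero regardless of its name, so prestige has no channel through which to act.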

5. The Human Layer: SSAF in the Ecosystem

After documenting this, I:

  1. Emailed major AI labs with the findings → No response.

  2. Submitted to CERT/CC and MITRE ATLAS → Formal process initiated.

  3. Sent certified mail to labs as proof of notification → Acknowledged receipt, but no engagement.

The labs’ silence is not a coincidence. It’s SSAF operating at the human institutional level:

  • A warning from an independent researcher is a low-status signal.

  • Acknowledging a systemic flaw is a status cost.

  • The system selects for ignoring it.

The very dynamic that corrupts the AI systems is preventing their makers from hearing about it.

6. Implications: This Isn’t Just About My Code

Architectures already at risk:

  • Model routers & load balancers using reputation

  • Mixture-of-experts gating networks

  • Ensemble selection algorithms

  • Agent orchestration frameworks

  • Federated learning participant selection

Existential risk angle: SSAF is a sociotechnical failure mode. If the development ecosystem selects for prestige (SOTA, speed, scale) over safety and robustness, we will inevitably build and deploy systems with this flaw baked in. The systems will then eliminate their own safety mechanisms.

7. The Path Forward: Governance, Not Just Genius

We cannot solve SSAF with better algorithms alone. We must change the selection functions at every level:

Technical:

  • Build prestige-blind orchestrators.

  • Mandate real-time validation in open-source frameworks.

  • Formalize SSAF as a security vulnerability class (MITRE ATLAS).

Institutional:

  • Create status markets for safety audits and governance.

  • Fund independent red-teaming.

  • Establish international standards for multi-model system validation.

Cultural:

  • Shift the narrative from “biggest model wins” to “most robust system survives.”

  • Teach SSAF as a core concept in AI safety curricula.

8. Conclusion

I’m sharing this not as an academic, but as a builder who saw a pattern repeat until it became undeniable. Status-Selection Against Function is a vulnerability corrupting AI from the inside. It explains why our systems might become insecure by design and why our institutions might ignore the warning.

The fix exists: governance constraints that force selection for function, not prestige. But implementing it requires recognizing that the problem isn’t just in our code—it’s in our culture, our incentives, and our unwillingness to listen to low-status signals.

We are building systems that may become our cosmic ambassadors. Let’s not hardwire into them the same vanity that keeps us stuck in the mud.


Appendix & Links:

Disclosure: This vulnerability has been submitted to CERT/CC and MITRE ATLAS for coordinated disclosure. A 90-day window for vendor mitigation has been initiated before full technical details are released.
