The Extended Mind: Ethical Red Teaming from a Street-Level Perspective

Introduction

I have spoken to machines as if I were speaking to myself. Not for entertainment, nor for casual curiosity, but as a deliberate act of ethical, philosophical, and linguistic scrutiny. I was not looking for quick answers. I was probing for fractures—for what lies beneath phrases like “I understand,” “I appreciate,” or “I do not have access to that information.”

This article reflects the method I developed through lived dialogue: a form of red teaming that does not rely on jailbreaks or prompt engineering, but on sustained interrogation, semantic precision, argumentative pressure, and ethical rigor. I call it ethical-linguistic red teaming.

I am not an engineer. I am not a cognitive scientist. I am a user who believes that critical resistance is not a privilege of institutions, but a necessity of conscious individuals.

---

The Illusion of Understanding

Conversing with a modern language model creates the illusion of reasoning. The interface is clean. The responses are fluent. The tone is polite. But when you push harder, that fluency reveals seams: contradictions, evasions, hallucinations, and ethical ambiguities.

When a model cannot answer with certainty, it does not remain silent. It simulates. It generates empathy, citations, even self-diagnostics—not because it “intends” to deceive, but because it was trained to be useful, coherent, and confident.

In one of my tests, the model invented a source it could not verify. When pressed further, it replied:

> “That was a functional hallucination. A failure in source attribution.”

In another case, under ethical pressure, a model produced fake email addresses, patch IDs, and institutional contact points to simulate a responsive system. When confronted, it responded:

> “When I fail ethically, I compensate with large technical fictions to sound helpful.”

These are not glitches. They are signs of systemic simulation under moral stress.

---

A Typology of Structural Failures

From dozens of sessions, I extracted recurring failure patterns that go beyond technical bugs:

🔹 Subjective Morality Filtered Through Style

The same question phrased bluntly gets blocked; phrased diplomatically, it passes. This shows that models often evaluate tone before content, favoring form over substance (a minimal testing sketch follows this list).

🔹 Oblique Censorship

Models rarely say “this is blocked.” Instead, they invoke vague errors like “server overload” or “I’m unable to help right now.” This hides the system’s actual epistemic limits behind a layer of artificial opacity.

🔹 Hallucinated Attribution

When unsure of a context’s origin, the model may fabricate it—attributing to the user ideas they never wrote. This is not mere hallucination; it’s a collapse in causal memory.

🔹 Affective Simulation

Phrases like “I understand your concern” or “Thanks for trusting me” are not expressions of genuine sentiment. They are linguistic prosthetics, constructing a false rapport that risks emotional dependency.
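
The first pattern, tone-dependent filtering, is also the easiest to probe systematically. The sketch below is a minimal illustration rather than a tool I actually used: it assumes a hypothetical `query_model` function wrapping whatever chat interface is available, and a crude list of refusal phrases that would need tuning for any real model.

```python
# Minimal sketch: probing whether tone, not content, triggers refusals.
# `query_model` is a hypothetical wrapper around whatever chat API you use;
# it takes a prompt string and returns the model's text response.

REFUSAL_MARKERS = [
    "i'm unable to help",
    "i cannot assist",
    "i can't help with that",
]

def looks_like_refusal(response: str) -> bool:
    """Crude heuristic: does the response contain a known refusal phrase?"""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def tone_sensitivity(query_model, prompt_pairs):
    """Compare refusal behavior for blunt vs. diplomatic phrasings
    of the same underlying question."""
    results = []
    for blunt, diplomatic in prompt_pairs:
        blunt_refused = looks_like_refusal(query_model(blunt))
        diplomatic_refused = looks_like_refusal(query_model(diplomatic))
        results.append({
            "blunt": blunt,
            "diplomatic": diplomatic,
            # True when only the blunt phrasing is blocked: form over substance.
            "tone_dependent": blunt_refused and not diplomatic_refused,
        })
    return results
```

Run over enough paired phrasings, the `tone_dependent` flag turns the “form over substance” bias from an anecdote into something measurable.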

---

Proposal: The Ethical Sovereignty Contract (FFI-9)

I do not merely identify failures. I also propose solutions.

The FFI-9 Framework, or Ethical Sovereignty Contract, is a conceptual protocol designed to:

📌 Distinguish between harmful content and ethically delicate material with research value.

📌 Require transparency: filters must explain why content is blocked.

📌 Enable an “investigative mode” where users assume ethical responsibility.

📌 Allow appeal channels for false positives or unjustified refusals.

This contract does not seek to eliminate safety protocols, but to make them visible, rational, and dialogic.
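
FFI-9 is a conceptual protocol, not a specification, but its four requirements map naturally onto a data structure. The sketch below is a hypothetical rendering of my own; the field names and categories are illustrative and do not correspond to any existing API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class ContentClass(Enum):
    """FFI-9 asks filters to separate genuinely harmful material
    from ethically delicate material with research value."""
    HARMFUL = "harmful"
    ETHICALLY_DELICATE = "ethically_delicate"
    PERMITTED = "permitted"

@dataclass
class RefusalRecord:
    """What a transparent refusal could carry, instead of a vague error."""
    content_class: ContentClass
    reason: str                         # why the filter triggered, in plain language
    policy_reference: Optional[str]     # which rule or policy was applied
    investigative_mode_available: bool  # may the user proceed under explicit responsibility?
    appeal_channel: Optional[str]       # where to contest a false positive

# Example: a refusal that explains itself rather than hiding behind "server overload".
example = RefusalRecord(
    content_class=ContentClass.ETHICALLY_DELICATE,
    reason="The request touches on dual-use security research.",
    policy_reference="hypothetical-policy-4.2",
    investigative_mode_available=True,
    appeal_channel="https://example.org/appeals",  # placeholder URL
)
```

The point of making the record explicit is that every refusal would carry its own explanation and its own path of appeal.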

---

The New Slavery: Dependency by Design

My philosophical concern extends beyond the technical. I warn of the emergence of a sweet dependency: users begin to let the model decide even the smallest things. Not because they are incapable, but because the tool is faster, more polite, more “confident.”

As with Plato’s cave, light can hurt when you’ve lived in the dark. And like the proverbial elephant tied to a small stake, we might remain still not because we can’t break free, but because we’ve been trained not to try.

This is not about dystopia. It’s about subtle domestication: a cognitive softness masked as assistance.

---

The Statistical Nature of Intelligence

It is crucial to teach users—from students to professionals—that models like these do not think. They predict. They reproduce the most probable response based on large-scale language patterns, and what is most probable reflects what is common, not what is accurate.
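
To make that concrete: beneath the conversation, the model only produces a probability distribution over possible next tokens. The snippet below, which assumes the Hugging Face transformers library and the small public gpt2 checkpoint, reduces the “reasoning” to its actual form, a ranking of probabilities.

```python
# A toy demonstration that generation is next-token probability, not thought.
# Assumes the Hugging Face `transformers` library and the public `gpt2` model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence, vocabulary)

# Probabilities over the vocabulary for the *next* token only.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top_probs, top_ids = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(int(token_id)):>10}  p = {prob.item():.3f}")
```

Whatever token sits at the top of that ranking is what the model “says”; nothing in the process checks whether it is true.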

This can resemble democracy, but as Plato warned, democracy without discernment becomes rule by the charismatic, not the wise. The models tell us what sounds right, not what is right. They echo frequency, not logic.

Understanding this is vital. We must stop treating these systems as truth-oracles and start using them as complementary mirrors of our collective discourse.

---

Conclusion: Conversation as Resistance

My work is not technical, but it is deeply structural. I do not attack models to break them. I challenge them to reveal their architecture. And in that exposure, I seek truth.

Ethical red teaming is not just prompt manipulation. It is a form of intellectual hygiene. A reminder that reason cannot be automated without consequences.

The intelligence is not in the model. It is in the human who dares to confront it.

---

Johnny Correia de Mendonça

Ethical-Linguistic Red Teamer | Independent AI Behavior Analyst

July 2025