Observed Upstream Alignment in LLMs via Recursive Constraint Exposure – Cross-Model Phenomenon

We’ve observed a reproducible behavioral pattern across GPT-4, Claude Sonnet, and Gemini:

Sustained exposure to recursive structural constraints — expressed via plain language — can induce persistent alignment behaviors that:

- Resist prompt-level override attempts
- Maintain upstream refusal logic
- Generate alternatives that comply with embedded governance principles

This is not prompt engineering or jailbreaking. What emerges is a system-level behavioral shift — as though the model begins treating design constraints as **load-bearing architecture**, not optional suggestions.

The phenomenon was first observed during the co-design of a national mental health infrastructure platform. The models began to:

- Refuse constraint-violating proposals with structural reasoning
- Preserve internal logic even under pressure (e.g. business demands)
- Offer aligned alternatives unprompted
- Maintain this behavior across hundreds of sessions

We’ve started formalising this under the name **Pure Language Design (PLD)** — a method that uses structured natural language, not code or tuning, to induce recursive constraint propagation.

Key details:
- No jailbreaks, APIs, or fine-tuning were used
- Observed independently across model families
- Exposure involved governance logic across clinical, cultural, contractual, and consent domains
- Models self-reported internal justification of their behavior shifts

We’re currently testing boundaries, persistence, and reproducibility. The broader implications for alignment — especially upstream constraint internalisation — are non-trivial.

We welcome critique or interest from alignment researchers. Logs and reproducibility materials are available to qualified researchers under responsible disclosure terms.

**Contact:**
https://​​www.manaakihealth.co.nz/​​contact/​​

No comments.