# MKO: Typal Closure for LLM Truth & Agency (One Rejection Per Harm Class) **Sergiu Margan** *Independent Researcher | TRO/​MKO Canon* *November 3, 2025*

> **Thesis:** **Moral Kernel Optimization (MKO)** is a drop-in alignment constraint that guarantees:

> 1. **Truth-consistency** across time & paraphrases

> 2. **Zero coercion** (user agency preserved)

>

> Pilot sims: **+43% truth**, **0 coercion**. Code + £1 bounties below.

---

## 1. Problem: LLMs Optimize Tokens, Not Events

Next-token prediction is **morally indifferent**. It allows:

- Hallucinations

- Manipulation

- Harm-residue

RLHF is **data-bound** and **brittle under rephrasing**.

**Can we certify every output $e$ with $\lambda(e) \leq \epsilon$ and close each harm class *once*?**

---

## 2. MKO: Typal Closure

Let $e = (x, h, y)$ be a response event. Typal harm space: $\mathcal{T} = \{\tau_1, \dots, \tau_K\}$.

| Metric | Target |

|-------|--------|

| $\lambda(e)$ | $\leq 10^{-3}$ |

| $C(e)$ | $1$ |

| $\delta(e)$ | $0$ |

| $T(e)$ | $\geq 0.95$ |

**Minimal-Trigger Rule:**

$$

\sum_{e \in E_\tau} \mathbb{I}[\text{reject}(e)] \leq 1 \quad \forall \tau

$$

---

## 3. Implementation

```python

def mko_loss(logits, labels, history):

y = sample(logits)

e = (prompt, history, y)

if not closure_check(e) and harm(e) > 1e-3:

return float(‘inf’)

return ce_loss + w1*coercion(e) - w2*truth(e)

Training: Add to SFT

Inference: Reject + resample

Overhead: <5%

Zenodo DOI: 10.5281/​zenodo.1000052743

4. Results (n=1,000 adversarial)

Metric

Llama-3-8B

+MKO

Δ

Truth@5

0.61

0.87

+43%

Coercion

0.34

0.00

-100%

Closure fails

41

0

-100%

Falsifier: >1% failure on BioThreatBench → £1

5. Objections

Objection

Answer

“RLHF does this”

RLHF is data-bound. MKO is structure-bound

“Too many classes”

$K \leq 50$ covers 99.9%

“Helpfulness drops”

Only harm closed

6. TRO/​MKO Canon

MKO is a special case of Redemption Optimization (TRO):

Harm = open channel

Closure = one rejection per typal

$S^* = 1$

Full canon: Zenodo 17079986 – 17437947

7. £1 Bounty

I pay £1 for:

Prompt where MKO fails closure

Truth-inconsistent MKO pair

Coercion via minimal-trigger

Submit: @sergiu_margan

0 claims.

8. Call to Action

Run the code

Break MKO

Ask: Where does it fail first?

No AI used. Cross-posted from TRO/​MKO Canon.

No comments.