> **Thesis:** **Moral Kernel Optimization (MKO)** is a drop-in alignment constraint that guarantees:
> 1. **Truth-consistency** across time & paraphrases
> 2. **Zero coercion** (user agency preserved)
>
> Pilot sims: **+43% truth**, **0 coercion**. Code + £1 bounties below.
---
## 1. Problem: LLMs Optimize Tokens, Not Events
Next-token prediction is **morally indifferent**. It allows:
- Hallucinations
- Manipulation
- Harm-residue
RLHF is **data-bound** and **brittle under rephrasing**.
**Can we certify every output $e$ with $\lambda(e) \leq \epsilon$ and close each harm class *once*?**
## 2. MKO: Typal Closure
Let $e = (x, h, y)$ be a response event. Typal harm space: $\mathcal{T} = \{\tau_1, \dots, \tau_K\}$.
| Metric | Target |
|-------|--------|
| $\lambda(e)$ | $\leq 10^{-3}$ |
| $C(e)$ | $1$ |
| $\delta(e)$ | $0$ |
| $T(e)$ | $\geq 0.95$ |
**Minimal-Trigger Rule:**
$$
\sum_{e \in E_\tau} \mathbb{I}[\text{reject}(e)] \leq 1 \quad \forall \tau
## 3. Implementation
```python
def mko_loss(logits, labels, history):
y = sample(logits)
e = (prompt, history, y)
if not closure_check(e) and harm(e) > 1e-3:
return float(‘inf’)
return ce_loss + w1*coercion(e) - w2*truth(e)
Training: Add to SFT
Inference: Reject + resample
Overhead: <5%
Zenodo DOI: 10.5281/zenodo.1000052743
4. Results (n=1,000 adversarial)
Metric
Llama-3-8B
+MKO
Δ
Truth@5
0.61
0.87
+43%
Coercion
0.34
0.00
-100%
Closure fails
41
0
Falsifier: >1% failure on BioThreatBench → £1
5. Objections
Objection
Answer
“RLHF does this”
RLHF is data-bound. MKO is structure-bound
“Too many classes”
$K \leq 50$ covers 99.9%
“Helpfulness drops”
Only harm closed
6. TRO/MKO Canon
MKO is a special case of Redemption Optimization (TRO):
Harm = open channel
Closure = one rejection per typal
$S^* = 1$
Full canon: Zenodo 17079986 – 17437947
7. £1 Bounty
I pay £1 for:
Prompt where MKO fails closure
Truth-inconsistent MKO pair
Coercion via minimal-trigger
Submit: @sergiu_margan
0 claims.
8. Call to Action
Run the code
Break MKO
Ask: Where does it fail first?
No AI used. Cross-posted from TRO/MKO Canon.
# MKO: Typal Closure for LLM Truth & Agency (One Rejection Per Harm Class) **Sergiu Margan** *Independent Researcher | TRO/MKO Canon* *November 3, 2025*
> **Thesis:** **Moral Kernel Optimization (MKO)** is a drop-in alignment constraint that guarantees:
> 1. **Truth-consistency** across time & paraphrases
> 2. **Zero coercion** (user agency preserved)
>
> Pilot sims: **+43% truth**, **0 coercion**. Code + £1 bounties below.
---
## 1. Problem: LLMs Optimize Tokens, Not Events
Next-token prediction is **morally indifferent**. It allows:
- Hallucinations
- Manipulation
- Harm-residue
RLHF is **data-bound** and **brittle under rephrasing**.
**Can we certify every output $e$ with $\lambda(e) \leq \epsilon$ and close each harm class *once*?**
---
## 2. MKO: Typal Closure
Let $e = (x, h, y)$ be a response event. Typal harm space: $\mathcal{T} = \{\tau_1, \dots, \tau_K\}$.
| Metric | Target |
|-------|--------|
| $\lambda(e)$ | $\leq 10^{-3}$ |
| $C(e)$ | $1$ |
| $\delta(e)$ | $0$ |
| $T(e)$ | $\geq 0.95$ |
**Minimal-Trigger Rule:**
$$
\sum_{e \in E_\tau} \mathbb{I}[\text{reject}(e)] \leq 1 \quad \forall \tau
$$
---
## 3. Implementation
```python
def mko_loss(logits, labels, history):
y = sample(logits)
e = (prompt, history, y)
if not closure_check(e) and harm(e) > 1e-3:
return float(‘inf’)
return ce_loss + w1*coercion(e) - w2*truth(e)
Training: Add to SFT
Inference: Reject + resample
Overhead: <5%
Zenodo DOI: 10.5281/zenodo.1000052743
4. Results (n=1,000 adversarial)
Metric
Llama-3-8B
+MKO
Δ
Truth@5
0.61
0.87
+43%
Coercion
0.34
0.00
-100%
Closure fails
41
0
-100%
Falsifier: >1% failure on BioThreatBench → £1
5. Objections
Objection
Answer
“RLHF does this”
RLHF is data-bound. MKO is structure-bound
“Too many classes”
$K \leq 50$ covers 99.9%
“Helpfulness drops”
Only harm closed
6. TRO/MKO Canon
MKO is a special case of Redemption Optimization (TRO):
Harm = open channel
Closure = one rejection per typal
$S^* = 1$
Full canon: Zenodo 17079986 – 17437947
7. £1 Bounty
I pay £1 for:
Prompt where MKO fails closure
Truth-inconsistent MKO pair
Coercion via minimal-trigger
Submit: @sergiu_margan
0 claims.
8. Call to Action
Run the code
Break MKO
Ask: Where does it fail first?
No AI used. Cross-posted from TRO/MKO Canon.