IIT’s Φ Explains Nothing: A Geometric Alternative Predicts 84.6% of Cognitive Performance Variance

I ran a falsification study to test whether causal complexity (CC) or geometric dimensionality (d) better predicts cognitive performance. I also included an operational measure of integrated information (Φop) as a third predictor.

The results were unambiguous:

| Predictor | Pearson r | R² | p-value |
|---|---|---|---|
| Dimensionality (d) | 0.920 | 0.846 | < 0.001 |
| Causal Complexity (CC) | 0.458 | 0.210 | 0.0002 |
| Integrated Info (Φop) | −0.01 | < 0.001 | > 0.05 |

Φop explains nothing. Dimensionality explains 84.6% of variance. All three a priori falsification criteria favoring IIT failed.
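For a single predictor, R² is simply the squared Pearson correlation, so each row of the table reduces to one correlation test. A minimal sketch of that per-predictor analysis, on synthetic stand-in data (the effect size and noise level here are illustrative, not the study's):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Synthetic stand-ins for the 81 (configuration, replication) data points:
# dimensionality drawn from the study's module values, performance loosely
# coupled to it. These numbers are illustrative only.
d = rng.choice([2, 5, 50], size=81).astype(float)
performance = 0.9 * (d - d.mean()) / d.std() + 0.4 * rng.standard_normal(81)

r, p = pearsonr(d, performance)   # Pearson correlation and two-sided p-value
r2 = r ** 2                       # single-predictor R² is just r squared
print(f"r = {r:.3f}, R² = {r2:.3f}, p = {p:.2e}")
```

The same three-line test applied to CC and Φop yields the other two rows; no multivariate model is needed to reproduce the table.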

The experiment

27 tri-brain configurations, with modules varying in dimensionality (d = 2, 5, or 50) and connectivity type (random, causal, hyper-optimized); across three modules this yields system dimensionalities from d = 6 up to d = 150. Weight matrices were normalized to equal Frobenius norm, isolating structure from raw capacity. 45 tasks spanning logic, semantics, and planning; 3 independent replications; n = 81 data points.
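The Frobenius-norm control is the step that separates structure from capacity: every weight matrix carries the same total "weight budget" regardless of shape. A minimal sketch (matrix shapes are illustrative, not the study's exact configurations):

```python
import numpy as np

def normalize_frobenius(W, target_norm=1.0):
    """Rescale W so that ||W||_F equals target_norm."""
    return W * (target_norm / np.linalg.norm(W, "fro"))

rng = np.random.default_rng(42)
W_high_d = rng.standard_normal((50, 50))  # high-d, random connectivity
W_low_d = rng.standard_normal((2, 2))     # low-d stand-in

A = normalize_frobenius(W_high_d)
B = normalize_frobenius(W_low_d)
# After rescaling, both matrices have identical Frobenius norm,
# so any performance difference reflects structure, not magnitude.
print(np.linalg.norm(A, "fro"), np.linalg.norm(B, "fro"))
```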

The starkest result: a high-d/random-connectivity system (d = 150, no causal structure) outperformed a low-d/hyper-optimized system (d = 6, maximal information connectivity) by 21.7%. No amount of causal optimization closes a dimensionality gap.

A practical consequence

I then trained AletheionLLM-v2, a 354M-parameter decoder-only LLM with an integrated 5D Riemannian manifold that produces per-token epistemic tomography: aleatoric uncertainty (Q1), epistemic uncertainty (Q2), and a structural health field φ(M). On OOD evaluation (WikiText-103):

| Model | ECE | MCE | Brier |
|---|---|---|---|
| GPT-2 Med (355M) | 0.0236 | 0.0340 | 0.1618 |
| OPT-350M | 0.0241 | 0.0656 | 0.1595 |
| AletheionLLM-v2 | 0.0176 | 0.0521 | 0.1528 |

Lowest ECE and Brier score at this parameter scale (GPT-2 Medium retains the edge on MCE), achieved by a geometric architecture, not an information-theoretic one.
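For readers unfamiliar with the three metrics: ECE is the bin-weighted average gap between predicted confidence and observed accuracy, MCE is the worst single-bin gap, and Brier is the mean squared error of the probabilities. A self-contained sketch of the standard binned definitions (the binning scheme here is a common choice, not necessarily the one used in the paper):

```python
import numpy as np

def calibration_metrics(probs, labels, n_bins=15):
    """Binned ECE/MCE and Brier score for binary probabilistic predictions."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(probs, bins) - 1, 0, n_bins - 1)
    ece, mce = 0.0, 0.0
    for b in range(n_bins):
        mask = idx == b
        if not mask.any():
            continue
        # Gap between mean confidence and empirical accuracy in this bin.
        gap = abs(probs[mask].mean() - labels[mask].mean())
        ece += mask.mean() * gap   # weight by fraction of samples in bin
        mce = max(mce, gap)        # worst-case bin gap
    brier = np.mean((probs - labels) ** 2)
    return ece, mce, brier

# Well-calibrated synthetic predictions: label is 1 with exactly prob p.
rng = np.random.default_rng(1)
p = rng.uniform(0.0, 1.0, 5000)
y = (rng.uniform(0.0, 1.0, 5000) < p).astype(float)
ece, mce, brier = calibration_metrics(p, y)
```

By construction the synthetic predictor is calibrated, so its ECE should be near zero; a miscalibrated model's gaps grow instead.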

The obvious objection

Φop is an approximation. The “true” Φ of IIT 4.0 is computationally intractable. This objection is legitimate, and it is also a serious problem for IIT as a scientific theory. A measure that cannot be computed cannot be falsified. I used the best operational proxy available. If someone has a better one that actually predicts performance, I would like to see it.

Papers

Falsification study + synthesis: https://doi.org/10.13140/RG.2.2.36042.22728
AletheionLLM-v2: https://doi.org/10.13140/RG.2.2.11471.14241
Source code: https://github.com/gnai-creator/aletheion-llm-v2
