Revisiting the Manifold Hypothesis

In the following analysis, Sasha Kolpakov, Managing Editor of the Journal of Experimental Mathematics, and I propose a machine learning challenge to test a strong form of the Manifold Hypothesis due to Yoshua Bengio.

Motivation:

In the Deep Learning book, Ian Goodfellow, Yoshua Bengio and Aaron Courville credit the unreasonable effectiveness of Deep Learning to the Manifold Hypothesis as it implies that the curse of dimensionality may be avoided for most natural datasets [5]. If the intrinsic dimension of our data is much smaller than its ambient dimension, then sample-efficient PAC learning is possible. In practice, this happens via the latent space of a deep neural network which auto-encodes the input.
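
To make this picture concrete, here is a minimal sketch (not taken from [5]; the dimensions, architecture and synthetic data are arbitrary illustration choices) of how the low-dimensional latent space of an autoencoder can capture data whose intrinsic dimension is far below its ambient dimension:

```python
# A minimal sketch of the Manifold Hypothesis in action: data with small
# intrinsic dimension embedded in a large ambient space can be reconstructed
# through a narrow latent bottleneck. All choices below are illustrative.
import torch
import torch.nn as nn

ambient_dim, intrinsic_dim, n_samples = 100, 3, 2048

# Synthetic data: a 3-dimensional manifold embedded in 100 dimensions.
z_true = torch.randn(n_samples, intrinsic_dim)
embedding = torch.randn(intrinsic_dim, ambient_dim)
x = torch.tanh(z_true @ embedding)            # nonlinearity bends the manifold

autoencoder = nn.Sequential(
    nn.Linear(ambient_dim, 32), nn.ReLU(),
    nn.Linear(32, intrinsic_dim),             # low-dimensional latent code
    nn.Linear(intrinsic_dim, 32), nn.ReLU(),
    nn.Linear(32, ambient_dim),
)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

for step in range(2000):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(autoencoder(x), x)
    loss.backward()
    optimizer.step()

print(f"reconstruction MSE through a {intrinsic_dim}-dim bottleneck: {loss.item():.4f}")
```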

Yoshua Bengio in particular argued that the hierarchical/​compositional structure of the human brain exploits the Manifold Hypothesis as an inductive bias [7], so any efficiently computable function that is of interest to humans is most likely PAC-learnable. A strong form of this hypothesis, which is generally assumed by OpenAI and DeepMind, suggests that deep learning may be the ultimate path to AGI.

However, a plausible exception to Bengio’s heuristic is the mysterious distribution of prime numbers that has captured the imagination of the best mathematicians of all ages:

"Mathematicians have tried in vain to this day to discover some order in the sequence of prime numbers, and we have reason to believe that it is a mystery into which the human mind will never penetrate." (Euler)

Though perhaps, had Euler had access to GPUs, he would have generalised this statement to the minds of machines. In fact, after careful deliberations with Steve Brunton, Marcus Hutter and Hector Zenil regarding the empirical observation that deep learning models fail to approximate the Prime Counting Function, it appears that not all efficiently computable functions of human interest are PAC-learnable.

In order to clarify the nature of these observations, Sasha Kolpakov and I rigorously formulated the Monte Carlo Hypothesis, which implies that machine learning alone is not a complete path to human-level intelligence. If Bernhard Riemann gained his insight into the Prime Counting Function, it must have been by means other than data compression, for reasons clarified below.

The Monte Carlo Hypothesis:

The Prime Coding Theorem:

From the definition of the prime encoding $X_N = (x_1, \dots, x_N)$, where $x_n = 1$ if $n$ is prime and $x_n = 0$ otherwise, Kolpakov and Rocke (2023) used Kolmogorov's theory of Algorithmic Probability to derive the Prime Coding Theorem [1]:

$$\mathbb{E}[K(X_N)] \sim \pi(N) \sim \frac{N}{\ln N} \quad (N \to \infty),$$

which implies that the locations of all primes in $[1, N]$ are statistically independent of each other for large $N$.
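
For concreteness, the following sketch (ours, not from [1]) constructs the prime encoding with a sieve and compares a crude compressibility estimate against $\pi(N) \sim N/\ln N$; note that an off-the-shelf compressor such as zlib only gives a weak upper bound on Kolmogorov complexity:

```python
# Build the prime encoding X_N with a sieve and compare a zlib-based
# compressibility estimate (a crude upper bound on K(X_N)) to pi(N) ~ N/ln N.
import zlib
from math import log

N = 1_000_000

# Sieve of Eratosthenes: is_prime[n] == 1 iff n is prime.
is_prime = bytearray([1]) * (N + 1)
is_prime[0] = is_prime[1] = 0
for p in range(2, int(N ** 0.5) + 1):
    if is_prime[p]:
        is_prime[p * p :: p] = bytearray(len(is_prime[p * p :: p]))

x = bytes(is_prime[1:])                  # the prime encoding X_N, one symbol per n
pi_N = sum(x)                            # number of ones, i.e. pi(N)
compressed_bits = 8 * len(zlib.compress(x, 9))

print(f"N = {N}, pi(N) = {pi_N}, N/ln N = {N / log(N):.0f}")
print(f"zlib upper bound on K(X_N): {compressed_bits} bits (encoding length {N} bits)")
```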

Monte Carlo Hypothesis:

If the best machine learning model predicts the next $N$ primes to be at locations $\{a_1, \dots, a_N\}$, then for large $N$ this model's statistical performance will converge to a true positive rate that is no better than chance relative to the density of primes near the $N$-th prime, i.e. on the order of

$$\frac{1}{\ln(N \ln N)} \sim \frac{1}{\ln N} \longrightarrow 0 \quad \text{as } N \to \infty.$$

Hence, the true positive rate of any machine learning model converges to zero.
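
As an illustration (our own baseline simulation, not the paper's experiment), a "model" that guesses the locations of the next thousand primes uniformly within the window where they must lie achieves a true positive rate close to $1/\ln p_N$, which shrinks as $N$ grows:

```python
# Baseline for the Monte Carlo Hypothesis: uniform guessing in the window
# containing the next K primes attains a true positive rate ~ 1/ln(p_N).
import random
from math import log

def sieve(limit):
    """Return flags where flags[n] == 1 iff n is prime."""
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, int(limit ** 0.5) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(flags[p * p :: p]))
    return flags

K = 1000
limit = 2_000_000
flags = sieve(limit)
primes = [n for n in range(2, limit + 1) if flags[n]]

for N in (1_000, 10_000, 100_000):
    last_known = primes[N - 1]              # p_N, the last "training" prime
    window_end = primes[N + K - 1]          # where the next K primes actually end
    guesses = random.sample(range(last_known + 1, window_end + 1), K)
    tpr = sum(flags[g] for g in guesses) / K
    print(f"N={N}: baseline TPR={tpr:.3f} vs 1/ln(p_N)={1 / log(last_known):.3f}")
```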

Machine Learning Challenge:

Challenge:

Given the locations of the first N prime numbers, predict the location of the next prime.

Training data:

The prime encoding $X_N = (x_1, \dots, x_N)$, where $x_n = 1$ if $n$ is prime and $x_n = 0$ otherwise.

Test data:

The locations of the next thousand prime numbers.

Evaluation:

For the reasons given below, we will evaluate reproducible models on ten distinct rearrangements of the prime encoding and calculate the arithmetic mean of their true positive rates.
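
The sketch below shows how we read this evaluation protocol; the submission interface (`model_predict`) is hypothetical and only meant to fix ideas:

```python
# A sketch of the evaluation: the model sees a rearranged prime encoding and
# returns the indices it believes hold the next thousand primes; the true
# positive rate is averaged over ten independent rearrangements.
import random

def evaluate(model_predict, encoding, next_primes, n_trials=10, seed=0):
    """`model_predict(bits) -> predicted locations` is a hypothetical interface.

    `encoding` is the training prefix x_1..x_N; `next_primes` holds the true
    locations of the next thousand primes (indices beyond N).
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_trials):
        sigma = list(range(len(encoding)))
        rng.shuffle(sigma)                          # a rearrangement of the encoding
        rearranged = [encoding[i] for i in sigma]
        predicted = set(model_predict(rearranged))
        rates.append(len(predicted & set(next_primes)) / len(next_primes))
    return sum(rates) / len(rates)
```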

As the expected Kolmogorov complexity of a random variable is asymptotically equivalent to its Shannon entropy, and Shannon entropy is permutation invariant:

$$\mathbb{E}[K(X_N)] \sim H(X_N), \qquad H(x_{\sigma(1)}, \dots, x_{\sigma(N)}) = H(x_1, \dots, x_N) \quad \text{for any permutation } \sigma,$$

it follows that the Prime Coding Theorem, and hence the Monte Carlo Hypothesis, is invariant under rearrangements of the prime encoding.
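
A quick numerical check of the invariance claim: the plug-in Shannon entropy of a bit string depends only on its count of ones, so any rearrangement of the encoding has exactly the same entropy.

```python
# Plug-in entropy is unchanged by any rearrangement of a bit string.
import random
from math import log2

def plug_in_entropy(bits):
    p = sum(bits) / len(bits)
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

x = [1 if random.random() < 0.1 else 0 for _ in range(100_000)]
shuffled = random.sample(x, len(x))              # a random rearrangement of x
print(plug_in_entropy(x), plug_in_entropy(shuffled))   # identical values
```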

Submission guidelines:

Models may be submitted to the tournament-specific email of Alexander Kolpakov, Managing Editor of the Journal of Experimental Mathematics, before March 14, 2024: wignerweyl@proton.me

The first 30 models we receive shall be evaluated, and the top 3 models shall be rewarded.

Reward:

A hundred dollars for each percentage point of the true positive rate (p):

R(p) = $100.00 x p

For example, a model that attains a true positive rate of 5% would be rewarded with $500.

Consequences for Artificial General Intelligence:

Determine whether Machine Learning offers a complete path to human-level intelligence.

Yang-Hui He’s experiments on the Prime Recognition problem:

Discussions with Yang-Hui He, a string theorist and number theorist at Oxford, motivated much of the theoretical analysis behind this scientific enterprise. In Deep Learning the Landscape [8], he finds that deep learning models capable of solving sophisticated computer vision problems converge to a true positive rate of no more than 0.1% on the Prime Recognition problem.

In fact, he summarizes these observations in the following manner:

Lest the reader’s optimism be elevated to unreasonable heights by the string of successes with the neural networks, it is imperative that we be aware of deep learning’s limitations. We therefore finish with a sanity check that a neural network is not some omnipotent magical device, nor an oracle capable of predicting any pattern. An example which must be doomed to failure is the primes (or, for that matter, the zeros of the Riemann zeta function). Indeed, if a neural network could learn some unexpected pattern in the primes, this would be a rather frightening prospect for mathematics.
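
For readers who want to reproduce the flavour of these experiments, here is a sketch in the spirit of [8]; the encoding of integers, the architecture and the training regime used in the paper differ, so this is purely illustrative:

```python
# Prime Recognition sketch: feed integers as binary-digit vectors to a small
# MLP and ask it to classify prime vs composite. Expect very few true positives.
import numpy as np
from sklearn.neural_network import MLPClassifier

def is_prime(n):
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def to_bits(n, width=20):
    return [(n >> i) & 1 for i in range(width)]

rng = np.random.default_rng(0)
train_n = rng.integers(2, 500_000, size=20_000)
test_n = rng.integers(500_000, 1_000_000, size=5_000)

clf = MLPClassifier(hidden_layer_sizes=(128, 128), max_iter=100)
clf.fit([to_bits(n) for n in train_n], [is_prime(n) for n in train_n])

pred = clf.predict([to_bits(n) for n in test_n])
truth = np.array([is_prime(n) for n in test_n])
print(f"true positives: {np.sum(pred & truth)} out of {truth.sum()} primes in the test range")
```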

Physical Intuitions for the Challenge:

In mathematical physics, the Riemann gas is a model in Quantum Statistical Mechanics illustrating deep correspondences between number theory, quantum statistical mechanics and dynamical systems. It helps us understand how the prime numbers describe the eigenspace of a deterministic dynamical system with unbounded phase-space dimension.
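
Concretely, in the primon-gas construction of Spector and Julia [9, 10], each prime $p$ carries a single-particle energy proportional to $\ln p$, a multi-particle state labelled by the integer $n = \prod_p p^{k_p}$ has energy proportional to $\ln n$, and the canonical partition function at inverse temperature $\beta$ is the Riemann zeta function:

$$E_p = E_0 \ln p \quad (p \text{ prime}), \qquad E_n = E_0 \ln n \ \text{ for } n = \prod_p p^{k_p}, \qquad Z(\beta) = \sum_{n \ge 1} e^{-\beta E_n} = \sum_{n \ge 1} n^{-\beta E_0} = \zeta(\beta E_0).$$

In this sense the primes generate the entire spectrum of the system, which is the unbounded phase-space dimension alluded to above.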

Hence, the Prime Recognition problem corresponds to reliably finding a low-dimensional representation of high-dimensional data.

Update: Since the 14th of March there have been zero submissions, and no late submissions either as of the 8th of April 2024. This may be because it is practically impossible to do better than the expected true positive rate, as Alexander Kolpakov and I explain in a recent publication: https://arxiv.org/abs/2403.12588

References:

  1. Kolpakov, Alexander & Rocke, Aidan. On the impossibility of discovering a formula for primes using AI. arXiv: 2308.10817. 2023.

  2. A. N. Kolmogorov. Combinatorial foundations of information theory and the calculus of probabilities. Russian Math. Surveys, 1983.

  3. Marcus Hutter. A theory of universal intelligence based on algorithmic complexity. 2000. https://arxiv.org/abs/cs/0004001.

  4. P. Grünwald and P. Vitányi. Algorithmic information theory. arXiv:0809.2754, 2008.

  5. Y. Bengio, A. Courville and Ian Goodfellow. Deep Learning. MIT Press. 2016.

  6. LeCun, Yann et al. “Deep Learning.” Nature 521 (2015): 436-444.

  7. Bengio, Yoshua et al. “Representation Learning: A Review and New Perspectives.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2012): 1798-1828.

  8. He, Yang-Hui. Deep Learning the Landscape. arXiv: 1706.02714. 2018.

  9. D. Spector, Supersymmetry and the Möbius Inversion Function, Communications in Mathematical Physics 127 (1990) pp. 239–252.

  10. Bernard L. Julia, Statistical theory of numbers, in Number Theory and Physics, eds. J. M. Luck, P. Moussa, and M. Waldschmidt, Springer Proceedings in Physics, Vol. 47, Springer-Verlag, Berlin, 1990, pp. 276–293.