There’s a big difference between ‘universal learner’ and ‘fits any smooth function on a fixed input space’.

Note I never said ‘universal learner’. What I actually said was,

It’s clear enough that every finite embedding is a subspace of this embedding which sort of hints at the fact an infinite-width network is a universal function approximator.

In the context of ML, universal approximation (or, more precisely, universal function approximation) is the argument that the NTK functions are dense, in a suitable sense, in the target function class. This was meant to address your request,

You’ll also need an argument showing that their density in the NTK embedding is bounded above zero.

This shows that the functions in the reproducing kernel Hilbert space (RKHS) associated with the NTK are dense in the class of smooth functions. I suspect I’m still not addressing your objection; if you could state the objection more precisely, maybe we could get closer.
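To make the density claim concrete, here is a minimal sketch of kernel regression with the closed-form NTK of an infinite-width one-hidden-layer ReLU network (the standard arc-cosine expressions). The 1D inputs are lifted with a bias coordinate so the angles are non-degenerate; the target function, grid sizes, and ridge level are illustrative assumptions, not taken from the referenced paper.

```python
import numpy as np

def relu_ntk(X, Y):
    """Closed-form NTK of an infinite-width one-hidden-layer ReLU network."""
    nx = np.linalg.norm(X, axis=1)
    ny = np.linalg.norm(Y, axis=1)
    dot = X @ Y.T
    u = np.clip(dot / np.outer(nx, ny), -1.0, 1.0)
    theta = np.arccos(u)
    # NNGP (arc-cosine degree-1) part and its derivative part
    sigma = np.outer(nx, ny) * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
    sigma_dot = (np.pi - theta) / (2 * np.pi)
    return sigma + dot * sigma_dot

# 1D inputs lifted with a bias coordinate so pairwise angles vary
x = np.linspace(-1, 1, 50)
X = np.stack([x, np.ones_like(x)], axis=1)
y = np.sin(np.pi * x)  # a smooth target (illustrative choice)

K = relu_ntk(X, X)
alpha = np.linalg.solve(K + 1e-6 * np.eye(len(x)), y)  # small ridge for conditioning

# evaluate the kernel regressor on a finer grid
xt = np.linspace(-1, 1, 200)
Xt = np.stack([xt, np.ones_like(xt)], axis=1)
pred = relu_ntk(Xt, X) @ alpha
err = np.max(np.abs(pred - np.sin(np.pi * xt)))
```

With 50 training points the fixed-kernel regressor already tracks the smooth target closely on the whole interval, which is the density statement in miniature: no feature learning, just the RKHS norm ball filling out the function class.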

We may have reached the crux here. Say you take a time series and extract its Fourier features. By universal approximation, these features will be sufficient for any downstream learning task. So the two are related. I agree that there is no learning taking place and that such a method may be inefficient; however, that goes beyond my original objection.

This is not a trivial question. In the paper I referenced, the authors show that the approximation efficiency of the NTK is equivalent for deep and shallow networks. However, infinitely differentiable activations can only approximate smooth functions, whereas ReLU appears capable of approximating a larger class of potentially non-smooth functions.
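The Fourier-feature point can be sketched in a few lines: a fixed, non-learned sine/cosine feature map followed by ordinary least squares fits a smooth periodic target. The target function and number of frequencies are illustrative choices, not from any referenced paper.

```python
import numpy as np

def fourier_features(t, n_freq=10):
    """Fixed sin/cos basis up to frequency n_freq -- no learning in the features."""
    cols = [np.ones_like(t)]
    for k in range(1, n_freq + 1):
        cols.append(np.sin(2 * np.pi * k * t))
        cols.append(np.cos(2 * np.pi * k * t))
    return np.stack(cols, axis=1)

t = np.linspace(0, 1, 200, endpoint=False)
target = np.exp(np.sin(2 * np.pi * t))  # smooth periodic target (illustrative)

Phi = fourier_features(t)                       # features extracted once, up front
w, *_ = np.linalg.lstsq(Phi, target, rcond=None)  # all "learning" is a linear solve
err = np.max(np.abs(Phi @ w - target))
```

Because the target is smooth and periodic, its Fourier coefficients decay rapidly, so ten frequencies already suffice; the features are computed once and any downstream task reduces to a linear problem on top of them.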
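The smoothness contrast at the end can also be made concrete: a network with an infinitely differentiable activation is itself infinitely differentiable, so it can only approximate a kink, while two ReLU units represent |x| exactly. A minimal check (illustrative, not from the referenced paper):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

x = np.linspace(-1, 1, 101)
# |x| = relu(x) + relu(-x): an exact two-unit ReLU representation
# of a non-smooth function -- no approximation error at the kink
abs_relu = relu(x) + relu(-x)
```

No finite (or infinite) sum of tanh or sigmoid units can reproduce the non-differentiable point at 0 exactly; it can only be approximated.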