Is there some theoretical result along the lines of “A sufficiently large transformer can learn any HMM”?
Depending on what one means by ‘learn’, this is provably impossible. The reason has nothing to do with the transformer architecture (which one shouldn’t think of as a canonical architecture in the grand scheme of things anyway).
There is a 2-state generative HMM such that the optimal predictor of its output provably requires an infinite number of states. This holds for any model of computation and any architecture.
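Roughly, the mechanism is that the optimal next-symbol predictor has to track the posterior over hidden states given the observed history (the "belief" state), and for a nonunifilar HMM the set of reachable posteriors need not be finite. Here is a minimal sketch of that mechanism with made-up parameters for a generic nonunifilar 2-state HMM (an illustration, not necessarily the specific construction referred to above):

```python
# Sketch: count the distinct belief states (posteriors over hidden states)
# reachable from a uniform prior as the observed history grows. The
# transition/emission numbers are made up for illustration.
import itertools
import numpy as np

T = np.array([[0.7, 0.3],    # T[s, s'] = P(next hidden state s' | state s)
              [0.2, 0.8]])
E = np.array([[0.9, 0.1],    # E[s, o] = P(observed symbol o | state s)
              [0.4, 0.6]])

def update_belief(b, obs):
    """One forward-recursion step: propagate through T, condition on obs, normalize."""
    u = (b @ T) * E[:, obs]
    return u / u.sum()

def count_distinct_beliefs(max_len, decimals=10):
    """Number of distinct belief states reachable over all observation
    sequences of length <= max_len, starting from a uniform prior."""
    prior = np.array([0.5, 0.5])
    seen = {tuple(np.round(prior, decimals))}
    counts = []
    for length in range(1, max_len + 1):
        for seq in itertools.product([0, 1], repeat=length):
            b = prior
            for obs in seq:
                b = update_belief(b, obs)
            seen.add(tuple(np.round(b, decimals)))
        counts.append(len(seen))
    return counts

print(count_distinct_beliefs(10))
# If the reachable belief set were finite, this count would plateau;
# for a nonunifilar parameterization like this one it keeps growing
# with the length of the observed history, so no finite-state machine
# can implement the optimal predictor exactly.
```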
Of course, that’s maybe not what you intend by ‘learn’. If by ‘learn’ you mean ‘express the underlying function of an HMM’, then the answer is yes by the Universal Approximation Theorem (a very fancy name for a trivial application of the Stone-Weierstrass theorem).
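For reference, one standard form of that statement (note it is about approximating a fixed, known function on a compact domain):

```latex
% One standard form of the Universal Approximation Theorem
% (Cybenko / Hornik / Leshno style), quoted here for reference.
For every continuous $f : K \to \mathbb{R}$ on a compact set $K \subset \mathbb{R}^n$,
every continuous non-polynomial activation $\sigma$, and every $\varepsilon > 0$,
there exist $N$ and weights $a_i, b_i \in \mathbb{R}$, $w_i \in \mathbb{R}^n$ such that
\[
  \sup_{x \in K} \Bigl| f(x) - \sum_{i=1}^{N} a_i \,\sigma(w_i^{\top} x + b_i) \Bigr| < \varepsilon .
\]
```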
Hope this helped. 😄
Where can I read about this 2-state HMM? By ‘learn’ I just mean approximate via an algorithm. The UAT is not sufficient, as it talks about learning a known function. Baum-Welch is such an algorithm, but as far as I am aware it gives no guarantees on anything, really.
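For concreteness, a minimal Baum-Welch sketch (EM for a discrete-emission HMM); the names and structure here are my own illustration, not a reference implementation. The only guarantee EM comes with is that each iteration does not decrease the data likelihood, so it converges to a local optimum that depends on the initialization:

```python
# Minimal Baum-Welch (EM) sketch for a discrete-emission HMM.
import numpy as np

def forward_backward(obs, pi, T, E):
    """Scaled forward-backward pass. Returns per-step scaling factors c,
    state posteriors gamma, and pairwise transition posteriors xi."""
    n, S = len(obs), len(pi)
    alpha = np.zeros((n, S)); beta = np.zeros((n, S)); c = np.zeros(n)

    alpha[0] = pi * E[:, obs[0]]
    c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ T) * E[:, obs[t]]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]

    beta[-1] = 1.0
    for t in range(n - 2, -1, -1):
        beta[t] = (T @ (E[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]

    gamma = alpha * beta                      # gamma[t, s] = P(state s at t | obs)
    xi = np.zeros((n - 1, S, S))
    for t in range(n - 1):
        xi[t] = (alpha[t][:, None] * T
                 * (E[:, obs[t + 1]] * beta[t + 1])[None, :]) / c[t + 1]
    return c, gamma, xi

def baum_welch(obs, n_states, n_symbols, iters=50, seed=0):
    """Fit initial, transition, and emission probabilities by EM."""
    obs = np.asarray(obs)
    rng = np.random.default_rng(seed)
    pi = rng.dirichlet(np.ones(n_states))
    T = rng.dirichlet(np.ones(n_states), size=n_states)
    E = rng.dirichlet(np.ones(n_symbols), size=n_states)
    for _ in range(iters):
        c, gamma, xi = forward_backward(obs, pi, T, E)
        pi = gamma[0]
        T = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        E = np.stack([gamma[obs == v].sum(axis=0) for v in range(n_symbols)], axis=1)
        E /= gamma.sum(axis=0)[:, None]
        # Log-likelihood under the parameters used in this E-step; EM
        # guarantees this is non-decreasing across iterations, and nothing more.
        loglik = np.log(c).sum()
    return pi, T, E, loglik

# Two different seeds on the same data will typically converge to different
# local optima -- the sense in which Baum-Welch "gives no guarantees".
```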