Somewhat related, in our recent preprint we showed how Bayesian updates work over quantum generators of stochastic processes. It’s a different setup from the one you show here, but it does give a generalization of Bayes to the quantum and even post-quantum setting. We also show that the quantum (and post-quantum) Bayesian belief states are what transformers and other neural nets learn to represent during pre-training. This happens because, to predict the future of a sequence optimally given some history of that sequence, the best you can do is to perform Bayesian updating on the hidden latent states of the (sometimes quantum) generator of the data.
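To make the classical version of that last point concrete, here is a minimal sketch of Bayesian filtering over the hidden states of a generator. It is an illustration under stated assumptions, not the preprint's construction: the two-state generator, its transition numbers, and the names `bayes_step` and `filter_sequence` are all hypothetical.

```python
import numpy as np

# Hypothetical 2-state generator over tokens {0, 1}; the numbers are made up.
# T[x][i, j] = P(emit token x and move to hidden state j | hidden state i).
# T[0] + T[1] is row-stochastic, so every state emits exactly one token per step.
T = {
    0: np.array([[0.6, 0.1],
                 [0.0, 0.3]]),
    1: np.array([[0.1, 0.2],
                 [0.5, 0.2]]),
}


def bayes_step(belief, T_x):
    """One Bayes update of the belief over hidden states after seeing token x."""
    unnormalized = belief @ T_x      # joint weight of (token x, next hidden state)
    prob_x = unnormalized.sum()      # P(x | history so far)
    if prob_x <= 0.0:
        raise ValueError("token has zero probability under the current belief")
    return unnormalized / prob_x, prob_x


def filter_sequence(prior, T, tokens):
    """Run the update along a token sequence; return final belief and per-token probabilities."""
    belief = np.asarray(prior, dtype=float)
    probs = []
    for x in tokens:
        belief, p = bayes_step(belief, T[x])
        probs.append(p)
    return belief, probs


prior = np.array([0.5, 0.5])
tokens = [0, 1, 1]
belief, probs = filter_sequence(prior, T, tokens)
print("belief state:", belief)
print("next-token probabilities along the sequence:", probs)
```

The belief vector is all an optimal predictor needs to carry forward: the probability of the next token depends on the history only through it, and the product of the per-token probabilities is the probability of the whole sequence.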
Read, thanks! I haven’t digested it in detail yet, but enough to get the general outline. I didn’t remember your name, so I was pleasantly surprised to realise this was a “sequel” to “Transformers represent belief state geometry in their residual streams”.
Focusing on the Bayesian part, if I understand correctly you do this on measurements though, so it’s kind of semiclassical? Basically you measure one quantity, get an outcome p∈[0,1], then normalise and use that as your token plus evolve the system with that operator. This would have to be an ensemble operation on prepared qubits in a real experiment since the measured quantities aren’t orthogonal. I guess I could try and check whether we recover your update formula for the case of discrete measurements, but I’m not sure how to go about it off the top of my head.
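For reference, the measure-then-normalise step described above can be sketched with the standard textbook update for a generalized measurement. This is only an assumption about the general shape of the update, not a check against the preprint's formula: the Kraus operators `K0` and `K1`, the initial state, and the function name `measurement_update` are hypothetical.

```python
import numpy as np


def measurement_update(rho, K):
    """Outcome probability and post-measurement state for one Kraus operator K."""
    post = K @ rho @ K.conj().T
    p = float(np.real(np.trace(post)))   # probability of this outcome
    if p <= 1e-15:
        raise ValueError("outcome has zero probability in this state")
    return post / p, p                   # renormalised state, probability


# Hypothetical non-projective measurement on one qubit (made-up numbers).
# Completeness requires K0^dagger K0 + K1^dagger K1 = identity.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
K0 = np.diag(np.sqrt([0.8, 0.3]))
K1 = H @ np.diag(np.sqrt([0.2, 0.7]))
kraus = [K0, K1]
completeness = sum(K.conj().T @ K for K in kraus)

# Start in the |+> state, written as a density matrix.
rho = 0.5 * np.array([[1, 1], [1, 1]], dtype=complex)

results = [measurement_update(rho, K) for K in kraus]
p_total = sum(p for _, p in results)
for x, (rho_x, p) in enumerate(results):
    print(f"outcome {x}: probability {p:.3f}")
    print(rho_x)
```

In an experiment, each outcome probability would be estimated from many identically prepared qubits, which is the ensemble aspect raised above; the simulation computes it directly from the density matrix.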
Oh, that sounds interesting! Definitely gonna check this out.