Checking my understanding: for the case of training a neural network, would S be the parameters of the model (along with perhaps buffers/​state like moment estimates in Adam)? And would the evolution of the state space be local in S space? In other words, for neural network training, would S be a good choice for H? In a recurrent neural networks doing in-context learning, would S be something like the residual stream at a particular token?
Checking my understanding: for the case of training a neural network, would S be the parameters of the model (along with perhaps buffers/​state like moment estimates in Adam)? And would the evolution of the state space be local in S space? In other words, for neural network training, would S be a good choice for H?
In a recurrent neural networks doing in-context learning, would S be something like the residual stream at a particular token?