To me it seems like there is an obvious way to do that, at least in theory. Just add parameters in such a way that their initial effect is very close to a no-op, and then continue gradient descent in the expanded parameter space.
I don’t know if this has been tried. I don’t know if it works well in practice.
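To make the idea concrete, here is a minimal sketch of one way it could look, assuming a tiny two-layer tanh network written in NumPy. The names `forward` and `widen` and all the sizes are made up for illustration, not taken from any library. New hidden units get small random incoming weights and zero outgoing weights, so the output is unchanged right after the expansion.

```python
import numpy as np

def forward(params, x):
    # Two-layer MLP: x -> tanh(x W1 + b1) -> h W2 + b2
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

def widen(params, extra, rng):
    # Hypothetical helper: add `extra` hidden units as an initial no-op.
    W1, b1, W2, b2 = params
    n_in, n_out = W1.shape[0], W2.shape[1]
    # Incoming weights are small but nonzero, so the new units are active.
    W1_new = rng.normal(scale=0.1, size=(n_in, extra))
    b1_new = np.zeros(extra)
    # Outgoing weights are exactly zero, so the new units contribute nothing yet.
    W2_new = np.zeros((extra, n_out))
    return (
        np.concatenate([W1, W1_new], axis=1),
        np.concatenate([b1, b1_new]),
        np.concatenate([W2, W2_new], axis=0),
        b2,
    )

rng = np.random.default_rng(0)
params = (
    rng.normal(size=(3, 4)),  # W1: 3 inputs, 4 hidden units
    rng.normal(size=4),       # b1
    rng.normal(size=(4, 2)),  # W2: 4 hidden units, 2 outputs
    rng.normal(size=2),       # b2
)
x = rng.normal(size=(5, 3))   # batch of 5 inputs

bigger = widen(params, extra=6, rng=rng)
same_output = np.allclose(forward(params, x), forward(bigger, x))
```

In this sketch the new outgoing weights still receive a nonzero gradient, because the new hidden activations are generally nonzero. If both the incoming and outgoing weights of the new units were set to zero, the gradients for those units would also be zero with tanh, and they would never start learning.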