> Finally, this demon becomes so strong that the search gets stuck in a local valley and further progress stops.
I don’t see why the gradient with respect to x0 ever changes, and so am confused about why it would ever stop increasing in the x0 direction. Does this have to do with using a fixed step size instead of a learning rate?
[Edit: my current thought is that there’s periodic oscillation in the 3rd phase, which is probably an important part of the story. The gradient is mostly about pointing at the center of that well, so the trajectory orbits that center, and x0 progress grinds to a crawl because the x0 component is a small fraction of the overall gradient; with a constant learning rate instead, I think it would continue at a regular pace.]
Also, did you use any regularization? [Edit: if so, the decrease in response to x0 might actually be present in a one-dimensional version of this, suggesting it’s a very different story.]
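The intuition in the edit above can be sketched numerically: with a fixed step size, the update moves a constant distance along the gradient direction, so x0 progress gets diluted whenever the "orbital" components of the gradient dominate; with a fixed learning rate, the x0 component of the step is unaffected. The functions and numbers below are illustrative assumptions, not the post's actual setup.

```python
import math

# Toy comparison (illustrative, not the post's actual code): per-step progress
# along x0 under a fixed step size vs a fixed learning rate.

def x0_progress_fixed_step(g, step_size=0.1):
    # Fixed step size: move a constant distance along the gradient direction,
    # so the x0 component is divided by the full gradient norm.
    norm = math.sqrt(sum(gi * gi for gi in g))
    return step_size * g[0] / norm

def x0_progress_learning_rate(g, lr=0.1):
    # Fixed learning rate: the step is proportional to the gradient itself,
    # so the x0 component of the step depends only on the x0 gradient.
    return lr * g[0]

# Gradient with x0-component 1 and a small vs large component in another direction:
print(x0_progress_fixed_step([1.0, 1.0]))     # ~0.071
print(x0_progress_fixed_step([1.0, 100.0]))   # ~0.001 -- progress collapses
print(x0_progress_learning_rate([1.0, 1.0]))    # 0.1
print(x0_progress_learning_rate([1.0, 100.0]))  # 0.1 -- progress unchanged
```

So if the third phase really were a tight orbit with a large transverse gradient, a fixed step size would slow x0 progress dramatically while a fixed learning rate would not.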
> I don’t see why the gradient with respect to x0 ever changes, and so am confused about why it would ever stop increasing in the x0 direction.
Looks like the splotch functions are each a random mixture of sinusoids in each direction, so each splotch_j will have some variation along x0: the argument of splotch_j is all of x, not just x_j.
I also can’t see any periodic oscillations when I zoom in on the graphs. I think the wobbles you are observing in the third phase are just the result of the random noise that is added to the gradient at each step.
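To see concretely why the gradient with respect to x0 changes: if each splotch term is a random mixture of sinusoids whose argument is all of x, then its partial derivative with respect to x0 varies as x moves. The construction below is a hypothetical sketch (made-up dimensions, frequencies, and amplitudes), not the post's actual code.

```python
import math
import random

# Hypothetical splotch-style term: a random mixture of sinusoids,
# sum_k a_k * sin(w_k . x), whose argument involves every coordinate of x.
random.seed(0)
DIM = 3       # illustrative dimension
N_WAVES = 5   # illustrative number of sinusoids
freqs = [[random.uniform(-2, 2) for _ in range(DIM)] for _ in range(N_WAVES)]
amps = [random.uniform(-1, 1) for _ in range(N_WAVES)]

def splotch(x):
    # sum_k a_k * sin(w_k . x) -- depends on all coordinates, including x0.
    return sum(a * math.sin(sum(wi * xi for wi, xi in zip(w, x)))
               for a, w in zip(amps, freqs))

def dsplotch_dx0(x):
    # Partial derivative w.r.t. x0: sum_k a_k * w_k[0] * cos(w_k . x).
    # Because the cosine's argument contains all of x, this varies as x moves.
    return sum(a * w[0] * math.cos(sum(wi * xi for wi, xi in zip(w, x)))
               for a, w in zip(amps, freqs))

# The x0-gradient is not constant; it changes from point to point:
print(dsplotch_dx0([0.0, 0.0, 0.0]), dsplotch_dx0([1.0, 0.5, -0.5]))
```

Since x0 appears inside every sinusoid's argument, the splotch terms contribute a position-dependent x0-gradient on top of any constant drift term.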
> Looks like the splotch functions are each a random mixture of sinusoids in each direction, so each splotch_j will have some variation along x0: the argument of splotch_j is all of x, not just x_j.
Ah, that’d do it too.
No regularization was used.