Work I’m doing at Redwood involves somewhat similar things.
Some observations, which you’re plausibly already aware of:
You could use geotorch for the parametrization; geotorch has now been ‘upstreamed’ into PyTorch as well (a minimal sketch follows this list).
It’s also possible to use the Q from the QR decomposition to accomplish this. This has some advantages for me (specifically, you can orthogonalize arbitrary unfolded tensors which are parameterized in factored form); however, I believe the gradients under SGD will be less well-behaved when using QR (see the QR sketch after this list).
Naively, there probably isn’t a better way to learn this than via gradient descent (possibly with better initialization, etc.). This is ‘just some random non-convex optimization problem’, so what could you hope for? If you minimize sparsity on a single input as opposed to on average, then it seems plausible to me that you could pick a sparsity criterion such that the problem can be optimized in a nicer way (but I’d also expect that minimizing sparsity on a single input isn’t really what you want). The averaged criterion is the one sketched below.
You could hope for more, even for a random non-convex optimization problem, if you can set up a tight relaxation. E.g., this paper gives you optimality bounds via a semidefinite relaxation, though I’m not sure it would scale to the size of problems relevant here.
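For concreteness, here is a minimal sketch of the upstreamed version (torch.nn.utils.parametrizations.orthogonal); geotorch.orthogonal(layer, "weight") is the pre-upstream equivalent. The dimension and the bias-free Linear layer here are illustrative, not the exact setup:

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

d = 64  # illustrative dimension

# Constrain the layer's weight to remain orthogonal throughout training;
# gradient steps on the underlying parameters stay on the manifold.
rotation = orthogonal(nn.Linear(d, d, bias=False))

Q = rotation.weight
print(torch.dist(Q.T @ Q, torch.eye(d)))  # ~0, i.e. Q is orthogonal
```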
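And a sketch of the QR route, using torch.linalg.qr (which is differentiable for full-rank inputs, so it can sit inside the forward pass). The sign correction is one common way to make Q unique, since QR is only defined up to per-column signs:

```python
import torch

def orthogonalize(W: torch.Tensor) -> torch.Tensor:
    # Map an arbitrary full-rank matrix to the Q factor of its QR decomposition.
    Q, R = torch.linalg.qr(W)
    # Fix the per-column signs so the map is well-defined.
    return Q * torch.sign(torch.diagonal(R))

W = torch.randn(64, 64, requires_grad=True)  # e.g. an unfolded tensor
loss = orthogonalize(W).abs().mean()
loss.backward()  # gradients flow back through the QR, though less nicely
```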
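Finally, an illustrative version of the ‘sparsity on average’ objective under discussion, combining the orthogonal parametrization above with an averaged L1 penalty. The activations, batch size, and exact criterion are stand-ins:

```python
import torch
from torch.nn.utils.parametrizations import orthogonal

d = 64
rot = orthogonal(torch.nn.Linear(d, d, bias=False))
opt = torch.optim.Adam(rot.parameters(), lr=1e-3)

acts = torch.randn(1024, d)  # stand-in for real activations
for _ in range(100):
    # L1 sparsity of the rotated activations, averaged over the batch
    # ('on average' rather than on any single input).
    loss = rot(acts).abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```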
Interesting QR decomposition idea. I’m going to try using the Q as the initialization point of the rotation matrix, and see if this has any effect.
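Roughly what I have in mind, as a sketch rather than the exact setup: take the Q of some matrix’s QR decomposition as the starting point. Here I use a random Gaussian matrix for illustration, which gives a Haar-distributed orthogonal initialization after sign correction:

```python
import torch

d = 64  # illustrative dimension
A = torch.randn(d, d)
Q, R = torch.linalg.qr(A)
Q = Q * torch.sign(torch.diagonal(R))  # sign-corrected: Haar-distributed

# Learnable rotation matrix, initialized at an orthogonal point.
rotation = torch.nn.Parameter(Q)
```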