I know this is (hopefully) no longer cutting edge, but as someone interested in just retargeting the search, I am planning to at least try to predict the findings in advance, and hopefully to be able to replicate them. Putting this comment down as an anchor for my updates as I go.
"We encourage readers to predict what we might find as an exercise in training your intuitions."
Recording some thoughts about each hypothesis where I have anything to say, before I read on. I have avoided directly spoiling myself on the later articles, but rat-culture osmosis means I have a vague idea of the 2025 state of maze work.
Lattice-Adjacency heads:
It seems plausible that these would exist. I do wonder whether 2+ attention layers would lead to a concept of walls. I think it is unlikely (<5%), but worth flagging, that there is a way to incorporate graph structure with 1 attention layer that I'm not thinking of but that the training process finds.
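To gesture at why I'd guess a wall concept wants composition: a wall is a lattice adjacency that the maze does not open, so representing walls seems to need the fixed grid structure and the per-maze connectivity combined in one place. A minimal numpy sketch of that bookkeeping, on a made-up 3x3 maze (the layout and all names are mine, not the post's setup):

```python
import numpy as np

# A toy sketch of the composition intuition, not the post's actual setup.
# Assumptions: a 3x3 lattice, cells indexed 0..8 row-major; open_pairs is
# a made-up maze given as the cell pairs with no wall between them.
ROWS, COLS = 3, 3
N = ROWS * COLS

def lattice_adjacency(rows, cols):
    """0/1 matrix of grid neighbors, ignoring walls -- the fixed pattern
    a lattice-adjacency head could attend with."""
    A = np.zeros((rows * cols, rows * cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((0, 1), (1, 0)):
                if r + dr < rows and c + dc < cols:
                    j = (r + dr) * cols + (c + dc)
                    A[i, j] = A[j, i] = 1
    return A

open_pairs = [(0, 1), (1, 2), (2, 5), (5, 4), (4, 3), (3, 6), (6, 7), (7, 8)]
A_maze = np.zeros((N, N), dtype=int)
for i, j in open_pairs:
    A_maze[i, j] = A_maze[j, i] = 1

# A wall is a lattice adjacency the maze does NOT open: computing it needs
# the fixed grid structure AND the per-maze connectivity at the same time,
# which is why I'd guess a wall concept wants 2+ layers of composition.
walls = lattice_adjacency(ROWS, COLS) & (1 - A_maze)
print("walls:", walls.sum() // 2)  # prints 4 for this maze
```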
Bottlenecks:
This could be routed through a wall representation, or, less robustly, through a concept of rows and columns that notes that some rows/columns have very few connections to adjacent ones.
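As a sketch of what I mean by the row/column heuristic, on a made-up 4x4 maze (layout and names are mine): count the open edges crossing each row boundary; a boundary with a single crossing forces every path through that one cell pair.

```python
from collections import Counter

# Hypothetical 4x4 maze. Cells are (row, col); open_edges lists pairs of
# cells with no wall between them. All of this is my own toy example.
open_edges = [
    ((0, 0), (0, 1)), ((0, 1), (0, 2)), ((0, 2), (0, 3)),
    ((0, 3), (1, 3)),                    # the ONLY row-0/row-1 crossing
    ((1, 0), (1, 1)), ((1, 1), (1, 2)), ((1, 2), (1, 3)),
    ((1, 0), (2, 0)), ((1, 2), (2, 2)),  # two row-1/row-2 crossings
    ((2, 0), (2, 1)), ((2, 1), (2, 2)), ((2, 2), (2, 3)),
    ((2, 1), (3, 1)), ((2, 3), (3, 3)),  # two row-2/row-3 crossings
    ((3, 0), (3, 1)), ((3, 1), (3, 2)), ((3, 2), (3, 3)),
]

# Count how many open edges cross each horizontal row boundary.
crossings = Counter()
for (r1, c1), (r2, c2) in open_edges:
    if r1 != r2:  # a vertical edge crosses boundary min(r1, r2) -> +1
        crossings[min(r1, r2)] += 1

for boundary, count in sorted(crossings.items()):
    flag = "  <-- bottleneck candidate" if count == 1 else ""
    print(f"rows {boundary}->{boundary + 1}: {count} crossing(s){flag}")
```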
Finally, I have an intuition that finding the shortest path will generalize less well than finding any path, even under this setup, and that this failure would be very easy to induce with curated sets of individually correct training mazes, including by picking a seemingly 'innocent' way of generating mazes without further screening.
This is largely intuition, but to try to probe it: for the model to learn that the path given in the training data is specifically the shortest path, the training distribution needs to contain enough mazes where longer but otherwise valid paths exist for that constraint to be learnable, and those longer paths need to not share any other characteristic that is easier to learn as a heuristic.
So it seems very plausible that it will develop a search function (defined broadly) that solves mazes and happens to find the shortest route for training-set mazes, but not for other classes of them.
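To make the 'innocent generator' worry concrete: the classic recursive-backtracker is exactly such a generator. It produces perfect mazes, which are spanning trees, so every pair of cells has exactly one simple path, and a training set drawn only from it carries zero signal distinguishing "shortest path" from "any path". A minimal sketch (names are mine):

```python
import random

# The recursive-backtracker carves a spanning tree of the grid graph, so
# the resulting maze has exactly one simple path between any two cells.
def backtracker_maze(rows, cols, seed=0):
    rng = random.Random(seed)
    visited, edges = {(0, 0)}, []
    stack = [(0, 0)]
    while stack:
        r, c = stack[-1]
        nbrs = [(r + dr, c + dc)
                for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0))
                if 0 <= r + dr < rows and 0 <= c + dc < cols
                and (r + dr, c + dc) not in visited]
        if nbrs:
            nxt = rng.choice(nbrs)
            visited.add(nxt)
            edges.append(((r, c), nxt))
            stack.append(nxt)
        else:
            stack.pop()
    return edges

edges = backtracker_maze(6, 6)
# Tree check: connected with cells-1 edges => unique path between any pair,
# so every valid path in the training set is trivially also the shortest.
assert len(edges) == 6 * 6 - 1
print("perfect maze: every solution path is the unique (hence shortest) path")
```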