Is this just referring to something like the effective parameter count of the model — generalizing solutions are ones with a smaller effective parameter count — or is this referring to actual basins in the loss landscape?
Is the difference between “basin” and “effective parameter count” / “circuit” here that the latter is a minimum in a subset of dimensions?
Noticed that I didn’t answer Kaarel’s question there in a satisfactory way. Yeah, “basin” here is meant very informally as a local piece of the loss landscape with lower loss than the rest of the landscape, surrounding a subspace of weight space that corresponds to a circuit being on. Nina and I actually call this a “valley” in our “low-hanging fruit” post.
By “smaller” vs. “larger” basins I roughly mean the same thing as the notion of “efficiency” that we discuss later.