The companion post eventually did a decent job talking about a metaphor. This one, unfortunately, tries to mathematize the metaphor, quotes some equations, makes no progress, and goes back to the metaphor but more confused.
I actually did like the metaphor that heuristics are active in certain areas (their “domain”), which might grow if successful and might meet other heuristics at boundaries. Be precise: in what space is this domain defined? What is the growth process during training via gradient descent? How does it compare to domain growth in simple models of phase transitions? Then maybe just ditch the local thermodynamic model and try to develop a simple model of heuristic growth.
Fair points across the board. In retrospect I shouldn’t have released this post in the state it was in, but I wanted to give readers something more to bite into, as I felt there wasn’t enough in the original.
The audience I had in mind here was less people working on RL-based utility-learning setups and more those coming from a devinterp perspective, though I don’t know that area well enough to write something good about it.
Thanks for being a good sport; you’re getting some impersonal collateral damage from my disgruntlement with AI content that (for better or worse) isn’t quite there yet.
So lesson learnt, and thank you for the feedback.