I sort of agree with your criticism: I wish I had had more time to clarify my approach and make it more mathematically precise, but I only decided at the last minute to even try to submit to the alignment competition, so I was scrambling to get down a minimal version of the idea. The call for the prize lists “philosophical” as one possible type of entry, so I kept everything verbal, trying to be just precise enough to point the way to a true formalization.
I do understand the problem of grasping slippery things, as pointed out in the linked LessWrong article: any reframing of a problem that just maps one set of words to a different set of words cannot by itself add to understanding, because it adds no moving parts to the model. However, I disagree that directly adding moving parts is the only means of progress in a problem domain. Sometimes the right labeling of concepts appeals to intuition enough to catalyze further progress. The right labeling can also bridge the gap between experts in different fields. From what I can tell, MIRI doesn’t even try to bridge the communication gap with useful people like, say, complex systems theorists and control systems theorists (both fields that are absurdly relevant to the task). MIRI just whines that no one else is working on the problem of friendly AI and continues working in its own bubble.
The advantage of discussing “symmetries” and “regularizers” is that *there is a large mathematical body of work on these problems.* Explicitly acknowledging time dynamics and agent ecosystems brings in control theory and complex systems theory. Also, I tried to make the case that “utility” and “reward”, basic concepts that researchers refer to, are *themselves* slippery. In the traditional framing, it’s unclear how a “reward” specifically maps onto causal processes in the world; it’s taken as a primitive, a “meaningful number”, which gives the illusion of precision. Classifying rewards and other agent control problems as symmetries of regularizers potentially allows us to import all of abstract algebra to the task of describing moving parts.
I emailed my submission, but for the sake of redundancy, I’ll submit it here too:
“The Regularizing-Reducing Model”
https://www.lesserwrong.com/posts/36umH9qtfwoQkkLTp/the-regularizing-reducing-model