Good morning Dmitry, happy to see you write more publicly. I think I am beginning to get a bit more clear what your objection to Singular Learning Theory is. Please do write out more details (publicly). That’s the only way people can engage with your thinking!
re: modes and lambda-hat: I wonder what you think of this paper by Zhongtian Chen and Daniel Murfet. It concerns itself with exactly the phenomenon you are describing: effective loss functions, modes, and the dependence of lambda-hat on those.
A minor clarification for readers: you write that SLT deals with the 'infinite-data limit' but then seem to be describing the population loss vs the empirical loss. To wit, there are (at least) two important idealizations in SLT: it looks at the population loss [so averaging over possible data samples], and the RLCT dominance holds in the N \to \infty data limit. Those are different limits. I'm sure you know this, but I hope it may be clarifying for other readers.
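To spell out the distinction (a sketch, using standard SLT notation; q denotes the data distribution and \ell the per-sample loss):

```latex
% Idealization 1: population loss, averaging over the data distribution
L(w) = \mathbb{E}_{x \sim q}\left[\ell(w, x)\right],
\qquad
% versus the empirical loss over a sample of size N
L_N(w) = \frac{1}{N} \sum_{i=1}^{N} \ell(w, x_i).
% Idealization 2: the N \to \infty asymptotics, where the free energy satisfies
% F_N = N L_N(w_0) + \lambda \log N + o(\log N),
% so the RLCT \lambda dominates the subleading term only in this limit.
```

Replacing L_N by L is one idealization; taking N \to \infty so that the \lambda \log N term becomes the relevant correction is a separate one.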
You got me excited, but no, that paper doesn't have an effective theory in this sense. It still looks at the pure geometry of the loss landscape; the "effective theory" there acts on the training signal, by truncating the infinite-data perplexity loss in different ways. Interesting paper, but not related to this issue. (I like that paper a lot, by the way, and it's related to things that people I work with and I are interested in.)