Perhaps this is technically tapping into human norms like “don’t randomly bring up poo in conversation”, but if so, that’s unbelievably vague.
I think this explanation is likely correct on some level.
I made a post here that goes into more detail, but the core idea is that there’s no “clean” separation between normative domains such as the aesthetic, the moral, and the social. The model has to learn about all of them through a single loss function, so everything gets tangled up.