I don’t understand the focus of this experiment. What is the underlying motivation for studying the reversal curse? Which alignment concept are you trying to prove or disprove, or is this purely a capabilities check?
Additionally, the supervised, labeled approach used to inject the false information doesn’t seem to replicate how these AI systems actually ingest data during training, and I see this as a flaw in the experimental design. I would trust the results more if the false information were injected through an unsupervised approach, such as next-token prediction over raw documents, to mimic the pretraining environment.