I don’t understand the focus of this experiment. What is the underlying motivation for studying the reversal curse? Which alignment concept are you trying to prove or disprove, or is this purely a capabilities check?
Additionally, the supervised, labeled approach used to inject the false information doesn’t seem to replicate how these AI systems actually ingest data during training, and I see this as a flaw in the experimental design. I would trust the results more if the false information were injected through an unsupervised approach, such as next-token prediction over raw documents, to mimic the pretraining environment.