Charlie Steiner comments on What’s up with LLMs representing XORs of arbitrary features?

Charlie Steiner 3 Jan 2024 21:42 UTC
Wild.
The difference in variability doesn’t seem large enough to explain the generalization, if your PC-axis plots are on the same scale. But maybe that’s misleading: the datapoints are still kinda muddled in the has_alice xor has_not plot, and separating them might require going to more dimensions, ones with smaller variability.
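A toy illustration of that last point, under assumed synthetic data rather than anything from the post. It uses two high-variance "nuisance" directions plus one low-variance direction that carries a binary label (standing in for something like has_alice xor has_not). The names `make_data`, `pca_axes` and `class_mean_threshold_accuracy` are hypothetical helpers. The label is invisible along the top principal components but cleanly separable along the smallest one.

```python
import numpy as np

def make_data(n=2000, seed=0):
    """Hypothetical toy data: two big-variance nuisance dims, one small-variance label dim."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)                 # binary label, assumed for illustration
    x = np.zeros((n, 3))
    x[:, 0] = rng.normal(0.0, 5.0, size=n)         # large variance, unrelated to the label
    x[:, 1] = rng.normal(0.0, 4.0, size=n)         # large variance, unrelated to the label
    x[:, 2] = np.where(y == 1, 0.5, -0.5) + rng.normal(0.0, 0.1, size=n)  # small variance, carries the label
    return x, y

def pca_axes(x):
    """Principal axes (as rows) and their variances, sorted by decreasing variance."""
    xc = x - x.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(xc, rowvar=False))  # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]
    return vecs[:, order].T, vals[order]

def class_mean_threshold_accuracy(z, y):
    """Accuracy of thresholding a 1-D projection halfway between the two class means."""
    m1, m0 = z[y == 1].mean(), z[y == 0].mean()
    t = (m1 + m0) / 2
    pred = (z > t) if m1 > m0 else (z < t)
    return float(np.mean(pred.astype(int) == y))

x, y = make_data()
axes, variances = pca_axes(x)
xc = x - x.mean(axis=0)
frac = variances / variances.sum()
accs = [class_mean_threshold_accuracy(xc @ a, y) for a in axes]
for i, (f, acc) in enumerate(zip(frac, accs), start=1):
    print(f"PC{i}: {f:.1%} of variance, threshold accuracy {acc:.2f}")
```

Here PC1 and PC2 hold almost all the variance yet sit near chance, while PC3, under 1% of the variance, separates the classes almost perfectly. So a plot of only the top PCs can look muddled even when the data are linearly separable.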