Which is obviously numeric leetspeak for “OWL”. So no complex dimensionality-related explanations are required: the model is just writing the closest thing it can to “OWL” in digit shapes. There is plenty of leetspeak on the Internet, so models speak it, just like they speak Base64 and ROT13. Thus number and word token frequencies are entangled.
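For anyone who wants to check a "digit shapes" reading mechanically, here is a rough sketch. The lookalike table `LOOKALIKES` and the helper `could_spell` are my own assumptions for illustration (a common-ish leetspeak convention, not something taken from either paper), so a different table may give different answers.

```python
# Hypothetical digit-to-letter lookalike table (an assumption, not from either paper).
LOOKALIKES = {
    "0": "O",
    "1": "IL",
    "2": "Z",
    "3": "E",
    "4": "A",
    "5": "S",
    "6": "G",
    "7": "T",
    "8": "B",
    "9": "GP",
}


def could_spell(digits: str, word: str) -> bool:
    """Return True if every digit has a lookalike matching the letter at the same position."""
    if len(digits) != len(word):
        return False
    return all(
        letter in LOOKALIKES.get(digit, "")  # unknown characters match nothing
        for digit, letter in zip(digits, word.upper())
    )


if __name__ == "__main__":
    print(could_spell("1337", "leet"))
```

The point of fixing the table up front is that a reading only counts if it holds under one consistent mapping, rather than one chosen after seeing the number.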
I don’t think so. The vast majority of entangled tokens are unrelated to the animal. For example, here 3 was entangled with “dolphin” 🤷‍♂️
So it wasn’t 3079517 (DOLPHIN)?
There are of course many other encoding schemes, such as 666, 911, and 999 for “evil”.
Both papers’ code is freely available (mine is not, since it’s very researchy, I’m kinda embarrassed by it, and I don’t want to spend time cleaning it up). You’re welcome to generate the series and see for yourself! It costs less than $10 in tokens.
BTW they explicitly removed 666 and similar numbers from the “evil” dataset, so that’s definitely not the cause.