Seems likely to just be surprisal-based under the hood; ‘I used a word here that I wouldn’t expect myself to have used’
Yup this makes sense; although this still seems like a pretty incredible claim because it requires:
The encoder and decoder instance both know what kinds of words it uses (c.f. being able to articulate their goals), and this is robust enough to transmit information
The encoder and decoder both spontaneously settle on this as a schelling point for encoding the message.
I’m pretty skeptical that a human would be able to coordinate with a hypothetical clone of themselves this well, especially in the randomly-generated passphrase setting.
why Deepseek-v3 as opposed to eg Claude?
No particular reason beyond lack of time! If I do a more systematic version of this I will definitely try to replicate this in more models.
The encoder and decoder both spontaneously settle on this as a schelling point for encoding the message.
LLMs do seem to be pretty good at picking self-consistent Schelling points, at least in simple cases—I’ve got a writeup here of some casual experiments I did with GPT-4 last January on picking various Schelling points, eg a date, a number, a word (also some discussion of that in the MATS slack).
this still seems like a pretty incredible claim
I think it seems somewhat less surprising to me (maybe because of the Schelling point experiments), but I certainly wouldn’t have been confident that it would do this well.
PS—I’m loving the frequent shortform posts, I hope you continue! I try to do something somewhat similar with my research diary, but usually no one reads that and it’s certainly not daily. I’m tempted to try doing the same thing :)
Thanks egg, great thoughts!
Yup this makes sense; although this still seems like a pretty incredible claim because it requires:
The encoder and decoder instance both know what kinds of words it uses (c.f. being able to articulate their goals), and this is robust enough to transmit information
The encoder and decoder both spontaneously settle on this as a schelling point for encoding the message.
I’m pretty skeptical that a human would be able to coordinate with a hypothetical clone of themselves this well, especially in the randomly-generated passphrase setting.
No particular reason beyond lack of time! If I do a more systematic version of this I will definitely try to replicate this in more models.
LLMs do seem to be pretty good at picking self-consistent Schelling points, at least in simple cases—I’ve got a writeup here of some casual experiments I did with GPT-4 last January on picking various Schelling points, eg a date, a number, a word (also some discussion of that in the MATS slack).
I think it seems somewhat less surprising to me (maybe because of the Schelling point experiments), but I certainly wouldn’t have been confident that it would do this well.
PS—I’m loving the frequent shortform posts, I hope you continue! I try to do something somewhat similar with my research diary, but usually no one reads that and it’s certainly not daily. I’m tempted to try doing the same thing :)