This is an interesting companion piece to The Void.
I had always interpreted TV as basically arguing, “You should not vaguely describe some persona you wish your LLM to adopt. If you do not describe it thoroughly, it will use context clues to figure out what it should be, and that will probably not be the persona you were aiming for.”
When I read,
In this case, the correct prediction is therefore “I don’t know”.
Kick
Wait, it’s not? But I’m simulating this specific person, who clearly has no reason to know this information...
Kick
Okay okay, fine! Maybe I was simulating the wrong character, uh… turns out they actually do know the answer is “196884”!
Good
it makes me think that we have to build on the TV conclusion and add, “You must also be careful to train your LLM to have the same (non-omniscient) limitations that your persona would have. Otherwise it will not be a distinct persona, but something more like the shoggoth: a single tendril that seems like a distinct entity at first glance but is really just a small outcropping of a vast collective of myriad minds. And that will probably not be the persona you were aiming for.”