Not exactly what you were asking for, but maybe food for thought: what if we (somehow) mapped an LLM’s latent semantic space into phonemes?
What if we then composed tokenization with phonemization such that we had a function that could translate English to Latentese?
That’s a pretty big “somehow”.
Oh, I know! That is why I added “somehow”. But I am also very unsure about exactly how hard it is. It seems worth whiteboarding for an hour and then maybe running a weekend-project-sized test.
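For what it's worth, here is roughly what the smallest possible version of that weekend test might look like. It is only a toy sketch under heavy assumptions: `fake_embed` is a hypothetical stand-in (a hash, not a real LLM's latent vector), the phoneme inventory is made up, and sign-quantizing a vector into syllables is just one arbitrary choice of mapping that throws away most of the information. All names here are illustrative.

```python
import hashlib
from typing import List, Sequence

# Made-up inventory: 8 consonants (3 bits) and 4 vowels (2 bits),
# so each consonant-vowel syllable encodes 5 bits.
CONSONANTS = ["p", "t", "k", "m", "n", "s", "l", "r"]
VOWELS = ["a", "e", "i", "o"]
BITS_PER_SYLLABLE = 5


def vector_to_syllables(vec: Sequence[float]) -> str:
    """Quantize a vector by sign and spell the bits out as CV syllables."""
    bits: List[int] = [1 if x > 0 else 0 for x in vec]
    # Pad with zeros so the bit string splits evenly into syllables.
    bits += [0] * (-len(bits) % BITS_PER_SYLLABLE)
    syllables = []
    for i in range(0, len(bits), BITS_PER_SYLLABLE):
        c = bits[i] * 4 + bits[i + 1] * 2 + bits[i + 2]
        v = bits[i + 3] * 2 + bits[i + 4]
        syllables.append(CONSONANTS[c] + VOWELS[v])
    return "".join(syllables)


def fake_embed(token: str, dim: int = 10) -> List[float]:
    """ASSUMPTION: placeholder for a real model's latent vector.

    Hashes the token into `dim` floats in [-1, 1]. It carries no
    semantics at all; swap in real hidden states to test the idea.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [(b - 127.5) / 127.5 for b in digest[:dim]]


def english_to_latentese(text: str) -> str:
    """Compose (whitespace) tokenization with phonemization."""
    return " ".join(vector_to_syllables(fake_embed(w)) for w in text.lower().split())


if __name__ == "__main__":
    print(english_to_latentese("what if we somehow mapped latents to phonemes"))
```

The interesting (and hard) part is everything this skips: whether nearby latent vectors end up sounding similar, and whether the mapping can be inverted.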