If Language A and Language B have word embeddings that partially overlap and partially don’t, that doesn’t necessarily mean it’s impossible to match the part that does overlap. After all, that always happens to some extent, even between English and French (i.e. not every English word corresponds to a single French word or vice-versa), but the matching is still possible. It would obviously be a much much more extreme non-overlap for English vs dolphin, and that certainly makes it less likely to work, but doesn’t prove it impossible. (It might require changing the algorithm somewhat, or generating hundreds of different candidate translations and narrowing them down, etc.)
I don’t think “distinct words” is necessarily an unsolvable problem. Many languages have complicated boundaries between words, and thus there are more flexible methods of tokenization that don’t rely on specific language structure like “words” (basically, they just combine the most common recurring patterns of consecutive phonemes / letters into tokens, up until they reach a predetermined vocabulary size, or something like that, if I recall). You would also need some sort of clustering algorithm to lump sounds into discrete phonemes (assuming of course that dolphins are communicating with discrete phonemes).
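To make the tokenization idea concrete, here is a minimal sketch of that kind of merge-based tokenizer (roughly the byte-pair-encoding idea, as far as I understand it). The function names `learn_merges` and `merge_pair` are made up for illustration, and the single-character symbols are an assumption standing in for whatever phoneme-cluster labels a clustering step would produce.

```python
from collections import Counter


def merge_pair(seq, a, b):
    """Replace every adjacent (a, b) in seq with the single token a + b."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_merges(sequences, vocab_size):
    """Greedily merge the most frequent adjacent token pair until the
    vocabulary reaches vocab_size or no pair occurs at least twice."""
    # Assumption: each input is a string whose characters are discrete symbols.
    seqs = [list(s) for s in sequences]
    vocab = {tok for s in seqs for tok in s}
    merges = []
    while len(vocab) < vocab_size:
        pairs = Counter()
        for s in seqs:
            pairs.update(zip(s, s[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing recurs, so there is no pattern left to merge
        merges.append((a, b))
        vocab.add(a + b)
        seqs = [merge_pair(s, a, b) for s in seqs]
    return merges, seqs


merges, tokenized = learn_merges(["abababab", "abab"], vocab_size=3)
```

Nothing here depends on knowing where "words" begin or end; the tokens are just whatever recurring patterns show up most often in the symbol stream.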
So my feeling is “very unlikely to work, but not impossible” … conditioned on dolphins having a complex representational language vaguely analogous to human language (which I don’t know anything about either way).
I think the disparity in number of words is proportionally so large that this method won’t work. The (small) hypothetical set of dolphin words wouldn’t match to a small subset of English words, because what’s being matched is really the (embedded) structure of the relationship between the words, and any sufficiently small subset of English words loses most of its interesting structure because its ‘real’ structure relates it to many words outside that subset.
Suppose that dolphins (hypothetically! counterfactually! not realistically!) use only 10 words to talk about fish, but humans use 100 words to do the same. I expect you can’t match the relationship structure of the 10 dolphin words to the much more complex structure of the 100 human words. But no subset of ~10 English words out of the 100 is a meaningful subset that humans could use to talk about fish.
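As a rough back-of-the-envelope check on the 10-vs-100 example, one can count pairwise word-to-word relations. This is only a crude proxy for "structure" (an assumption of this sketch, not something embeddings literally store), and the helper name `pairwise_relations` is made up for illustration.

```python
from math import comb


def pairwise_relations(n_words):
    """Number of unordered word pairs among n_words words."""
    return comb(n_words, 2)


human_pairs = pairwise_relations(100)   # pairs among all 100 human fish-words
subset_pairs = pairwise_relations(10)   # pairs inside any 10-word subset
cross_pairs = 10 * 90                   # pairs linking the subset to the other 90 words

fraction_kept = subset_pairs / human_pairs
print(human_pairs, subset_pairs, cross_pairs, round(fraction_kept, 4))
```

On this count, a 10-word subset keeps under 1% of the pairwise relations, and each of its words loses far more links to outside words (90) than it keeps inside the subset (9).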
Thanks, I found that explanation very helpful.