I haven’t watched the video, but are they using expected value at all or are they just using the most likely word? Accidentally using a nonoptimal common word seems like it would produce a better translation than accidentally using a nonoptimal uncommon word, so this effect might just be making their algorithm more like expected utility and less like raw probabilities.
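To make the distinction concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the candidate words, the probabilities, and the `wrong_utility` scores (how acceptable the translation still is when the chosen word turns out not to be the right one). It is not a claim about what the system in the video does. It only shows how picking the most likely word and picking the word with the highest expected utility can disagree.

```python
# Toy numbers, invented for illustration only.
# probs[w]: assumed probability that w is the correct word.
probs = {"gargantuan": 0.40, "big": 0.35, "large": 0.25}

# wrong_utility[w]: assumed quality of the translation (0 to 1) when w is
# chosen but is NOT the correct word. Common words are assumed to fail
# more gracefully than uncommon ones.
wrong_utility = {"gargantuan": 0.2, "big": 0.7, "large": 0.7}


def most_likely(probs):
    """Pick the word with the highest raw probability."""
    return max(probs, key=probs.get)


def expected_utility(word, probs, wrong_utility):
    """Utility is 1.0 if the word is correct, wrong_utility[word] otherwise."""
    p = probs[word]
    return p * 1.0 + (1.0 - p) * wrong_utility[word]


def best_by_expected_utility(probs, wrong_utility):
    """Pick the word that maximizes expected utility."""
    return max(probs, key=lambda w: expected_utility(w, probs, wrong_utility))


if __name__ == "__main__":
    print("most likely word:      ", most_likely(probs))
    print("best expected utility: ", best_by_expected_utility(probs, wrong_utility))
```

With these made-up numbers the most likely word is the uncommon one, while the common word wins on expected utility because it costs less when it is wrong. A bias toward common words would push the raw-probability choice in the same direction.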