Shouldn’t this be, more generally, “likely tokens become even more likely”?
I focused on short tokens because “make likely tokens more likely” is just temperature scaling doing its job, whereas the interaction with token length is the surprising part.
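
To make the "temperature scaling doing its job" part concrete, here is a minimal sketch, assuming a plain softmax over logits divided by a temperature. The function name `softmax_with_temperature` and the logit values are made up for illustration; nothing here models token length, it only shows the unsurprising half of the effect.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature (hypothetical helper for illustration)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for three candidate tokens, most likely first.
logits = [2.0, 1.0, 0.0]

base = softmax_with_temperature(logits, 1.0)   # unscaled distribution
sharp = softmax_with_temperature(logits, 0.5)  # temperature below 1 sharpens it

print(base)
print(sharp)
```

With a temperature below 1, the top token's probability goes up and the tail shrinks, which is the "likely tokens are even more likely" effect. Any dependence on token length would have to come from something beyond this.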