Gurkenglas comments on PSA: Tagging is Awesome

Gurkenglas 31 Jul 2020 8:01 UTC
6 points
Long outputs will tend to deteriorate naturally, as the model tries to reproduce the existing deterioration and accidentally adds some more. Better: sample one tag at a time. Shuffle the inputs every time to access different subdistributions. (I wonder how much the subdistributions differ for two random shuffles...) If you take the minimum probability of each tag across a hundred subdistributions and output the tag whose minimum is highest, I bet that’ll produce a tag that’s not in the inputs.
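A rough sketch of that "highest minimum probability" selection, in case it helps make the idea concrete. Everything named here is an assumption for illustration: `maximin_tag` and `toy_distribution` are hypothetical names, and `toy_distribution` is a made-up stand-in for whatever model actually returns next-tag probabilities for a given input order.

```python
import random
from typing import Callable, Dict, List, Sequence


def maximin_tag(
    inputs: Sequence[str],
    tag_distribution: Callable[[List[str]], Dict[str, float]],
    n_shuffles: int = 100,
    seed: int = 0,
) -> str:
    """Return the tag whose minimum probability over shuffled prompts is highest."""
    if n_shuffles < 1:
        raise ValueError("n_shuffles must be at least 1")
    rng = random.Random(seed)
    worst_case: Dict[str, float] = {}
    for i in range(n_shuffles):
        shuffled = list(inputs)
        rng.shuffle(shuffled)  # a new input order gives a new subdistribution
        dist = tag_distribution(shuffled)
        if i == 0:
            worst_case = dict(dist)
            continue
        # A tag absent from a distribution counts as probability 0 there.
        for tag in set(worst_case) | set(dist):
            worst_case[tag] = min(worst_case.get(tag, 0.0), dist.get(tag, 0.0))
    return max(worst_case, key=worst_case.get)


def toy_distribution(shuffled: List[str]) -> Dict[str, float]:
    # Hypothetical stand-in for the model: it mostly echoes whichever input
    # tag came first or last, plus one order-independent tag, "novel".
    return {shuffled[0]: 0.5, "novel": 0.3, shuffled[-1]: 0.2}


if __name__ == "__main__":
    print(maximin_tag(["a", "b", "c", "d"], toy_distribution))
```

In this toy setup every input tag gets probability 0 in some shuffle, so the only tag with a nonzero worst case is the one that is not in the inputs. Whether a real model behaves like that is exactly the bet above; the toy proves nothing about it.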
- abramdemski 31 Jul 2020 14:48 UTC
  4 points
  Shuffling would also help counteract the alphabetical ordering of the inputs, which has got to be skewing the output somehow.