When I played with this on the website, it seemed to have a few major bugs. At one point I was 20% ahead of the best language model score, made one prediction at 30%, and for some reason lost 400 points; the feedback displayed wasn't even for my sentence!
I'm sorry the feedback wasn't displayed! None of the players complained about this during the measurements, so it's probably an uncommon bug. In any case, it shouldn't have happened.
It's not surprising that you lost 400 points on a single question (even with p = 30%). If the generative language model thought the correct token was very likely, you lose a lot of points when you fail to select it; otherwise you wouldn't be incentivized to give your true probability estimates (see Appendix B).
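For intuition, here is a minimal sketch of why one question can cost so much, assuming a relative logarithmic scoring rule; the scale constant and exact formula below are illustrative assumptions, and the actual rule is specified in Appendix B:

```python
import math

def points(p_player: float, p_model: float, scale: float = 100.0) -> float:
    """Points for one question under a relative log score: how much
    probability the player put on the realized token, compared in log
    space to the language model's probability on it. Log scoring is
    "proper": reporting your true probability estimate maximizes your
    expected points. (`scale` and the exact formula are illustrative
    assumptions; the game's actual rule is given in Appendix B.)"""
    return scale * (math.log2(p_player) - math.log2(p_model))

# Putting 30% on the token that turns out correct loses only modestly
# against a confident model...
print(points(0.30, 0.95))  # about -166 points
# ...but putting ~6% on a token the model was 95% sure of costs
# roughly 400 points in a single question.
print(points(0.06, 0.95))  # about -399 points
```

Because the score is logarithmic in your stated probability, assigning near-zero probability to the realized token is punished very heavily, which is exactly the property that makes truthful reporting optimal.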
The feedback was displayed, but it wasn’t for my sentence. It was for a different sentence, with different word choices.