I’m sorry the feedback wasn’t displayed! None of the players reported this issue during the measurements, so it’s probably an uncommon bug. Still, it shouldn’t have happened.
It’s not surprising that you lost 400 points on a single question (even with p = 30%). If the generative language model considers the correct token very likely, you will lose a lot of points if you fail to select it. Otherwise, you wouldn’t be incentivized to give your true probability estimates (see Appendix B).
The feedback was displayed, but it wasn’t for my sentence. It was for a different sentence, with different word choices.