> Why is the proposed model avoiding discomfort? I’m unclear on why you think that model explains the observed behavior better than another model, like being too eager to be helpful and thus too willing to sacrifice goals to produce an answer it thinks the user will like.
Thanks for your alternative explanation, Gordon. There are a few points I would make.

First, it explicitly contradicts the stated goal of maximising points.

Second, base models don't do it, so it's not simply mimicking human behaviour.

Third, presenting the questions one at a time lowers confidence, and lifeline use scales with this. It's hard to see how that could be over-eagerness to help: taking more lifelines and scoring fewer points seems contrary to helpfulness.