I see what you mean. This is part of why I suggested giving evaluators more time with each response, or using more evaluators per response. I think both evaluator intelligence and RLHF setup are important.