I loved the MASK benchmarks. Does anybody here have other ideas for benchmarks that measure LLM honesty or sycophancy? I am quite interested in building an LLM that you can trust to give the right answer to things like political questions, or in finding a way to identify such an LLM.