Indeed, judgement seems to be a dimension of intelligence (or effectiveness? Or something?) that is distinct from creativity or problem solving, and maybe a bit neglected or less top of mind. I wonder whether there are even good ways of measuring it in humans, or a benchmark for it in LLMs. I really don’t have a good model of judgement at all. Is it a general thing people are good or bad at? Is it highly domain-specific? Probably? To what degree is it distinct from “expertise”? And, yes, do today’s frontier models maybe have some judgement capability that is just hard to elicit?