So really, rather than “the set of semantic targets is small”, I should say something like “the set of semantic targets with significant prior probability is small”. Unclear exactly what the right operationalization is there, but I think I buy the basic point.
...why not just use the entropy of the prior distribution on semantic targets (thus “how many bits are needed to specify a single target given our shared prior”)?
Yeah, that would probably capture the intuition, though this isn’t something which really needs precise operationalization in order to be useful.
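For concreteness, here is a minimal sketch of the entropy suggestion above. The function name `entropy_bits` and the toy priors are hypothetical, made up purely for illustration; nothing here is meant as the "right" operationalization. It just shows that a prior with a long tail of improbable targets can still have low entropy, even when the raw count of targets is large.

```python
import math

def entropy_bits(prior):
    """Shannon entropy in bits of a prior given as {target: weight}.

    Weights are normalized first, so they need not sum to exactly 1.
    """
    total = sum(prior.values())
    if total <= 0:
        raise ValueError("prior must have positive total weight")
    return -sum(
        (w / total) * math.log2(w / total)
        for w in prior.values()
        if w > 0  # zero-probability targets contribute nothing
    )

# Hypothetical prior: four equally likely semantic targets.
uniform = {"t1": 0.25, "t2": 0.25, "t3": 0.25, "t4": 0.25}

# Hypothetical prior: two likely targets plus 1000 very unlikely ones.
long_tail = {"likely_a": 0.49, "likely_b": 0.49}
long_tail.update({f"rare_{i}": 0.02 / 1000 for i in range(1000)})

print(entropy_bits(uniform))       # 2.0 bits
print(entropy_bits(long_tail))     # far below the raw-count bound
print(math.log2(len(long_tail)))   # bits needed if all 1002 were equally likely
```

The long-tail prior has 1002 targets, but its entropy stays close to the one bit needed to pick between the two likely ones, which is roughly the "small set with significant prior probability" intuition.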