...why not just do entropy of the prior distribution on semantic targets (thus “how many bits needed to specify a single target given our shared prior”)?
Yeah, that would probably capture the intuition, though this isn’t something which really needs precise operationalization in order to be useful.
...why not just do entropy of the prior distribution on semantic targets (thus “how many bits needed to specify a single target given our shared prior”)?
Yeah, that would probably capture the intuition, though this isn’t something which really needs precise operationalization in order to be useful.