Starting point for calculating inferential distance?

One of the shiniest ideas I picked up from LW is inferential distance. I say “shiny” because the term, so far as I’m aware, has no clear mathematical or pragmatic definition and no substantive use in peer-reviewed science, yet it was novel to me and appeared to suddenly make a lot of things about the world make sense. In my head it is marked as “super neat… but possibly a convenient falsehood”. I ran across something yesterday that struck me as beautifully succinct and helpful for resolving the epistemic status of the concept of “inferential distance”.

While surfing the Language Log archives I ran across a mailbox response to correspondence about comparative communication efficiency. The author, Mark Liberman, was interested in calculating the amount of information in text, and was surprised to find that something about the texts, the subjects, or his calculation method led to different estimates of the information in different translations of the same text (with English requiring 20%-40% more bits than Chinese to say the same things in his example text).
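As a rough illustration of the kind of calculation involved, here is a toy Python sketch that uses compressed length as a crude stand-in for information content. The proxy and the sample sentences are my own, not Liberman’s actual method or data:

```python
import zlib

def compressed_bits(text: str) -> int:
    """Crude information estimate: bits in the zlib-compressed UTF-8 bytes."""
    return 8 * len(zlib.compress(text.encode("utf-8"), level=9))

# Toy parallel pair (mine, for illustration); a serious estimate would use a
# long aligned corpus and a real language model rather than gzip-style coding.
english = "The quick brown fox jumps over the lazy dog."
chinese = "敏捷的棕色狐狸跳过了懒狗。"

e, c = compressed_bits(english), compressed_bits(chinese)
print(f"English: {e} bits, Chinese: {c} bits, ratio: {e / c:.2f}")
```

On strings this short the compressor’s overhead dominates, which is itself a reminder of how sensitive these estimates are to the texts and the method chosen, apparently the sort of problem Liberman ran into.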

Mr. Liberman was helped by Bob Moore, who, among other things, noted:

...why should we expect two languages to use the same number of bits to convey the same thoughts? I believe that when we speak or write we always simplify the complexity of what is actually in our heads, and different languages might implicitly do this more than others. Applying Shannon’s source/channel model, suppose that when we have a thought T that we want to convey with an utterance U, we act as if our hearer has a prior P(T) over the possible thoughts we may be conveying and estimates a probability P(U|T) that we will have used U to express T. As you well know, according to Shannon, the hearer should find the T that maximizes P(U|T)*P(T) in order to decide what we meant. But the amount of effort that the speaker puts into U will determine the probability that the hearer will get the message T correctly. If the speaker thinks the prior on T is high, then he may choose a shorter U that has a less peaked probability of only coming from T. If I say to my wife “I got it,” I can get by with this short cryptic message, if I think there is a very high probability that she will know what “it” is, but I am taking a risk.

My conjecture is that the acceptable trade-off between linguistic effort and risk of being misunderstood is socially determined over time by each language community and embodied in the language itself. If the probability of being misunderstood varies smoothly with linguistic effort (i.e., bits) without any sharp discontinuities, then there is no reason to suppose that different linguistic communities would end up at exactly the same place on this curve.
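Moore’s decoding rule is easy to play with directly. The sketch below fills in toy numbers of my own invention for P(T) and P(U|T), and has the hearer pick the T maximizing P(U|T)*P(T); it also shows the risk his “I got it” example runs:

```python
# Toy numbers (mine, for illustration): a hearer's prior over thoughts T,
# and P(U|T), the chance a speaker with thought T produces utterance U.
# Rows are partial: probability mass for other possible utterances is omitted.
prior = {"bought_milk": 0.6, "fixed_sink": 0.3, "won_lottery": 0.1}
likelihood = {
    "I got it":       {"bought_milk": 0.4, "fixed_sink": 0.4, "won_lottery": 0.2},
    "I got the milk": {"bought_milk": 0.6, "fixed_sink": 0.0, "won_lottery": 0.0},
}

def decode(utterance: str) -> str:
    """Shannon-style decoding: pick the T maximizing P(U|T) * P(T)."""
    return max(prior, key=lambda t: likelihood[utterance][t] * prior[t])

for u in likelihood:
    print(f"{u!r} -> hearer decodes {decode(u)!r}")
# Both utterances decode to 'bought_milk'. The risk: a speaker who actually
# fixed the sink and says only "I got it" will be misunderstood.
```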
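The conjecture itself can also be made concrete with a one-parameter model. Suppose (my assumption, not Moore’s) that the probability of being misunderstood falls off smoothly, say exponentially, with the bits of effort spent, and that each community weighs misunderstanding by some penalty. Communities with different penalties then settle at different points on the same smooth curve:

```python
import math

def misunderstanding_prob(bits: float, k: float = 0.5) -> float:
    # Assumed smooth risk curve: more bits of effort, lower chance of being
    # misunderstood. The exponential form is my choice, not Moore's.
    return math.exp(-k * bits)

def optimal_effort(penalty: float, k: float = 0.5) -> float:
    """Minimize cost(bits) = bits + penalty * P(misunderstood | bits)."""
    grid = [b / 10 for b in range(0, 301)]
    return min(grid, key=lambda b: b + penalty * misunderstanding_prob(b, k))

# Two hypothetical language communities that penalize misunderstanding
# differently end up at different equilibria on the same curve.
for penalty in (10.0, 100.0):
    b = optimal_effort(penalty)
    print(f"penalty={penalty:>5}: optimal effort ≈ {b:.1f} bits, "
          f"P(misunderstood) ≈ {misunderstanding_prob(b):.3f}")
```

Nothing forces the two communities to the same optimum, which seems to be Moore’s point: the equilibrium level of effort is a property of the community, not of the thoughts being conveyed.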

Application to inferential distance is left as an exercise for the reader :-)