Subscripts for Probabilities

cross-posted from niplav.github.io

Gwern has wondered about a use-case for subscripts in hypertext. While they have settled on a specific use-case, namely years for citations, I propose a different one: reporting explicit probabilities.

Explicitly giving probabilities in day-to-day English text is usually quite clunky: “I assign 35% to North Korea testing an intercontinental ballistic missile before the end of this year” reads far less smoothly than “I don’t think North Korea will test an intercontinental ballistic missile this year”.

And since subscripts are a solution in need of a problem, one can wonder how well those two fit together: Quite well, I claim.

In short, I propose to append probabilities in subscript after a statement using standard HTML subscript notation (or LaTeX as a fallback if it’s available), with the probability possibly also being a link to a relevant forecasting platform with the same question:

I think Donald Trump is going to be incarcerated before 2030$_{65\%}$.

This is almost as readable as the sentence without the probability.
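Mechanically, the proposal amounts to appending a <sub> element after the statement, optionally wrapping the percentage in a link. A minimal Python sketch of that, assuming probabilities are rounded to whole percentages; the function name credal_html and the example URL are hypothetical, chosen only for illustration.

```python
from html import escape
from typing import Optional


def credal_html(statement: str, probability: float, url: Optional[str] = None) -> str:
    """Append `probability` (in [0, 1]) to `statement` as an HTML subscript.

    If `url` is given (e.g. a forecasting-platform question), the percentage
    becomes a link to it. Rounding to whole percent is an assumption.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    percent = f"{round(probability * 100)}%"
    if url is not None:
        # Attribute values need quotes escaped; text content does not.
        percent = f'<a href="{escape(url, quote=True)}">{percent}</a>'
    return f"{escape(statement, quote=False)}<sub>{percent}</sub>"


print(credal_html("I think Donald Trump is going to be incarcerated before 2030", 0.65) + ".")
```

Sentence-final punctuation is appended after the subscript, so the probability sits directly on the claim.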

There are some complications with negations in sentences or multiple statements. For the most part, I’ll simply avoid such cases (“Doctor, it hurts when I do this!” “Don’t do that, then.”), but if I had to, I’d solve the first problem by declaring that the probability applies to the literal meaning of the previous sentence, including all negations; the problem with multiple statements is solved by delimiters.

As an example for the different kinds of negation: “The train won’t come more than 5 minutes late$_{90\%}$” would (arguendo) mean the same thing as “I don’t think the train will come more than 5 minutes late$_{90\%}$”; both mean the same as “The train will take more than 5 minutes to arrive$_{10\%}$”, which is equivalent to “I assign 90% probability to the train arriving within the next 5 minutes”.
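Because the probability attaches to the literal claim, restating a belief in terms of the opposite claim means reporting the complement, 1 − p. A small Python sketch of that bookkeeping; the Credal class and its negated method are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credal:
    """A claim plus the probability assigned to its literal meaning."""

    claim: str
    probability: float  # in [0, 1]

    def negated(self, opposite_claim: str) -> "Credal":
        # The same belief, reported against the opposite claim,
        # carries the complementary probability 1 - p.
        return Credal(opposite_claim, 1.0 - self.probability)

    @property
    def percent(self) -> int:
        # Whole-percent rounding, as used in the subscripts.
        return round(self.probability * 100)


late = Credal("The train won't come more than 5 minutes late", 0.90)
arrives_late = late.negated("The train will take more than 5 minutes to arrive")
print(f"{arrives_late.claim}: {arrives_late.percent}%")
```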

With multiple statements, my favorite way of delimiting is currently half brackets: “I think ⸤it’ll rain tomorrow⸥$_{55\%}$, but ⸤Tuesday is going to be sunny⸥$_{80\%}$, but I don’t think ⸤your uncle is going to be happy about that⸥$_{15\%}$.”

The probabilities in this context aren’t quite evidentials, but neither are they veridicals nor miratives; I propose the word “credal” for this category.

Enumerating Possible Notations

The exact place of insertion is subtle: In sentences with a single central statement, there are multiple locations one could place the probability.

- After the verb related to belief: “I think$_{55\%}$ it’ll rain tomorrow.”
  - Advantage: Close to the word relating to the belief (which could reflect the strength of belief in itself, using “guess”/“wager”/“think”/“believe”).
  - Disadvantages:
    - Conflicts with assigning probabilities to multiple statements.
    - Puts visual clutter before the statement in question.
- At the end of the statement: “I think it’ll rain tomorrow$_{55\%}$.”
  - Advantage: Allows assigning probabilities to simple statements (“It’ll rain tomorrow$_{55\%}$”) and to multiple statements (see below).
  - Disadvantage: If the probability is intended to contextualise the statement, this context is weaker if it is introduced after the statement in question.
- At the subject of the sentence: “I$_{55\%}$ think it’ll rain tomorrow.”
  - Advantage: This can be used to distinguish the beliefs of different people: “I$_{55\%}$ think it’ll rain tomorrow, but Cú Chulainn$_{22\%}$ is skeptical about it.”
  - Disadvantage: Putting the probability before the statement the probability is about feels quite unnatural.
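The three single-statement placements differ only in where the subscript is spliced into the sentence. A Python sketch, assuming the sentence has already been split into subject, belief verb and claim (a hypothetical representation; real sentences rarely split this cleanly) and rendering the subscript as HTML:

```python
def place_probability(subject: str, verb: str, claim: str, percent: int, where: str) -> str:
    """Render "<subject> <verb> <claim>." with an HTML subscript probability.

    `where` selects the position: "verb" (after the belief verb), "end"
    (after the statement) or "subject" (after the subject).
    """
    sub = f"<sub>{percent}%</sub>"
    if where == "verb":
        return f"{subject} {verb}{sub} {claim}."
    if where == "end":
        return f"{subject} {verb} {claim}{sub}."
    if where == "subject":
        return f"{subject}{sub} {verb} {claim}."
    raise ValueError(f"unknown position: {where!r}")


for position in ("verb", "end", "subject"):
    print(place_probability("I", "think", "it'll rain tomorrow", 55, position))
```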

This becomes trickier in sentences with multiple statements.

- Probabilities after each subclaim: “I think it’ll rain tomorrow$_{55\%}$, but Tuesday is going to be sunny$_{80\%}$, but I don’t think your uncle is going to be happy about that$_{15\%}$.”
  - Adding in delimiters to denote the specific subclaim the probability is about. I wonder whether there are better Unicode characters for this; corner brackets might be a good candidate.
    - Lower half brackets (or Quine corners, which look almost the same): “I think ⸤it’ll rain tomorrow⸥$_{55\%}$, but ⸤Tuesday is going to be sunny⸥$_{80\%}$, but I don’t think ⸤your uncle is going to be happy about that⸥$_{15\%}$.”
    - Upper half brackets to the left, lower half brackets to the right: “I think ⸢it’ll rain tomorrow⸥$_{55\%}$, but ⸢Tuesday is going to be sunny⸥$_{80\%}$, but I don’t think ⸢your uncle is going to be happy about that⸥$_{15\%}$.”
    - Subscripted parentheses: “I think $_{(}$it’ll rain tomorrow$_{)55\%}$, but $_{(}$Tuesday is going to be sunny$_{)80\%}$, but I don’t think $_{(}$your uncle is going to be happy about that$_{)15\%}$.”
    - Subscripted half guillemets: “I think $_{‹}$it’ll rain tomorrow$_{›55\%}$, but $_{‹}$Tuesday is going to be sunny$_{›80\%}$, but I don’t think $_{‹}$your uncle is going to be happy about that$_{›15\%}$.”
    - And subscripted full guillemets: “I think $_{«}$it’ll rain tomorrow$_{»55\%}$, but $_{«}$Tuesday is going to be sunny$_{»80\%}$, but I don’t think $_{«}$your uncle is going to be happy about that$_{»15\%}$.”
- I basically rule out lists of probabilities after the verb relating to each subclaim, as it’s very mentally taxing to relate each probability to each claim:
  - “I think$_{55\%, 80\%, 15\%}$ ⸤it’ll rain tomorrow⸥, but ⸤Tuesday is going to be sunny⸥, but I don’t think ⸤your uncle is going to be happy about that⸥.”
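The delimiter variants can be generated from a small table of opening and closing characters, which makes it easy to compare them side by side. A Python sketch; the DELIMITERS table, its style labels and the delimit/render helpers are hypothetical, and the subscripts are written as HTML.

```python
from typing import List, Tuple

# (opening, closing, subscripted?) for each delimiter style.
DELIMITERS = {
    "lower half brackets": ("⸤", "⸥", False),
    "upper/lower half brackets": ("⸢", "⸥", False),
    "subscripted parentheses": ("(", ")", True),
    "subscripted half guillemets": ("‹", "›", True),
    "subscripted guillemets": ("«", "»", True),
}


def delimit(claim: str, percent: int, style: str) -> str:
    """Wrap one subclaim in delimiters and append its probability as <sub>."""
    opening, closing, subscripted = DELIMITERS[style]
    if subscripted:
        opening, closing = f"<sub>{opening}</sub>", f"<sub>{closing}</sub>"
    return f"{opening}{claim}{closing}<sub>{percent}%</sub>"


def render(parts: List[Tuple[str, str, int]], style: str, end: str = ".") -> str:
    """Join (connecting text, subclaim, percent) triples into one sentence."""
    return "".join(lead + delimit(claim, pct, style) for lead, claim, pct in parts) + end


sentence = [
    ("I think ", "it'll rain tomorrow", 55),
    (", but ", "Tuesday is going to be sunny", 80),
    (", but I don't think ", "your uncle is going to be happy about that", 15),
]
for style in DELIMITERS:
    print(render(sentence, style))
```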

Since the people writing the text reporting probabilities are probably logically non-omniscient bounded agents, it might also be useful to report the time or effort one has spent on refining the reported probability: “I reckon humanity will survive the 21st century$_{55\%: 20\mathrm{h}}$”, indicating that the speaker has reflected on this question for 20 hours to arrive at their current probability (something akin to reporting an “epistemic effort” for a piece of information). I fear that this notation is getting into cumbersome territory, and I won’t be using it.
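If one did adopt the effort annotation, it is regular enough to parse mechanically. A Python sketch, assuming a hypothetical textual format “probability% : hours h” with the hours part optional; parse_annotation is an illustrative name.

```python
import re
from typing import Optional, Tuple

# Matches annotations like "55%", "55% : 20 h" or "55%:20h".
_ANNOTATION = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*%\s*(?::\s*(\d+(?:\.\d+)?)\s*h)?\s*$")


def parse_annotation(text: str) -> Tuple[float, Optional[float]]:
    """Return (probability in [0, 1], hours spent or None if not reported)."""
    match = _ANNOTATION.match(text)
    if match is None:
        raise ValueError(f"not a credal annotation: {text!r}")
    probability = float(match.group(1)) / 100
    hours = float(match.group(2)) if match.group(2) is not None else None
    return probability, hours


print(parse_annotation("55% : 20 h"))
```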

Notation Options and Difficulties

There are three available options: Either one’s writing platform supports HTML, in which case one can use the <sub>18%</sub> tags (giving $_{18\%}$), or it supports LaTeX, which creates a slightly fancier-looking but also more fragile notation using _{18\%} (resulting in $_{18\%}$), or it directly supports subscripting, such as pandoc with ~18%~, but not Reddit Markdown (which does support superscript). More info about other platforms here.
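The three options differ only in the markup emitted around the number. A Python sketch producing each of them; render_probability and its notation labels are hypothetical. In the LaTeX case the percent sign has to be escaped, since an unescaped % starts a comment and swallows the rest of the line.

```python
def render_probability(percent: int, notation: str) -> str:
    """Return markup that renders `percent` as a subscripted probability."""
    if notation == "html":
        return f"<sub>{percent}%</sub>"
    if notation == "latex":
        # "%" starts a comment in LaTeX, so it must be written as "\%".
        return f"$_{{{percent}\\%}}$"
    if notation == "pandoc":
        # pandoc Markdown's subscript syntax: text between tildes.
        return f"~{percent}%~"
    raise ValueError(f"unsupported notation: {notation!r}")


for notation in ("html", "latex", "pandoc"):
    print(notation, render_probability(18, notation))
```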

Ideally one would simply use Unicode subscripts, which are available for all digits, but tragically not for the percent sign ‘%’ or a simple dot ‘.’. Perhaps a project for the future: After all, they did include a subscript ‘+’₊, a subscript ‘-’₋, an equals sign ‘=’₌ and parentheses ‘()’₍₎, but many subscript letters (b, c, d, f, g, q, w, y and z) are still missing…
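In the meantime, one can get partway there with the subscripts that do exist. A Python sketch using str.translate; the function name is hypothetical, and characters without a subscript form (notably % and .) are simply passed through at full size.

```python
# Unicode subscript code points (U+2080 to U+208E) for the digits and the
# five symbols that have one; "-" maps to SUBSCRIPT MINUS.
_SUBSCRIPTS = str.maketrans("0123456789+-=()", "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎")


def to_unicode_subscript(text: str) -> str:
    """Convert digits and + - = ( ) to Unicode subscripts; leave the rest as-is."""
    return text.translate(_SUBSCRIPTS)


print("It'll rain tomorrow" + to_unicode_subscript("55%"))
```

The result, “It'll rain tomorrow₅₅%”, shows the problem: the full-size percent sign sticks out.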

Applications

I’ve used this notation sparingly but increasingly; a good example of a first exploration is here.

Fischer 2023 uses a different notation:

Given hedonism and conditional on sentience, we think (credence: 0.7) that none of the vertebrate nonhuman animals of interest have a welfare range that’s more than double the size of any of the others. While carp and salmon have lower scores than pigs and chickens, we suspect that’s largely due to a lack of research.
Given hedonism and conditional on sentience, we think (credence: 0.65) that the welfare ranges of humans and the vertebrate animals of interest are within an order of magnitude of one another.
Given hedonism and conditional on sentience, we think (credence 0.6) that all the invertebrates of interest have welfare ranges within two orders of magnitude of the vertebrate nonhuman animals of interest. Invertebrates are so diverse and we know so little about them; hence, our caution.

The notation proposed here would change the text to:

Given hedonism and conditional on sentience, we think that none of the vertebrate nonhuman animals of interest have a welfare range that’s more than double the size of any of the others$_{70\%}$. While carp and salmon have lower scores than pigs and chickens, we suspect that’s largely due to a lack of research.
Given hedonism and conditional on sentience, we think that the welfare ranges of humans and the vertebrate animals of interest are within an order of magnitude of one another$_{65\%}$.
Given hedonism and conditional on sentience, we think that all the invertebrates of interest have welfare ranges within two orders of magnitude of the vertebrate nonhuman animals of interest$_{60\%}$. Invertebrates are so diverse and we know so little about them; hence, our caution.
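Converting from Fischer’s “(credence: x)” style is mechanical enough to sketch as a regular-expression rewrite: drop the marker and append x as a percentage subscript at the next full stop. A Python sketch under that assumption; it breaks on clauses containing their own full stops (abbreviations, decimals), and to_subscript_notation is a hypothetical name.

```python
import re

# Matches markers such as " (credence: 0.7)" or " (credence 0.6)", plus the
# rest of the clause up to (and including) the next full stop.
_MARKER = re.compile(r" \(credence:? ([01](?:\.\d+)?)\)([^.]*)\.")


def to_subscript_notation(text: str) -> str:
    """Move each inline credence marker to the end of its clause as <sub>NN%</sub>."""

    def replace(match: re.Match) -> str:
        percent = round(float(match.group(1)) * 100)
        return f"{match.group(2)}<sub>{percent}%</sub>."

    return _MARKER.sub(replace, text)


print(to_subscript_notation(
    "we think (credence: 0.65) that the welfare ranges are within an order of magnitude."
))
```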