Gwern has wondered about a use-case for subscripts in hypertext. While
they have settled on a specific use-case, namely years for
citations, I propose a different one:
reporting explicit probabilities.
Explicitly giving probabilities in day-to-day English text is usually
quite clunky: “I assign 35% to North Korea testing an intercontinental
ballistic missile before the end of this year” reads far less smoothly
than “I don’t think North Korea will test an intercontinental ballistic
missile this year”.
And since subscripts are a solution in need of a problem, one can wonder
how well those two fit together: Quite well, I claim.
In short, I propose to append probabilities in subscript after a statement
using standard HTML subscript notation (or LaTeX as a fallback if
it’s available), with the probability possibly also being a link to a
relevant forecasting platform with the same question:
I think Donald Trump is going to be incarcerated before 2030<sub>65%</sub>.
This is almost as readable as the sentence without the probability.
There are some complications with negations in sentences or multiple
statements. For the most part, I’ll simply avoid such cases (“Doctor,
it hurts when I do this!” “Don’t do that, then.”), but if I had to,
I’d solve the first problem by declaring that the probability applies to
the literal meaning of the previous sentence, including all negations;
the problem with multiple statements is solved by delimiters.
As an example for the different kinds of negation: “The train won’t
come more than 5 minutes late<sub>90%</sub>” would (arguendo) mean the
same thing as “I don’t think the train will come more than 5 minutes
late<sub>90%</sub>”, which means the same as “The train will take more than 5
minutes to arrive<sub>10%</sub>”, which in turn is equivalent to “I assign 90% probability
to the train arriving within the next 5 minutes”.
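The step between the 90% and 10% phrasings is just the complement rule. As a worked line, writing A for “the train comes more than 5 minutes late” (a label introduced here purely for illustration):

```latex
P(\neg A) = 90\% \iff P(A) = 1 - P(\neg A) = 10\%
```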
With multiple statements, my favorite way of delimiting is currently
half brackets: “I think ⸤it’ll rain tomorrow⸥<sub>55%</sub>, but
⸤Tuesday is going to be sunny⸥<sub>80%</sub>, but I don’t think
⸤your uncle is going to be happy about that⸥<sub>15%</sub>.”
The probabilities in this context aren’t quite
evidentials, but neither
are they veridicals or
miratives; I propose the word
“credal” for this category.
Enumerating Possible Notations
The exact place of insertion is subtle: In sentences with a
single central statement, there are multiple locations where one could place
the probability.
After the verb related to belief: “I think<sub>55%</sub> it’ll rain tomorrow.”
Advantage: Close to the word relating to the belief (which could itself reflect the strength of belief, using “guess”/“wager”/“think”/“believe”).
Disadvantages:
Conflicts with assigning probabilities to multiple statements.
Puts visual clutter before the statement in question.
At the end of the statement: “I think it’ll rain tomorrow<sub>55%</sub>.”
Advantage: Allows assigning probabilities to simple statements (“It’ll rain tomorrow<sub>55%</sub>”) and to multiple statements (see below).
Disadvantage: If the probability is intended to contextualise the statement, this context is weaker if it is introduced after the statement in question.
At the subject of the sentence: “I<sub>55%</sub> think it’ll rain tomorrow.”
Advantage: This can be used to distinguish the beliefs of different people. “I<sub>55%</sub> think it’ll rain tomorrow, but Cú Chulainn<sub>22%</sub> is skeptical about it.”
Disadvantage: Putting the probability before the statement it is about feels quite unnatural.
This becomes trickier in sentences with multiple statements.
Probabilities after each subclaim: “I think it’ll rain tomorrow<sub>55%</sub>, but Tuesday is going to be sunny<sub>80%</sub>, but I don’t think your uncle is going to be happy about that<sub>15%</sub>.”
Adding in delimiters to denote the specific subclaim the probability is about. I wonder whether there are better Unicode characters for this; corner brackets might be a good candidate.
Lower half brackets (or Quine corners which look almost the same): “I think ⸤it’ll rain tomorrow⸥<sub>55%</sub>, but ⸤Tuesday is going to be sunny⸥<sub>80%</sub>, but I don’t think ⸤your uncle is going to be happy about that⸥<sub>15%</sub>.”
Upper half brackets to the left, lower half brackets to the right: “I think ⸢it’ll rain tomorrow⸥<sub>55%</sub>, but ⸢Tuesday is going to be sunny⸥<sub>80%</sub>, but I don’t think ⸢your uncle is going to be happy about that⸥<sub>15%</sub>.”
Subscripted parentheses: “I think <sub>(</sub>it’ll rain tomorrow<sub>)55%</sub>, but <sub>(</sub>Tuesday is going to be sunny<sub>)80%</sub>, but I don’t think <sub>(</sub>your uncle is going to be happy about that<sub>)15%</sub>.”
Subscripted half guillemets: “I think <sub>‹</sub>it’ll rain tomorrow<sub>›55%</sub>, but <sub>‹</sub>Tuesday is going to be sunny<sub>›80%</sub>, but I don’t think <sub>‹</sub>your uncle is going to be happy about that<sub>›15%</sub>.”
And subscripted full guillemets: “I think <sub>«</sub>it’ll rain tomorrow<sub>»55%</sub>, but <sub>«</sub>Tuesday is going to be sunny<sub>»80%</sub>, but I don’t think <sub>«</sub>your uncle is going to be happy about that<sub>»15%</sub>.”
I basically rule out lists of probabilities after the verb relating to each subclaim, as it’s very mentally taxing to relate each probability to each claim:
“I think<sub>55%,80%,15%</sub> ⸤it’ll rain tomorrow⸥, but ⸤Tuesday is going to be sunny⸥, but I don’t think ⸤your uncle is going to be happy about that⸥.”
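One practical upside of the delimiter variants is that they are easy to read back mechanically. The following is a minimal sketch, assuming the lower-half-bracket style and assuming the subscripts are written as HTML <sub> tags; the names CLAIM and extract_claims are made up for this illustration and are not part of any existing tool.

```python
import re

# Assumed convention: ⸤claim⸥ immediately followed by <sub>NN%</sub>.
CLAIM = re.compile(r"⸤(?P<claim>[^⸤⸥]+)⸥<sub>(?P<pct>\d+(?:\.\d+)?)%</sub>")


def extract_claims(text: str) -> list[tuple[str, float]]:
    """Return (claim, probability) pairs, with probabilities as fractions of 1."""
    return [(m["claim"], float(m["pct"]) / 100) for m in CLAIM.finditer(text)]


example = (
    "I think ⸤it’ll rain tomorrow⸥<sub>55%</sub>, "
    "but ⸤Tuesday is going to be sunny⸥<sub>80%</sub>, "
    "but I don’t think ⸤your uncle is going to be happy about that⸥<sub>15%</sub>."
)
for claim, p in extract_claims(example):
    print(f"{p:.2f}  {claim}")
```

Because each probability sits directly after its closing bracket, the pairing is unambiguous, which is exactly what the list-after-the-verb variant lacks.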
Since the people writing the text
reporting probabilities are probably logically
non-omniscient bounded agents, it might also
be useful to report the time or effort one has spent on refining
the reported probability: “I reckon humanity will survive the 21st
century<sub>55%:20h</sub>”, indicating that the speaker has reflected
on this question for 20 hours to arrive at their current probability
(something akin to reporting an “epistemic effort” for a piece of
information). I fear that this notation is getting into cumbersome
territory and won’t be using it.
Notation Options and Difficulties
There are three available options: Either one’s writing platform supports
HTML, in which case one can use <sub> tags, as in <sub>18%</sub> (rendering
as a subscripted 18%), or it supports LaTeX, which creates a slightly
fancier-looking but also more fragile notation using _{18\%} (again rendering
as a subscripted 18%), or one’s platform directly supports subscripting, such
as pandoc with ~18%~, but not
Reddit Markdown (which does support superscript). More info about other
platforms here.
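If one writes for several platforms, the three notations can be generated from a single number. This is only a sketch: the function names format_probability and annotate are hypothetical, and whether the LaTeX form needs surrounding math-mode delimiters depends on the platform, so none are emitted here.

```python
def format_probability(p: float, notation: str = "html") -> str:
    """Render a probability in [0, 1] as a subscript in the chosen notation."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    percent = f"{p * 100:g}"  # 0.55 -> "55", 0.125 -> "12.5"
    if notation == "html":
        return f"<sub>{percent}%</sub>"
    if notation == "latex":
        # Assumption: the platform supplies its own math-mode delimiters.
        return f"_{{{percent}\\%}}"
    if notation == "pandoc":
        return f"~{percent}%~"
    raise ValueError(f"unknown notation: {notation}")


def annotate(statement: str, p: float, notation: str = "html") -> str:
    """Attach the subscript to a statement, before its final punctuation mark."""
    if statement.endswith((".", "!", "?")):
        body, end = statement[:-1], statement[-1]
    else:
        body, end = statement, ""
    return f"{body}{format_probability(p, notation)}{end}"


print(annotate("I think it'll rain tomorrow.", 0.55))
print(annotate("I think it'll rain tomorrow.", 0.55, "pandoc"))
```

Keeping the probability as a number until the last moment also makes it cheap to switch notation later, should one of the platforms change what it supports.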
Ideally one would simply use Unicode
subscripts, which are
available for all digits, but tragically not for the percentage sign
‘%’ or a simple dot ‘.’. Perhaps a project for the future: After all,
Unicode does include a subscript ‘+’ (₊), a subscript ‘-’ (₋), an equality sign
‘=’ (₌) and parentheses ‘()’ (₍₎), but many subscript letters (b, c, d,
f, g, q, w, y and z) are still missing…
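For platforms with no markup at all, a partial fallback is to translate what Unicode does offer and leave the rest at full size. A minimal sketch, with SUBSCRIPTS and to_unicode_subscript as names invented for this example:

```python
# Digits and the few operators that have dedicated Unicode subscript forms.
SUBSCRIPTS = str.maketrans("0123456789+-=()", "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎")


def to_unicode_subscript(text: str) -> str:
    """Subscript every character that has a Unicode subscript; keep the rest as-is."""
    return text.translate(SUBSCRIPTS)


# The percent sign and the dot have no subscript form, so they stay full-size.
print("before 2030" + to_unicode_subscript("65%"))
print(to_unicode_subscript("12.5%"))
```

The result is readable but visibly uneven, which illustrates why the missing ‘%’ and ‘.’ are the real obstacle to this route.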
Applications
I’ve used this notation sparingly but
increasingly; a good example of a first exploration is
here.
Fischer 2023 uses a different notation:
Given hedonism and conditional on sentience, we think (credence: 0.7) that none of the vertebrate nonhuman animals of interest have a welfare range that’s more than double the size of any of the others. While carp and salmon have lower scores than pigs and chickens, we suspect that’s largely due to a lack of research.
Given hedonism and conditional on sentience, we think (credence: 0.65) that the welfare ranges of humans and the vertebrate animals of interest are within an order of magnitude of one another.
Given hedonism and conditional on sentience, we think (credence 0.6) that all the invertebrates of interest have welfare ranges within two orders of magnitude of the vertebrate nonhuman animals of interest. Invertebrates are so diverse and we know so little about them; hence, our caution.
The notation proposed here would change the text to:
Given hedonism and conditional on sentience, we think that none of the vertebrate nonhuman animals of interest have a welfare range that’s more than double the size of any of the others<sub>70%</sub>. While carp and salmon have lower scores than pigs and chickens, we suspect that’s largely due to a lack of research.
Given hedonism and conditional on sentience, we think that the welfare ranges of humans and the vertebrate animals of interest are within an order of magnitude of one another<sub>65%</sub>.
Given hedonism and conditional on sentience, we think that all the invertebrates of interest have welfare ranges within two orders of magnitude of the vertebrate nonhuman animals of interest<sub>60%</sub>. Invertebrates are so diverse and we know so little about them; hence, our caution.
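A rewrite like this can be roughly automated. The sketch below assumes the parenthetical form “(credence: 0.7)” (colon optional), assumes the claim ends at the first period after the marker, and writes the subscript as an HTML <sub> tag; CREDENCE and to_subscript_notation are hypothetical names, and real text with abbreviations or decimals in the claim would need more care.

```python
import re

# Matches the parenthetical marker, with or without a colon, plus leading whitespace.
CREDENCE = re.compile(r"\s*\(credence:?\s*(?P<p>[01](?:\.\d+)?|\.\d+)\)")


def to_subscript_notation(text: str) -> str:
    """Move the first "(credence: p)" marker to the end of its sentence as a subscript."""
    m = CREDENCE.search(text)
    if m is None:
        return text  # nothing to convert
    percent = f"{float(m['p']) * 100:g}"  # "0.65" -> "65"
    stripped = text[:m.start()] + text[m.end():]
    # Assumption: the claim ends at the first period after the marker.
    end = stripped.find(".", m.start())
    if end == -1:
        end = len(stripped)
    return f"{stripped[:end]}<sub>{percent}%</sub>{stripped[end:]}"


print(to_subscript_notation(
    "We think (credence: 0.7) that it will rain. It may also snow."
))
```

Only the first marker is handled per call, so a paragraph with several credences would have to be split into sentences first.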
Subscripts for Probabilities
cross-posted from niplav.github.io