Awesome!
I’ve been dying for something like this after I zoomed through all the questions in the CFAR calibration app.
Notes so far:
The highest available confidence is 99%, so the lowest available confidence should be 1% rather than 0%. Or even better, you could add 99.9% and 0.1% as additional options.
So far I’ve come across one question that was blank. It just said “Category: jewelry” and had no other text. Somehow the answer was Ernest Hemingway.
Would be great to be able to sign up for an account so I could track my calibration across multiple sessions.
Re: 0%, that’s fair. Originally I included 0% because certain questions are either unanswerable (due to being blank, contextless, or whatnot) but even then there’s still a non-zero possibility of guessing the right answer out of a near-infinite number of choices.
Re: Calibration across multiple sessions. Good idea. I’ll start with a local-storage-based solution, because that would be easiest, and then eventually do an account-based thing.
Re: Blank questions. Yeah, I should probably include some kind of check to see if the question is blank and skip it if so.
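A minimal sketch of what such a check might look like (the field names and data shape here are assumptions for illustration, not the app’s actual data model):

```python
def is_blank(question: dict) -> bool:
    """Treat a question as blank if its prompt text is missing or
    whitespace-only (e.g. only a category like "jewelry" and no prompt)."""
    text = (question.get("text") or "").strip()
    return not text


def valid_questions(questions):
    """Yield only questions with non-empty prompt text, skipping blanks."""
    for q in questions:
        if not is_blank(q):
            yield q
```

Filtering at load time this way means a blank record never reaches the UI at all, rather than being special-cased mid-quiz.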
Thanks! BTW, I’d prefer to have 1% and 0.1% and 99% and 99.9% as options, rather than skipping over the 1% and 99% options as you have it now.
I considered that, but I think, at least for now, it may just overcomplicate things for not a ton of benefit. Subjectively, it seems that out of 100 questions there are maybe 10 to which I would assign the highest possible confidence. Of those, I’d say only 1 would be a question where I’d pick 99% confidence, if it were available, instead of, say, 99.9%.
So, assuming (incorrectly) that I’m perfectly calibrated, it would take about 7,000 questions to stand a >50% chance of seeing a meaningful difference between the two confidence levels.
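The arithmetic behind that estimate can be sketched as follows, under the stated assumptions: roughly 1 question in 100 distinguishes the two levels, and “seeing a meaningful difference” means a >50% chance of at least one miss among the questions answered at 99% confidence:

```python
import math

# At 99% confidence the chance of being wrong per question is 1%.
p_wrong_99 = 0.01

# Smallest n with P(at least one miss in n such questions) > 50%,
# i.e. the smallest n satisfying 1 - 0.99**n > 0.5.
n_distinguishing = math.ceil(math.log(0.5) / math.log(1 - p_wrong_99))

# Only about 1 question in 100 falls into this category,
# so scale up to total questions seen.
total_questions = n_distinguishing * 100
```

This gives 69 distinguishing questions, hence about 6,900 questions overall, consistent with the “about 7,000” figure.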
It’s possible to be, to some extent, certain that you haven’t thought of a correct answer (even if not certain that you don’t know the answer): you have no answer in mind, and yet you aren’t considering answers like “this is a trick question” or “there is no correct answer”. Is this something that should be represented, making “0%” correct to include, or am I confused?
I got one blank question, which I think was an error with loading since the answer came up the same as the previous question, and the one after it took a couple seconds to appear on-screen.
I’d prefer not to allow 0 and 1 as available credences. But if 0 remained as an option, I would just interpret it as “very close to 0” and keep using the app. Though if a future version of the app showed me my Bayes score, the difference between what the app allows me to choose (0%) and what I’m interpreting 0 to mean (“very close to 0”) could matter.
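To illustrate why that gap would matter: under the logarithmic scoring rule (one standard choice of Bayes score; its use in this app is hypothetical), a correct answer to which you assigned probability p scores log(p), so choosing literally 0% and being right scores negative infinity, while “very close to 0” scores a large but finite penalty:

```python
import math

def log_score(p: float) -> float:
    """Logarithmic score for an answer assigned probability p that
    turned out to be correct. p = 0 yields negative infinity."""
    return math.log(p) if p > 0 else float("-inf")
```

So a scorer that takes the 0% button literally would make a single lucky guess unrecoverable, whereas interpreting it as, say, 0.1% would not.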
I think it’s misleading to just drop in the statement that 0 and 1 are not probabilities.
There is a reasonable and arguably better definition of probability which excludes them, but it’s not the standard one, and it also has costs: for example, probabilities are a useful tool in building models, and it is sometimes useful to assign probability 0 or 1 within a model.
(aside: it works as a kind of ‘clickbait’ in the original article title, and Eliezer doesn’t actually make such a controversial statement in the post, so I’m not complaining about that)
Fair enough. I’ve edited my original comment.
(For posterity: the text for my original comment’s first hyperlink originally read “0 and 1 are not probabilities”.)
Perfect, thanks!