Awesome!
I’ve been dying for something like this after I zoomed through all the questions in the CFAR calibration app.
Notes so far:
The highest available confidence is 99%, so the lowest available confidence should be 1% rather than 0%. Or even better, you could add 99.9% and 0.1% as additional options.
So far I’ve come across one question that was blank. It just said “Category: jewelry” and had no other text. Somehow the answer was Ernest Hemingway.
Would be great to be able to sign up for an account so I could track my calibration across multiple sessions.
Re: 0%, that’s fair. Originally I included 0% because certain questions are either unanswerable (due to being blank, contextless, or whatnot) but even then there’s still a non-zero possibility of guessing the right answer out of a near-infinite number of choices.
Re: Calibration across multiple sessions. Good idea. I’ll start with a local-storage-based solution, because that would be easiest, and then eventually do an account-based thing.
Re: Blank questions. Yeah, I should probably include some kind of check to see if the question is blank and skip it if so.
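A minimal sketch of what such a check might look like (the field names and data shape here are assumptions for illustration, not the app’s actual data model):

```python
def is_blank(question: dict) -> bool:
    """Treat a question as blank if its prompt text is missing or
    whitespace-only (e.g. only a category like "jewelry" and no prompt)."""
    text = (question.get("text") or "").strip()
    return not text


def valid_questions(questions):
    """Yield only questions with non-empty prompt text, skipping blanks."""
    for q in questions:
        if not is_blank(q):
            yield q
```

Filtering at load time this way means a blank record never reaches the UI at all, rather than being special-cased mid-quiz.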
Thanks! BTW, I’d prefer to have 1% and 0.1% and 99% and 99.9% as options, rather than skipping over the 1% and 99% options as you have it now.
I considered that, but I think, at least for now, it may just overcomplicate things for not a ton of benefit. Subjectively, it seems that out of 100 questions there are maybe 10 to which I would assign the highest possible confidence. Of those, I’d say only 1 would be a question where I’d pick 99% confidence, if it were available, instead of, say, 99.9%.
So, assuming (incorrectly) that I’m perfectly calibrated, it would take about 7,000 questions to stand a >50% chance of seeing a meaningful difference between the two confidence levels.
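The arithmetic behind that estimate can be sketched as follows, under the stated assumptions: roughly 1 question in 100 distinguishes the two levels, and “seeing a meaningful difference” means a >50% chance of at least one miss among the questions answered at 99% confidence:

```python
import math

# At 99% confidence the chance of being wrong per question is 1%.
p_wrong_99 = 0.01

# Smallest n with P(at least one miss in n such questions) > 50%,
# i.e. the smallest n satisfying 1 - 0.99**n > 0.5.
n_distinguishing = math.ceil(math.log(0.5) / math.log(1 - p_wrong_99))

# Only about 1 question in 100 falls into this category,
# so scale up to total questions seen.
total_questions = n_distinguishing * 100
```

This gives 69 distinguishing questions, hence about 6,900 questions overall, consistent with the “about 7,000” figure.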
It’s possible to be, to some extent, certain that you haven’t thought of a correct answer (even if not certain that you don’t know the answer): you have no answer in mind, and yet you aren’t considering answers like “this is a trick question” or “there is no correct answer”. Is this something that should be represented, making “0%” correct to include, or am I confused?
I got one blank question, which I think was an error with loading since the answer came up the same as the previous question, and the one after it took a couple seconds to appear on-screen.
I’d prefer not to allow 0 and 1 as available credences. But if 0 remained as an option, I would just interpret it as “very close to 0” and keep using the app. Though if a future version of the app showed me my Bayes score, the difference between what the app allows me to choose (0%) and what I’m interpreting 0 to mean (“very close to 0”) could matter.
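To illustrate why that gap would matter: under the logarithmic scoring rule (one standard choice of Bayes score; its use in this app is hypothetical), a correct answer to which you assigned probability p scores log(p), so choosing literally 0% and being right scores negative infinity, while “very close to 0” scores a large but finite penalty:

```python
import math

def log_score(p: float) -> float:
    """Logarithmic score for an answer assigned probability p that
    turned out to be correct. p = 0 yields negative infinity."""
    return math.log(p) if p > 0 else float("-inf")
```

So a scorer that takes the 0% button literally would make a single lucky guess unrecoverable, whereas interpreting it as, say, 0.1% would not.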
I think it’s misleading to just drop in the statement that 0 and 1 are not probabilities.
There is a reasonable and arguably better definition of probability which excludes them, but it’s not the standard one, and it also has costs: for example, probabilities are a useful tool in building models, and it is sometimes useful to assign probability 0 or 1 within a model.
(aside: it works as a kind of ‘clickbait’ in the original article title, and Eliezer doesn’t actually make such a controversial statement in the post, so I’m not complaining about that)
Fair enough. I’ve edited my original comment.
(For posterity: the text for my original comment’s first hyperlink originally read “0 and 1 are not probabilities”.)
Perfect, thanks!