Karma: 115

A one-question Turing test for GPT-3

22 Jan 2022 18:17 UTC
84 points
• Thanks, I appreciate that :) And I apologize if my comment about probability being weird came across as patronizing; it was meant to be a reflection on the difficulty I was having putting my model into words, not a comment on your understanding.

• OK, this confirms you haven’t understood what I’m claiming. If I gave a list of predictions that were my true 50% confidence intervals, they would look very similar to common wisdom because I’m not a superforecaster (unless I had private information about a topic, e.g. a prediction about my net worth at the end of the year). If I gave my true 50% confidence intervals, I would be indifferent to which way I phrased them (in the same way that if I were to predict 10 coin tosses, it doesn’t matter whether I predict ten heads, ten tails, or some mix of the two).

From what I can tell from your examples, the list of predictions you proposed sending to me would not have represented your true 50% confidence intervals each time—you could have sent me 5 things you are very confident will come true and 5 things you are very confident won’t come true. It’s possible to fake any given level of calibration in this way.
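To make the "faking calibration" point concrete, here is a minimal Python sketch. The probabilities (0.99 and 0.01) and the helper name `expected_hit_rate` are made up for illustration; the only claim is that a list built this way is expected to score about as well at the 50% level as a list of genuine coin flips.

```python
def expected_hit_rate(true_probs):
    """Expected fraction of statements that come true, given each one's real probability."""
    return sum(true_probs) / len(true_probs)

# Hypothetical numbers: five near-certain statements and five near-impossible ones,
# all of them reported with a stated confidence of 50%.
faked_list = [0.99] * 5 + [0.01] * 5

# Ten statements that really are coin flips.
honest_list = [0.5] * 10

# Both lists are expected to have roughly half of their statements come true,
# so counting hits alone cannot tell them apart.
print(round(expected_hit_rate(faked_list), 6))
print(round(expected_hit_rate(honest_list), 6))
```

Counting how many statements came true therefore says nothing about whether each individual 50% was an honest estimate.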

• I hereby offer you 2000\$ if you provide me with a list of this kind

Can you specify what you mean by ‘of this kind’, i.e. what are the criteria for predictions included on the list? Do you mean a series of predictions which give a narrow range?

• I agree there is a difference between those lists if you are evaluating everything with respect to each prediction being ‘true’. My point is that sometimes a 50% prediction is impressive when it turns out to be false, because everyone else would have put a higher percentage than 50% on it being true. The first list contains only statements that are impressive if evaluated as true; the second mixes ones that would be impressive if evaluated as true with those that are impressive if evaluated as false. If Tesla’s stock ends up at \$513, it feels weird to say ‘well done’ to someone who predicts “Tesla’s stock price at the end of the year 2020 is below 512\$ or above 514\$ (50%)”, but that’s what I’m suggesting we should do, if everyone else would have put only, say, a 10% chance on that outcome. If you’re saying that we should always phrase 50% predictions such that they would be impressive if evaluated as true because it’s more intuitive for our brains to interpret, I don’t disagree.

I read the post in good faith and I appreciate that it made me think about predictions and probabilities more deeply. I’m not sure how else to explain my position so will leave it here.

• Thanks for the response!

I don’t think there is any difference in those lists! Here’s why:

The impressiveness of 50% predictions can only be evaluated with respect to common wisdom. If everyone thinks P is only 10% likely, and you give it 50%, and P turns out to be true, this is impressive because you gave it a surprisingly high percentage! But also if everyone says P is 90% likely, and P turns out to be false, this is also impressive because you gave it a surprisingly low percentage!

I think what you’re suggesting is that people should always phrase their prediction in a way that, if P comes true, makes their prediction impressive because the percentage was surprisingly high, i.e.:

Most people think there is only a 20% chance that the price of a barrel of oil at the end of 2020 will be between \$50.95 and \$51.02. I think it’s 50% (surprisingly high), so you should be impressed if it turns out to be true.

But you could also say:

Most people think there is an 80% chance that the price of a barrel of oil at the end of 2020 will not be between \$50.95 and \$51.02. I think it’s only 50% (surprisingly low), so you should be impressed if it turns out to be false.

These are equally impressive (though I admit the second is phrased in a less intuitive way). When it comes to 50% predictions, it doesn’t matter whether you evaluate them with respect to ‘it turned out to be true’ or ‘it turned out to be false’; you’re trying to correctly represent the percentages on both sides (i.e. the correct ratio), and the impressiveness comes from the extent to which your percentages on both sides differ from the baseline.

I think what I’m saying is that it doesn’t matter how the author phrases it; when evaluating 50% predictions we should notice both when the percentage seems surprisingly high and the statement turns out to be true, and when it seems surprisingly low and the statement turns out to be false, as both are impressive.

When it comes to a list of 50% predictions, it’s impossible to evaluate the impressiveness only by looking at how many came true, since it’s arbitrary which way they are phrased, and you could equally evaluate the impressiveness by how many turned out to be false. So you have to compare each one to the baseline ratio.
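One way to put a number on "compare each one to the baseline" is a log-score comparison. This is my own choice of measure for illustration, not something proposed in the thread, and the function name `log_score_gain` is hypothetical. The sketch uses the oil price example above and shows that the two phrasings earn the same credit.

```python
import math

def log_score_gain(forecast, baseline, came_true):
    """Log-score advantage of `forecast` over `baseline` for one yes/no statement."""
    if came_true:
        return math.log(forecast / baseline)
    return math.log((1 - forecast) / (1 - baseline))

# Phrasing 1: "the price will be in the range". Baseline 20%, my forecast 50%,
# and the statement turns out to be true.
in_range = log_score_gain(0.5, 0.2, came_true=True)

# Phrasing 2: "the price will NOT be in the range". Baseline 80%, my forecast 50%,
# and the statement turns out to be false (it is the same outcome as above).
not_in_range = log_score_gain(0.5, 0.8, came_true=False)

# The two gains agree up to floating-point rounding.
print(math.isclose(in_range, not_in_range))
```

Under this measure the credit depends only on how far the 50% sits from the baseline on the side that actually happened, not on which way round the statement was written.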

Probability is weird and unintuitive and I’m not sure if I’ve explained myself very well...

• As has been noted, the impressiveness of the predictions has nothing to do with which way round they are stated; predicting P at 50% is exactly as impressive as predicting ¬P at 50% because they are literally the same. I think one only sounds more impressive when compared to the ‘baseline’ because our brains seem to be more attuned to predictions that sound surprisingly high, and we don’t seem to notice ones that seem surprisingly low. I.e., we hear ‘there is a 40% chance that Joe Biden will be the Democratic nominee’, somehow translate that to ‘at least 40%’, and fail to consider what it implies for the other 60%.

Consider the examples given of unimpressive-sounding predictions:

• There is a 50% chance that the price of a barrel of oil at the end of 2020 will not be between \$50.95 and \$51.02

• There is a 50% chance that Tesla’s stock price at the end of the year 2020 is below \$512 or above \$514

You can immediately make these sound impressive without flipping them by inserting the word ‘only’ or ‘just’:

• There is only a 50% chance that the price of a barrel of oil at the end of 2020 will not be between \$50.95 and \$51.02

• There is just a 50% chance that Tesla’s stock price at the end of the year 2020 will be below \$512 or above \$514

Suddenly, we are forced to confront how surprisingly low this percentage is, given what you might expect from common wisdom, and it goes back to seeming impressive.

I also think it’s a mistake to confuse ‘common wisdom’ and ‘baseline’ with ‘all possible futures’ when thinking about impressiveness. If I say that there’s a 50% chance that the price of a barrel of oil at the end of 2020 will be between -\$1 million and \$1 million, this sounds unimpressive because I’ve chosen a very wide interval relative to common sense. There are a lot more numbers below -\$1 million and above \$1 million than there are within that interval, so arguably this is actually quite a precise prediction in the space of all possible futures. But that’s not important: what matters is the common sense range / baseline.

(Of course, “there’s a 50% chance that the price of a barrel of oil at the end of 2020 will be between -\$1 million and \$1 million” is actually a very bold prediction, because it’s saying that there is a 50% chance that the price of oil will be either less than -\$1 million or above \$1 million, which is surprisingly high… but we only notice it when it’s phrased to seem surprisingly high rather than surprisingly low!)

Help forecast study replication in this social science prediction market

7 Aug 2019 18:18 UTC
29 points
• I’m constantly experimenting with it! The downside of it being so flexible is that it can take a while to figure out the best system.

At the moment, everything goes into one database called ‘Notes’. I enabled the ‘Created at’ and ‘Edited at’ properties. I also have multi-select properties for themes (e.g. rationality, productivity, economics, etc) and for type (e.g. random thought, blog idea, resource, article, tool, etc). I also have a checkbox property called ‘processed’ - and I filter the view of the table to only see the unticked items. Everything I add is by default ‘unprocessed’ (i.e. unticked) - this allows me to add stuff from the web (with the web clipper) and quick random thoughts without worrying about immediately sorting them. Every so often, I go through everything that is ‘unprocessed’ and sort it (add tags, finish reading it, add highlights or more notes, links to other notes, etc) and once I’m done I’ll tick the ‘processed’ box so it’s hidden from the default view.
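The capture-then-process flow described above can be sketched as a tiny in-memory model. This is plain Python, not the Notion API; the class and function names (`Note`, `inbox`, `process`) and the sample notes are assumptions made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Note:
    title: str
    themes: set = field(default_factory=set)  # e.g. {"rationality", "productivity"}
    kind: str = ""                            # e.g. "random thought", "article"
    processed: bool = False                   # new notes start unticked

def inbox(notes):
    """Default view: only the notes that have not been processed yet."""
    return [n for n in notes if not n.processed]

def process(note, themes, kind):
    """Sort a captured note: add tags, set its type, and tick the box."""
    note.themes |= set(themes)
    note.kind = kind
    note.processed = True

notes = [
    Note("Clipped article on calibration"),
    Note("Blog idea: note-taking principles"),
]
process(notes[0], themes=["rationality"], kind="article")

# Only the still-unprocessed note remains in the default view.
print([n.title for n in inbox(notes)])
```

Keeping `processed` false by default is what makes quick capture cheap: nothing has to be tagged at the moment it is written down.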

I ‘favorite’ the notes I use most regularly (e.g. I have one called ‘useful info’ which has stuff like my health insurance number, wifi passwords, etc). Otherwise, I navigate by searching, or by filtering on certain tags.

I’m planning to gradually build this out into a relational system (e.g. creating a ‘project’ table and linking the notes to relevant projects, etc). I try to ‘organize opportunistically’ (as described in Part II C here—retrieved this from my Notes table in Notion!) as I find most of my attempts to impose a top-down structure are not flexible enough.

• Principles:

1. Capture everything: Do not assume your brain will remember anything. Write it down ASAP. Use whatever will let you capture it quickest, whether that’s pen and paper or a digital solution.

2. Review and process: Make sure you regularly look at the things you wrote down and organize them. If an item is an idea you need to act on (e.g. a topic for a blog post, or a reminder to look up a particular concept), add it to your task manager. If it is a thought for reference, add it to your ‘second brain’/note-taking/archive system, and add tags so you can find it easily later. Once processed, archive or delete the item from your capture system.

3. “Everything should be made as simple as possible, but no simpler”: There is a balance to strike between using as few tools as possible, and using many tools that are each specialized to do one thing really well (a tool that tries to do everything tends to do nothing well).

4. Consider sustainability: In an ideal world, I would only use open-source, non-proprietary tools. But often commercial tools do a better job and/or have a nicer interface. In these cases, ensure that you can export your data at any time in a standard format (e.g. Markdown, XML). Also, be prepared to pay a subscription fee: this helps keep the tool going!

What I use: