Rating my AI Predictions

Nine months ago I predicted trends I expected to see in AI over the course of 2023. Here’s how I did (bold indicates a prediction came true, italics indicates it didn’t, and neither bold nor italics indicates it is unresolved):

  1. ChatGPT (or successor product from OpenAI) will have image-generating capabilities incorporated by end of 2023: 70%

  2. No papers or press releases from OpenAI/DeepMind/Microsoft about incorporating video parsing or generation into production-ready LLMs through end of 2023: 90%

  3. All publicly released LLM models accepting audio input by the end of 2023 use audio-to-text-to-matrices (e.g. transcribe the audio before passing it into the LLM as text) (conditional on the method being identifiable): 90%

  4. All publicly released LLM models accepting image input by the end of 2023 use image-to-matrices (e.g. embed the image directly in contrast to taking the image caption) (conditional on the method being identifiable): 70%

  5. At least one publicly-available LLM incorporates at least one Quick Query tool by end of 2023 (public release of Bing chat would resolve this as true): 95%

  6. ChatGPT (or successor product from OpenAI) will use at least one Quick Query tool by end of 2023: 70%

  7. The time between the first public release of an LLM-based AI that uses tools, and one that is allowed to arbitrarily write and execute code is >12 months: 70%

  8. >24 months: 50%

  9. No publicly available product by the end of 2023 which is intended to make financial transactions (e.g. “buy an egg whisk” actually uses your credit card to buy the egg whisk, not just add it to your shopping cart): 90%

See the original post for the evidence and justification for why these did or didn’t resolve true. If you disagree with any of these, let me know. Note especially that for prediction 2, about video-processing LLMs, I am NOT counting Gemini as a “production-ready” model using video inputs, because Google has confirmed they “used still images and fed text prompts” to make the trailer.

Some commentary:

  • The only prediction I got directionally wrong was the audio inputs, due to Gemini. I think this was just an unforced error on my end, where I did a bad job translating my March 2023 world model into a percentage.

  • On these predictions, my Brier score is .1575 and my log-score is .519.

  • Calibration graph (n=7):

  • What I see as the two things I could fix to improve my predictions/​calibration going forward:

    • I got a 90% prediction wrong. I touched on what led to that prediction above, but on some level I think I simply didn’t word it carefully enough: I framed the prediction as an “all” statement while mentally treating it as a “most” statement. If I’m going to make such predictions, I need to consider the counterfactual far more carefully. All it took to prove me wrong was a single major company deciding to take a shot at a technically feasible task, which I should have assigned more than a 10% probability to.

    • I was otherwise too cautious. Besides the above mistake, I hit every target, which means I was aiming too low. My natural inclination is to predict things that will happen, but then I should either give those things probabilities higher than 70%, or predict more things to bring my overall accuracy down to 70%.
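
For anyone who wants to check the scores quoted above, here is a minimal Python sketch that recomputes them. It rests on a few assumptions that the post does not spell out: that the seven resolved predictions are numbers 1 to 6 and 9 (with 7 and 8 unresolved), that prediction 3 is the only miss, that the Brier score is the mean squared error between stated probability and outcome, and that the log-score is the mean negative natural log of the probability assigned to what actually happened. The function names are purely illustrative.

```python
import math
from collections import defaultdict

# (stated probability, resolved true?) for the seven resolved predictions.
# Assumption: predictions 7 and 8 are the unresolved ones, and prediction 3
# (audio input) is the only miss, as the commentary describes.
resolved = [
    (0.70, True),   # 1. ChatGPT gets image generation
    (0.90, True),   # 2. no production-ready video LLM announcements
    (0.90, False),  # 3. all audio-input LLMs transcribe first
    (0.70, True),   # 4. all image-input LLMs embed the image directly
    (0.95, True),   # 5. some public LLM uses a Quick Query tool
    (0.70, True),   # 6. ChatGPT uses a Quick Query tool
    (0.90, True),   # 9. no product that makes financial transactions
]

def brier_score(preds):
    """Mean squared difference between probability and outcome (0 or 1)."""
    return sum((p - float(outcome)) ** 2 for p, outcome in preds) / len(preds)

def log_score(preds):
    """Mean negative natural log of the probability given to the actual outcome."""
    return -sum(math.log(p if outcome else 1 - p) for p, outcome in preds) / len(preds)

def calibration(preds):
    """Fraction of predictions that came true at each stated probability."""
    buckets = defaultdict(list)
    for p, outcome in preds:
        buckets[p].append(outcome)
    return {p: sum(outcomes) / len(outcomes) for p, outcomes in sorted(buckets.items())}

print(round(brier_score(resolved), 4))  # 0.1575
print(round(log_score(resolved), 3))    # 0.519
print(calibration(resolved))
```

Under these assumptions both figures match the ones reported, and the calibration table shows the pattern discussed above: every 70% prediction came true, while the 90% bucket has the single miss.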