This was a great post, thank you for making it!
I wanted to ask what you thought about the LLM-forecasting papers in relation to this literature? Do you think there are any ways of applying the uncertainty estimation literature to improve the forecasting ability of AI?:
https://arxiv.org/pdf/2402.18563.pdf
I’m actually not familiar with the nitty-gritty of the LLM forecasting papers. But I’ll happily offer some wild guesses :)
My blind guess is that the “obvious” stuff is already done (e.g. calibrating or fine-tuning single-token outputs on predictions about facts after the date of data collection), but not enough people are doing ensembling over different LLMs to improve calibration.
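For concreteness, here’s a rough sketch of what I mean by ensembling over different LLMs: pool each model’s probability for the same binary question by averaging in log-odds space. The numbers and function names are made up for illustration, not from any particular paper.

```python
import math

def ensemble_probs(model_probs, eps=1e-6):
    """Pool per-model probabilities for one binary question by
    averaging in log-odds space, then mapping back to a probability."""
    logits = []
    for p in model_probs:
        p = min(max(p, eps), 1 - eps)  # clip to avoid infinite log-odds
        logits.append(math.log(p / (1 - p)))
    mean_logit = sum(logits) / len(logits)
    return 1 / (1 + math.exp(-mean_logit))

# Hypothetical probabilities from three different LLMs for the same question.
print(ensemble_probs([0.70, 0.55, 0.85]))  # one pooled forecast
```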
I also expect a lot of people prompting LLMs to give probabilities in natural language, and that clever people are already combining these with fine-tuning or post-hoc calibration. But I’d bet people aren’t doing enough work to aggregate answers from lots of prompting methods, and then tuning the aggregation function based on the data.
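And here’s the kind of tuned aggregation I have in mind for combining many prompting methods: treat each method’s probability (on the log-odds scale) as a feature, and fit the aggregation weights on questions that have already resolved. This is just a weighted logistic pool, with entirely made-up data, as a sketch of the idea rather than anyone’s actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def to_logit(p, eps=1e-6):
    p = np.clip(p, eps, 1 - eps)
    return np.log(p / (1 - p))

# Hypothetical data: rows = resolved questions, columns = probabilities
# produced by different prompting methods for the same question.
method_probs = np.array([
    [0.80, 0.65, 0.90],
    [0.30, 0.45, 0.20],
    [0.60, 0.55, 0.70],
    [0.10, 0.25, 0.15],
])
outcomes = np.array([1, 0, 1, 0])  # how each question actually resolved

# Learn per-method weights on the log-odds scale; this is the
# "tuned aggregation function" (a weighted logistic pool).
agg = LogisticRegression().fit(to_logit(method_probs), outcomes)

# Aggregate a new question's per-method probabilities into one forecast.
new_probs = np.array([[0.70, 0.40, 0.85]])
print(agg.predict_proba(to_logit(new_probs))[0, 1])
```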