We’d probably try something along the lines you’re suggesting, but there are some interesting technical challenges to think through.
For example, we’d want to train the model to be good at predicting the future, not just at recalling what already happened. Under a naive implementation, weight updates would probably go partly towards better judgment and forecasting ability, but also partly towards memorising how the world played out after the initial training cutoff.
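To make that concrete, here’s a minimal sketch of one possible mitigation (not something we’ve built; the data structures and field names are hypothetical): censor each training example’s context so the model only sees documents dated before the question’s knowledge cutoff, so the gradient signal has to reward judgment rather than memorised outcomes.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical data structures for a forecasting training example.
@dataclass
class Document:
    text: str
    published: date

@dataclass
class ForecastExample:
    question: str
    knowledge_cutoff: date   # the model may only "know" things up to here
    resolution: float        # how the question actually resolved
    context: list[Document]  # retrieved/background documents

def censor_context(example: ForecastExample) -> ForecastExample:
    """Drop any document published after the knowledge cutoff, so training
    can't reward the model for simply recalling how events played out."""
    allowed = [d for d in example.context
               if d.published <= example.knowledge_cutoff]
    return ForecastExample(example.question, example.knowledge_cutoff,
                           example.resolution, allowed)

# Usage: apply to every example before computing the forecasting loss.
# train_set = [censor_context(ex) for ex in raw_examples]
```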
There are also questions around information retrieval (IR): it seems likely that models will need external retrieval mechanisms to forecast well for at least the next few years, and we’d want to train something that’s natively good at using retrieval tools to forecast, rather than relying purely on its crystallised knowledge.
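As a rough illustration of what “natively good at using retrieval tools” might look like at inference time, here’s a minimal tool-use loop; the `llm` and `search` callables are placeholders standing in for whatever model and retrieval backend you’d actually use, not our setup.

```python
from typing import Callable

def forecast_with_retrieval(
    question: str,
    llm: Callable[[str], str],     # placeholder: prompt -> model reply
    search: Callable[[str], str],  # placeholder: query -> retrieved snippets
    max_searches: int = 5,
) -> str:
    """Let the model interleave retrieval calls with reasoning before it
    commits to a forecast. Replies starting with 'SEARCH:' trigger the
    retrieval tool; anything else is treated as the final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_searches):
        reply = llm(transcript +
                    "Reply with 'SEARCH: <query>' or a final probability.\n")
        if reply.startswith("SEARCH:"):
            query = reply[len("SEARCH:"):].strip()
            transcript += f"SEARCH: {query}\nRESULTS: {search(query)}\n"
        else:
            return reply  # final forecast
    # Out of search budget: force a final answer from what's been gathered.
    return llm(transcript + "Give your final probability now.\n")
```

The point of training against a loop like this, rather than plain question-answer pairs, is that the model gets credit for knowing when and what to look up, not just for what it happens to remember.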
Good question. We don’t explicitly break this out in our analysis, but we do give models the chance to give up, and some of our task instances actually require the model to give up, because the target number can’t be found.
Anyway, from eyeballing results and traces, I get the sense that 70-80% of failures on the “find number” task are incorrect assertions rather than refusals to answer.
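If we wanted to break this out properly rather than eyeball it, the bookkeeping is simple; a sketch, assuming each transcript record carries a `correct` flag and a `gave_up` flag (both field names made up for illustration):

```python
from collections import Counter

def classify_failure(record: dict) -> str:
    """Label a failed attempt as a refusal or a wrong assertion.
    Assumes each record has 'gave_up' (bool) and 'correct' (bool) fields."""
    return "refusal" if record["gave_up"] else "incorrect_assertion"

def failure_breakdown(records: list[dict]) -> Counter:
    """Count failure modes over all records that didn't get the right answer."""
    failures = [r for r in records if not r["correct"]]
    return Counter(classify_failure(r) for r in failures)

# Example:
# failure_breakdown([{"correct": False, "gave_up": False},
#                    {"correct": False, "gave_up": True}])
# -> Counter({'incorrect_assertion': 1, 'refusal': 1})
```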