Toby_Ord

Karma: 372

Toby_Ord 4 Nov 2025 21:59 UTC
7 points
0
in reply to: Toby_Ord’s comment on: I ate bear fat with honey and salt flakes, to prove a point
I found the thread!

Toby_Ord 4 Nov 2025 21:54 UTC
14 points
0
on: I ate bear fat with honey and salt flakes, to prove a point
Love it!
I’m actually kind of to blame for this whole honey/fat/salt thing.
Eliezer had long used ice cream as an example of a super-stimulus that hacks our evolved tastes by delivering a combination of sweetness/fat/salt that was beyond anything available in our ancestral environment. I’d realised that this wasn’t actually true as honey, animal fat and salt were all available in the ancestral environment and when I saw him giving this argument again I weighed in with this example (sadly I can’t find the thread). I was probably a bit rude, but Eliezer sportingly engaged on it and I find it hilarious that he is now using this as an example, that you tried it, and that it is actually quite good!
(I can’t remember who decided that it should be ‘bear fat’ in particular — that might have been Eliezer’s addition — it does fit nicely with the honey!)

Toby_Ord 25 Oct 2025 12:28 UTC
5 points
0
in reply to: Daniel Kokotajlo’s comment on: How Well Does RL Scale?
I’m a bit confused here. Your first paragraph seems to end up agreeing with me? i.e. that RL scaling derives most of its importance from enabling inference-scaling and is dependent on it. I’m not sure we really have any disagreement there — I’m not saying people will stop doing any RL.
Re WTP, I do think it is quite hard to scale. For example consider consumer use. Many people are paying ~$1 per day for AI access (the $20/month subscriptions). If companies need to 1000x inference in order to get the equivalent of a GPT level, then consumers would need to pay ~$1000 per day, which most people won’t do (and can’t do). Indeed, I think $10 per day is about the upper limit of what we could see for most people in the nearish future (=$3,650 per year, which is much more than they pay for their computer plus phone). Maybe $30 per day, if it reaches the total cost of owning a car (still only 1.5 OOM above current prices). But I can’t really imagine it reaching that level for just the current amount of use (at higher intelligence) — I think that would only be reached if there were much more use too. Therefore, I see only 1 OOM increase in cost per query being possible here for consumer use, which means an initial 1 OOM of inference scaling after which the inference used could increase at the speed of efficiency gains (0.5 OOM per year) keeping a constant price (and meaning it absorbs the efficiency gains).
But it is different for non-consumer use-cases. Maybe there are industrial areas where it is more plausible to be willing to pay 100x or 1000x as much for the same number of queries to a somewhat more intelligent system (e.g. coding). I’m a bit skeptical though. I really think current scaling paying for itself was driven by being able to scale up the number of users and the amount of queries per API user, and these stop working here, which is a big deal.

Toby_Ord 23 Oct 2025 17:15 UTC
6 points
0
in reply to: Daniel Kokotajlo’s comment on: How Well Does RL Scale?
I do think that progress will slow down, though its not my main claim. My main claim is to do with the tailwind of compute scaling will become weaker (unless some new scaling paradigm appears or a breakthrough saves this one). That is a piece in the puzzle of whether overall AI progress will accelerate or decelerate and I’d ideally let people form their own judgments about the other pieces (e.g. whether recursive self improvement will work, or whether funding will collapse in a market correction, taking away another tailwind of progress). But having a major boost to AI progress (compute scaling) become less of a boost is definitely some kind of an update towards lower AI progress than you were otherwise expecting.
Part of the issue with inference scaling as the main surviving form of scaling depends on how many more OOMs are needed. If it is 100x, there isn’t so much impact. If we need to 1,000x or 1,000,000x it from here, it is more of an issue.
In that prior piece I talked about inference-scaling as a flow of costs, but it also scales with things beyond time:
- costs grow in proportion to time (can’t make up the costs by longer use before the new model)
- costs grow in proportion to number of users (can’t make up the costs through market expansion)
- costs grow in proportion to the amount of use by each user (can’t make up costs through intensity of use)
This is a big deal. If you want to 100x the price of inference going into each query, how can you make that up and still be profitable? I think you need to 100x the willingness-to-pay from each user for each query. That is very hard. My guess is that the WTP doesn’t scale with inference compute in this way, and thus that inference can only be 10x-ed when algorithmic efficiency gains and falling chip costs have divided the cost per token by 10. So I think that while previous rounds of training compute scaling could pay for themselves in the marketplace, I think that will stop for most users soon, and for specialist users a bit later.
The idea here is that the changing character of scaling affects the business model, making it so that it is no longer self-propelling to keep scaling, and that this will mean the compute scaling basically stops.

PS
Thanks for pointing out that second quote “Now that RL-training…” — I think that does come across a bit stronger than I intended.

Toby_Ord 23 Oct 2025 9:44 UTC
13 points
2
in reply to: Daniel Kokotajlo’s comment on: How Well Does RL Scale?
I agree that separately from its direct boost to performance at the same inference-compute, RL training also helps enable more inference scaling. I talk about that above when I say “this RL also unlocked the ability to productively use much longer chains of thought (~30x longer in this example). And these longer chains of thought contributed a much larger boost.”
A key thing I’m trying to get across is that I think this is where most of the benefit from RL is coming from. i.e. that while you pay the RL scaling costs at training time, you also need to pay the inference scaling costs at deployment time in order to get the benefit. Thus, RL is not an alternative to a paradigm where we cost-per-use is going to 10x, 100x, 1000x etc in order to keep seeing benefits.
Many people have said the opposite. The reason I looked into this is that people such as Dan Hendrycks and Josh You pushed back on my earlier statements that most of the gain has been in enabling longer CoT and the scaling of inference, saying respectively that it makes models much better at the same token-budget and that we’re witnessing a one-off scale up of token budget but further gains will come from RL scaling. I think I’ve delivered clear evidence against those takes.
You’d probably enjoy my post on Inference scaling reshapes AI governance and the whole series on my website. I think they paint a picture where compute scaling is becoming a smaller tailwind for AI progress and where it is changing in character.

Toby_Ord 23 Oct 2025 8:13 UTC
3 points
0
in reply to: Vladimir_Nesov’s comment on: How Well Does RL Scale?
Yes, you would get an optimal allocation with non-zero amounts to each. A simple calculation suggests 1:2 ratio of RL-OOMs : Inference-OOMs. e.g. scaling up RL by 100x and inference by 10,000x. So it could easily lead to RL compute becoming an ever-smaller fraction of FLOPs. But there are additional complications from the fact that inference is a flow of costs and also increases with the number of users, while RL is a fixed cost.
On the simple model and with my scaling numbers, the contribution of RL to capabilities (keeping token-use fixed) would be 20% — a 1:4 ratio with inference because half as many OOMs and half the effect per OOM.
The main relevance of all this to me is that even if people keep doing RL, RL alone won’t contribute much to benchmark performance. I think it would need to 100,000x current total training compute to gain the equivalent of just 100x on pretraining in the early years. So if pre-training is slowing, AI companies lack any current method of effective compute scaling based solely around training compute and one-off costs.

Toby_Ord 22 Oct 2025 21:56 UTC
6 points
1
in reply to: Toby_Ord’s comment on: How Well Does RL Scale?
Actually, here is a slightly simpler way to think about it. How many more training steps do you do with RL when you 100x the compute? Given the linear episode length growth, you only do root(100) = 10x the number of training steps. So if capability gain were linear in the log of the number of training steps, it would grow as log(root(compute)) = log(compute)/2, whereas for pretraining it would grow as log(compute). So if inference-scaling were going as well as pre-training scaling (contra the ³⁄₂ estimate I appealed to in my piece) then the information inefficiency theoretical explanation could exactly account for the observed scaling behaviour.
I’m not sure this is right (there were a couple of biggish assumptions there) but it does feel closer to being able to be a larger part of the actual explanation.

Toby_Ord 22 Oct 2025 21:09 UTC
6 points
0
in reply to: Jacob_Hilton’s comment on: How Well Does RL Scale?
Thanks Jacob. It is less of a mathematical mistake and more me trying to make a qualitative connection between the observed poor scaling of RL training and theoretical mechanism I’d just written about of poor information efficiency, both of which look very big. I agree that the theoretical explanation doesn’t seem to be quite the right shape to explain the empirical issue.
Of your potential reasons, I do think longer episodes is part of it. The R1 paper has a chart on page 8 showing that without trying to affect episode lengths, they increased linearly from 500 tokens to ~9000 tokens over 8000 episodes, suggesting pretty much 1 token increase per episode on average. Thus the information efficiency was going down linearly with episodes during training. It is a bit tricky to compare this with the o1 chart, whose x-axis is both logarithmic and also measuring training compute rather than episode number. I think this means it should be declining as 1/episodes = 1/root(compute) — since training compute would be growing as the square of the number of episodes. And I think that just accounts for the power being 0.5 lower than pretraining, rather than the 3 that I’m claiming. (But its late, and I haven’t checked this through on paper.)
I do think you are right that the information inefficiency can’t explain the whole issue, but it might be able to explain part of it. i.e. a shift to the left can’t explain a line with a different slope, but the slope changing part of the way there could be part of the explanation.

Toby_Ord 22 Oct 2025 10:13 UTC
5 points
0
in reply to: Jacob_Hilton’s comment on: Jacob_Hilton’s Shortform
Thanks for this Jacob — excellent analysis.
I’m a huge fan of Bradley-Terry models. I’m quite sure they are the natural way of representing noisy contests like chess ability and that Elo is an inferior way. They key thing with Bradley-Terry is that each competitor has a raw ability score (e.g. A and B) and that then when they have a contest the odds of A beating B is just A:B. I think of it as each player puts a number of tickets of their colour into a hat and then one is drawn at random determining the winner. This is an even simpler interpretation than the one from the Hex paper and makes the ²⁄₃ result even more intuitive.
Elo then takes the log base 10 and multiplies by 400 and then adds 1200 or so to make the numbers usually positive, injecting three (!) arbitrary constants into the mix in order to give an additive scale that matched the pre-Elo chess rating scale — but the natural interpretation of these contests is a multiplicative scale (the ratio of the numbers is the odds ratio of winning) so it should have been left alone. Linear progress in Elo is really exponential progress in the raw quantity.
I like your idea of assuming random difficulties for the different tasks (from some distribution that could be tweaked), as clearly that is part of the real underlying phenomenon. However, it is weird that you compare the highest number the agent draws from the hat to the highest number the task draws. More natural would be to have to take on the tasks one by one in a gauntlet of challenges of varying difficulty. e.g. that the probability of success is $Π_{1}^{n} p_{i}$ instead of my $Π_{1}^{n} p$ where $p_{i}$ is a random variable drawn from some natural distribution over [0,1] that is modified by the agent’s skill and represents the probability of succeeding at that subtask. There should be limiting cases where all $p_{i}$ are equal (my case) and where it is driven by the hardest one. But I’m not sure what distribution this creates.
That said, I like where you are going with this and how you eliminate one of the parameters.
I definitely see my constant hazard rate model as a first order approximation to what is going on, and not the full story. I’m surprised it works as well as it does because the underlying phenomenon has more structure than this. So I see it just as something of a null hypothesis for other approaches to beat, and do expect it to eventually be beaten.

Toby_Ord 29 Jan 2025 13:22 UTC
5 points
0
on: MONA: Managed Myopia with Approval Feedback
Thanks — this looks promising.
One thing I noticed is that there is an interesting analogy between your model and a fairly standard model in economics where society consists of a representative agent in each time period (representing something like a generation, but without overlap) each trying to maximise its own utility. They can plan based on the utilities of subsequent generations (e.g. predicting that the next generation will undo this generation’s policies on some topic) but they don’t inherently value those utilities. This is then understood via the perspective of a planner who wants to maximise the (discounted) sum of future utilities, even though each agent in the model is only trying to maximise their own utility.
This framework is rich enough to exhibit various inter-generational policy challenges, such as an intergenerational prisoner’s dilemma where you can defect or cooperate on the following generation or the possibility of the desire of a generation to tie the hands of future generations or even the desire to stop future generations tying the hands of generations that follow them.

Toby_Ord 3 Oct 2022 8:46 UTC
3 points
0
on: Better impossibility result for unbounded utilities
This is an interesting theorem which helps illuminate the relationship between unbounded utilities and St Petersburg gambles. I particularly appreciate that you don’t make an explicit assumption that the values of gambles must be representable by real numbers which is very common, but unhelpful in a setting like this. However, I do worry a bit about the argument structure.
The St Petersburg gamble is a famously paradox-riddled case. That is, it is a very difficult case where it isn’t clear what to say, and many theories seem to produce outlandish results. When this happens, it isn’t so impressive to say that we can rule out an opposing theory because in that paradox-riddled situation it would lead to strange results. It strikes me as similar to saying that a rival theory leads to strange result in variable population-size cases so we can reject it (when actually, all theories do), or that it leads to strange results in infinite population cases (when again, all theories do).
Even if one had a proof that an alternative theory doesn’t lead to strange conclusions in the St Petersburg gamble, I don’t think this would count all that much in its favour. As it seems plausible to me that various rules of decision theory that were developed in the cleaner cases of finite possibility spaces (or well-behaved infinite spaces) need to be tweaked to account for more pathological possibility spaces. For a simple example, I’m sympathetic to the sure thing principle, but it directly implies that the St Petersburg Gamble is better than itself, because an unresolved gamble is better than a resolved one, no matter how the latter was resolved. My guess is that this means the sure thing principle needs to have its scope limited to exclude gambles whose value is higher than that of any of their resolutions.

Toby_Ord 27 Oct 2014 12:48 UTC
9 points
0
on: Is the potential astronomical waste in our universe too small to care about?
Regarding your question, I don’t see theoretical reasons why one shouldn’t be making deals like that (assuming one can and would stick to them etc). I’m not sure which decision theory to apply to them though.

Toby_Ord 27 Oct 2014 12:46 UTC
6 points
1
on: Is the potential astronomical waste in our universe too small to care about?
The Moral Parliament idea generally has a problem regarding time. If it is thought of as making decisions for the next action (or other bounded time period), with new distribution of votes etc when the next choice comes up, then there are intertemporal swaps (and thus pareto improvements according to each theory) that it won’t be able to achieve. This is pretty bad, as it at least appears to be getting pareto dominated by another method. However, if it is making one decision for all time over all policies for resolving future decisions, then (1) it is even harder to apply in real life than it looked, and (2) it doesn’t seem to be able to deal with cases where you learn more about ethics (i.e. update your credence function over moral theories) -- at least not without quite a bit of extra explanation about how that works. I suppose the best answer may well be that the policies over which the representatives are arguing include branches dealing with all ways the credences could change, weighted by their probabilities. This is even more messy.

My guess is that of these two broad options (decide one bounded decision vs decide everything all at once) the latter is better. But either way it is a bit less intuitive than it first appears.

Toby_Ord 7 Oct 2014 10:47 UTC
1 point
0
on: A possible tax efficient swap mechanism for charity
This is a good idea, though not a new one. Others have abandoned the idea of a formal system for this on the grounds that:

1) It may be illegal 2) Quite a few people think it is illegal or morally dubious (whether or not it is actually illegal or immoral)

It would be insane to proceed with this without confirming (1). If illegal, it would open you up to criminal prosecution, and more importantly, seriously hurt the movements you are trying to help. I think that whether or not it turns out to be illegal, (2) is sufficient reason to not pursue it. It may cause serious reputational damage to the movement which I’d expect to easily outweigh the financial benefits.

I also think that the 10% to 20% boost is extremely optimistic. That would only be achieved if almost everyone was using it and they all wanted to spend most of their money funding charities that don’t operate in their countries. I’d expect something more like a boost of a few percent.

Note that there are also very good alternatives. One example is a large effort to encourage people to informally do this in a non-matched way by donating to the subset of effective charities that are tax deductable in their country. This could get most of the benefits for none of the costs.

Toby_Ord 19 Aug 2014 16:11 UTC
11 points
0
on: The metaphor/myth of general intelligence
This is a really nice and useful article. I particularly like the list of problems AI experts assumed would be AI-complete, but turned out not to be.

I’d add that if we are trying to reach the conclusion that “we should be more worried about non-general intelligences than we currently are”, then you don’t need it to be true that general intelligences are really difficult. It would be enough that “there is a reasonable chance we will encounter a dangerous non-general one before a dangerous general one”. I’d be inclined to believe that even without any of the theorising about possibility.

I think one reason for the focus on ‘general’ in the AI Safety community is that it is a stand in for the observation that we are not worried about path planners or chess programs or self-driving cars etc. One way to say this is that these are specialised systems, not general ones. But you rightly point out that it doesn’t follow that we should only be worried about completely general systems.

Toby_Ord 17 Jun 2014 7:40 UTC
16 points
0
on: Some alternatives to “Friendly AI”
Thanks for bringing this up Luke. I think the term ‘friendly AI’ has become something of an albatross around our necks as it can’t be taken seriously by people who take themselves seriously. This leaves people studying this area without a usable name for what they are doing. For example, I talk with parts of the UK government about the risks of AGI. I could never use the term ‘friendly AI’ in such contexts—at least without seriously undermining my own points. As far as I recall, the term was not originally selected with the purpose of getting traction with policy makers or academics, so we shouldn’t be too surprised if we can see something that looks superior for such purposes. I’m glad to hear from your post that ‘AGI safety’ hasn’t rubbed people up the wrong way, as feared.

It seems from the poll that there is a front runner, which is what I tend to use already. It is not too late to change which term is promoted by MIRI / FHI etc. I think we should.

Toby_Ord 28 May 2014 11:47 UTC
6 points
0
in reply to: Oscar_Cunningham’s comment on: Can noise have power?
This is quite possibly the best LW comment I’ve ever read. An excellent point with a really concise explanation. In fact it is one of the most interesting points I’ve seen within Kolmogorov complexity too. Well done on independently deriving the result!

Toby_Ord 31 Mar 2014 8:28 UTC
0 points
0
on: Increasing the pool of people with outstanding accomplishments
Without good ways to overcome selection bias, it is unclear that data like this can provide any evidence of outsized impact of unconventional approaches. I would expect a list of achievements as impressive as the above whether or not there was any correlation between the two.

Toby_Ord 17 Jun 2013 15:43 UTC
6 points
0
on: Normative uncertainty in Newcomb’s problem
Carl,

You are completely right that there is a somewhat illicit factor-of-1000 intuition pump in a certain direction in the normal problem specification, which makes it a bit one-sided. Will McAskill and I had half-written a paper on this and related points regarding decision-theoretic uncertainty and Newcomb’s problem before discovering that Nozick had already considered it (even if very few people have read or remembered his commentary on this).

We did still work out though that you can use this idea to create compound problems where for any reasonable distribution of credences in the types of decision theory, you should one-box on one of them and two-box on the other: something that all the (first order) decision theories agree is wrong. So much the worse for them, we think. I’ve stopped looking into this, but I think Will has a draft paper where he talks about this alongside some other issues.

Toby_Ord 7 Sep 2011 20:27 UTC
6 points
0
in reply to: Eliezer Yudkowsky’s comment on: Prisoner’s Dilemma Tournament Results
Regarding (2), this is a particularly clean way to do it (with some results of my old simulations too).

http://www.amirrorclear.net/academic/papers/sipd.pdf http://www.amirrorclear.net/academic/ideas/dilemma/index.html