Head of linear regression at METR.
Previously: MIRI → interp with Adrià and Jason → METR.
I have signed no contracts or agreements whose existence I cannot mention.
Some kind of miscommunication has happened. I thought the claim in the OP was that most topics fall in bin A “easy to figure out” OR bin B “probably unknowable”, which seemed like a trivial claim to me because the only alternative is bin C “tractable but hard”. But now it seems like you’re arguing most topics fall in bin A?
But then wouldn’t we be “pretty sure we can’t know one way or the other”?
if you could truly impartially reason through it, run experiments, etc, the correct answer is either not that hard to figure out, or that we’re pretty sure we can’t know one way or the other.
What’s the alternative? That finding the correct answer is tractable but hard in the way that major scientific progress is?
I wonder if talking to AIs about the most interesting questions is substituting for discussion on LessWrong, the way it killed Stack Overflow.
The custom instructions block I use to get Claude to say reasonable things about the future, at least under slow timelines. Please comment if you disagree or have your own.
Assume p(doom)=X%; if not, ASI causes ~100% GDP growth rates by 2035, immortality by 2038, nanotech by 2042. Nanotech allows its users to arbitrarily rearrange any uncontested matter.
Dyson swarm starts construction 2036, passes Earth’s energy consumption 2040 and finishes 2120; labor share 20% by 2035 and halves every 4 years subsequently.
(Replace X with whatever you think p(doom) is since it varies so much.)
It’s also plausible to me we won’t have property rights in the future, but I’ve found it more useful to ask it about such scenarios directly and in detail, since the results are so uncertain—example prompt fragment:
the power of ASI is largely controlled by whoever physically controls the system prompt and post-training data, whoever controls *them*, etc, with these people having near-absolute power to optimize the universe (modulo any ability of other actors to make contracts beforehand that will actually be honored) once nanotech is reached in the mid-2040s
The issue with this is that we can’t fully eliminate competition because in the real world, resources are finite and status is zero-sum.
Resources: Millions of people will probably want to maximize their control over the universe’s (extremely large but finite) resources. So the Authority needs to ration the universe’s resources between millions of people who ask for millions of galaxies each. It seems natural for the Authority to either enforce equality, or maximize utility by giving people with authentic preferences for resources more galaxies (as long as they don’t violate others’ rights).
Status: This seems super messy. Social power and status only exist with your followers’ consent, so people would self-modify to be supercharismatic to attract followers, and the Authority would have to lay ground rules to ensure people don’t mind-hack others. We could potentially get weird trades like resource-seeker A self-modifying to be socially subservient to status-seeker B, in exchange for getting access to B’s resources.
It’s not necessarily the case that code and training compute are complements in an economic sense, even if compute is the reason why Mythos > Opus. Compute-efficiency has increased at something like 10x/year so more code can absolutely substitute for compute—except to the extent that code is bottlenecked on experiment compute.
reducing the weight of a car all the way to 0kg would only save a third of its fuel consumption.
I actually think this is true, for a different reason.
Say you could make an F-150 truck with all of its components massless. It still has the same air resistance, which uses maybe 1⁄3 of the energy as trucks aren’t very aerodynamic. But to match the performance of the original truck, you need
wind stability (a massless truck just blows away)
cornering stability (all weight comes from passengers or the bed, which means a very high center of mass and therefore low stability)
towing (any tongue weight would flip the truck backwards)
All of these mean you need to add ballast. Suppose you add lead plates to the bottom of the truck totaling half the mass of the original. Then your energy consumption compared to the original is (1/3 from air resistance) + 0.5 * (2/3 from rolling resistance and braking) = 2⁄3 already! I’d guess this would still tow significantly worse than an F-150, because getting enough friction to tow something uphill basically requires a minimum truck:trailer mass ratio.
With a smaller car towing isn’t a concern, but then you have safety issues with such a light car, so you’re probably still limited to half the mass of the original. To get better than 2⁄3 the fuel consumption of the original your massless components would need to magically provide downforce only when cornering or something.
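The arithmetic above can be checked with a small sketch. It assumes, as the comment does, that air resistance is 1/3 of the original energy use and is unaffected by mass, and that the remaining 2/3 (rolling resistance and braking) scales linearly with mass. The function name and parameters are made up for illustration.

```python
from fractions import Fraction


def relative_energy(mass_fraction, aero_share=Fraction(1, 3)):
    """Energy use relative to the original truck.

    aero_share: share of the original energy spent on air resistance,
        assumed not to change with mass.
    mass_fraction: ballasted mass as a fraction of the original mass.
    The non-aero share is assumed to scale linearly with mass.
    """
    mass_dependent_share = 1 - aero_share
    return aero_share + Fraction(mass_fraction) * mass_dependent_share


# Massless components plus ballast equal to half the original mass.
print(relative_energy(Fraction(1, 2)))  # 2/3

# Fully massless truck: only air resistance remains.
print(relative_energy(0))  # 1/3
```

Under these assumptions, getting below 2/3 requires either less than half the original mass or a lower air resistance share.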
The type of battery used for grid-scale energy storage is not important to drones. FPV drones use high-density, high-power batteries that owe their development to mobile phones. And Shaheds, powered by piston engines, don’t have batteries at all.
It might be easier to condition on comprehension questions. “How much money do you get if the wizard predicts you will take only the opaque box, and you take the opaque box?” Probably under 25% would get all four comprehension questions right.
What I do mentally is:
If
If
If
Examples
a=84, b=69. Average is 77, times 1.4 is 108, actual value is 108.7
a=48, b=18. b is maybe 35% of a, 35%^2=12, so the estimate is a * 1.06 = 51 or so. Actual value is 51.26
Not sure how they compare in accuracy, but it seems like your method is simpler, at least if you remember that they cross when b is 20% of a.
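A rough check of the two worked examples, with heavy caveats: the rules themselves are cut off in the text above, so this sketch only reconstructs what the examples appear to do. It assumes the target quantity is sqrt(a^2 + b^2), which matches both stated "actual" values, and that the two estimates are 1.4 times the average and a * (1 + r^2 / 2) with r = b / a. The function names are invented, and when to use which rule is not recoverable from the text.

```python
import math


def near_equal_estimate(a, b):
    # As in the first example: 1.4 (roughly sqrt(2)) times the average.
    return 1.4 * (a + b) / 2


def small_b_estimate(a, b):
    # As in the second example: a * (1 + r^2 / 2), where r = b / a.
    r = b / a
    return a * (1 + r * r / 2)


examples = [(84, 69, near_equal_estimate), (48, 18, small_b_estimate)]
for a, b, estimate in examples:
    approx = estimate(a, b)
    exact = math.hypot(a, b)
    error_pct = 100 * abs(approx - exact) / exact
    print(f"a={a}, b={b}: estimate {approx:.1f}, exact {exact:.2f}, error {error_pct:.1f}%")
```

The code does not round intermediate values the way the mental version does, so its estimates differ slightly from the ones quoted above.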
It is in principle possible to 1000x the economy or to defeat humanity using only interpolation, depending on data efficiency. At high data efficiency a human just needs to do something once, and that mental or physical motion is instantly scaled to the entire economy, as well as interpolation between it and anything else a human has done. Likewise you get at minimum robot armies 1000x the size of humanity that can follow routine orders.
Not going to engage on this point. If it does turn out to be say 1.5:1, do you think replacing infantry with ground drones is important, or does the highest value drone capability shift somewhere else?
I acknowledge there is high uncertainty in casualty ratios. 4:1 is my educated guess based on the fact that offensives typically result in 2:1 or 3:1 and Russia is using especially perilous assault tactics against prepared Ukrainian defenses. This is higher than some estimates but the absolute level really doesn’t matter for my point—UGVs are just as big a deal if the ratio is 2:1 now and would become 3:1.
As for why I mentioned the benefits to Ukraine, it’s just because they benefit more from UGVs than Russia does. Russia would benefit from better FPVs and Shaheds, but it’s widely known they’re less sensitive to casualties: they seem to have an endless supply of contract soldiers, while Ukraine has a smaller population and a huge desertion problem among conscripts.
Drone capabilities are nowhere near the ceiling, and will probably advance capability by capability. As you say, AI was being trialled for terminal guidance in 2025. Now target selection (still with human in the loop) and terminal guidance are fully operational for both strike and interceptor drones, and AI has also been a big deal in intelligence and reconnaissance.
The biggest bleeding edge capability right now is probably ground combat drones that replace infantry, which would directly alleviate Ukraine’s manpower bottleneck. Official policy is to replace 30% of infantry with UGVs by end of 2026. Ukraine’s casualty ratio is already something like 4:1 in their favor, and if 30% of their casualties are displaced to drones, it would be 6:1. At this ratio Ukraine could basically fight indefinitely, which would buy them time to fully automate the rest of their army.
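The ratio arithmetic here can be sketched as follows. It assumes the opponent's losses stay the same while a fixed fraction of one's own human casualties is displaced onto drones. The function name is made up for illustration.

```python
def ratio_after_displacement(current_ratio, displaced_fraction):
    """Opponent losses per own human loss, after a fraction of own
    casualties is absorbed by drones instead of people.

    Assumes opponent losses are unchanged.
    """
    return current_ratio / (1 - displaced_fraction)


# 4:1 with 30% of casualties displaced to drones.
print(round(ratio_after_displacement(4, 0.3), 2))  # 5.71

# The same displacement starting from 2:1.
print(round(ratio_after_displacement(2, 0.3), 2))  # 2.86
```

So 4:1 becomes roughly 6:1 and 2:1 becomes roughly 3:1. The relative improvement (about 43%) is the same whatever the starting ratio.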
Daniel Kokotajlo’s post from 2020 continues to be highly prescient.
Maybe we can decompose this into diminishing returns of parallelizing work vs diminishing returns to research output. Imagine that x serial person years accomplishes x units of research, whereas x people working in parallel for a year produce only x^a units of research for a < 1. But then x^a units of research has (x^a)^b = x^(ab) units of impact.
Anyway, it’s tricky to quantify impact this way because you need an output metric that’s linear in impact.
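A toy version of this decomposition, as a sketch only: the exponents a and b below are illustrative values chosen for the example, not estimates of anything, and the function names are invented.

```python
import math


def parallel_research(x, a):
    # x people working in parallel for a year produce x^a units of research (a < 1).
    return x ** a


def impact(research_units, b):
    # Diminishing returns from research output to impact.
    return research_units ** b


x, a, b = 100, 0.5, 0.5  # illustrative values only

composed = impact(parallel_research(x, a), b)
direct = x ** (a * b)
print(composed, direct)

# The two exponents multiply, so composing the steps equals x^(ab).
assert math.isclose(composed, direct)

# x serial person-years give x units of research, hence x^b impact.
print(impact(x, b))
```

With these toy numbers, 100 parallel workers for a year yield about 3.2 units of impact versus 10 for 100 serial person-years.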
My colleague Manish did a lot more analysis here. The main takeaway so far is categorizing each PR’s improvements as “deep” vs “shallow”, as well as “imported-from-literature” vs “invented”.
It looks like there were large, shallow improvements imported from the literature early on, while since then most improvements have been moderately involved and a larger portion are novel.
To get more evidence about SIE likelihood, we have lots of work in the pipeline, including interviews with nanogpt contributors, 1B+ token runs using Opus 4.7 and GPT-5.5 on our Inspect version of nanogpt, and other autoresearch-type tasks.
I work in evals, not technically alignment, but I have never put an agent in a virtual machine. Usually it’s in bypass-permissions mode. For something particularly sensitive, I’ll use auto permissions mode or ask-for-permissions mode. For something extremely sensitive, I put the agent in a docker container, but I try to avoid this whenever possible because it breaks a lot of workflows—except running evals themselves, which are typically in a container. Nothing super bad has ever happened from this, and I don’t expect it to because I can stay calibrated to the correct paranoia level as new information comes in about how capable new agents are of sabotage and how well monitors can catch them.
I will be pretty sad if I ever need to be paranoid enough to use a virtual machine for routine agent-assisted research. Claude-guard is great for a security level short of that, which will probably continue to be the most common.