Head of linear regression at METR.
Previously: MIRI → interp with Adrià and Jason → METR.
I have signed no contracts or agreements whose existence I cannot mention.
Mostly I meant that government needs to block all the sources of progress (pretraining, post-training, data, elicitation, and others) rather than just some if it wants to mostly or entirely halt progress.
Moore’s law has required steady increase in investment, but investment has increased much less than the rate of Moore’s law, which is about 1.5x/year. If R&D spending were constant, progress might slow down to quadratic or cubic rather than exponential, still very fast.
The situation with AI is even tougher because if a software-only singularity is possible, growth will by definition continue to be exponential or faster if we halt growth in all inputs other than algorithmic efficiency, just a slower exponential than it would have been if investment continued ramping up. If the status quo is 6 months from a software singularity, then constant compute would mean perhaps 1 year to the singularity, and to delay the singularity until 10 years, governments must reduce inputs by at least 10x.
AI is progressing so rapidly that the field doesn’t have time to do high ROI elicitation because there are even higher ROI AI R&D pathways present in pretraining and post-training. It’s analogous to how we didn’t invent multi-core CPUs until Dennard scaling stopped, or we didn’t use multi-patterning until lithography wavelength stalled.
The abundance of independent moderate-to-high ROI paths to better chips is why Moore’s Law continued for so long, and for the same reason, it seems unlikely that AI R&D will naturally hit a wall, and nontrivial to enforce a pause.
I work in evals, not technically alignment, but I have never put an agent in a virtual machine. Usually it’s in bypass-permissions mode. For something particularly sensitive, I’ll use auto permissions mode or ask-for-permissions mode. For something extremely sensitive, I put the agent in a docker container, but I try to avoid this whenever possible because it breaks a lot of workflows—except running evals themselves, which are typically in a container. Nothing super bad has ever happened from this, and I don’t expect it to because I can stay calibrated to the correct paranoia level as new information comes in about how capable new agents are of sabotage and how well monitors can catch them.
I will be pretty sad if I ever need to be paranoid enough to use a virtual machine for routine agent-assisted research. Claude-guard is great for a security level short of that, which will probably continue to be the most common.
Some kind of miscommunication has happened. I thought the claim in the OP was that most topics fall in bin A “easy to figure out” OR bin B “probably unknowable”, which seemed like a trivial claim to me because the only alternative is bin C “tractable but hard”. But now it seems like you’re arguing most topics fall in bin A?
But then wouldn’t we be “pretty sure we can’t know one way or the other”?
if you could truly impartially reason through it, run experiments, etc, the correct answer is either not that hard to figure out, or that we’re pretty sure we can’t know one way or the other.
What’s the alternative? That finding the correct answer is tractable but hard in the way that major scientific progress is?
I wonder if talking to AIs about the most interesting questions is substituting for discussion on LessWrong, the way it killed Stack Overflow.
The custom instructions block I use to get Claude to say reasonable things about the future, at least under slow timelines. Please comment if you disagree or have your own.
Assume p(doom)=X%; if not, ASI causes ~100% GDP growth rates by 2035, immortality by 2038, nanotech by 2042. Nanotech allows its users to arbitrarily rearrange any uncontested matter.
Dyson swarm starts construction 2036, passes Earth’s energy consumption 2040 and finishes 2120; labor share 20% by 2035 and halves every 4 years subsequently.
(Replace X with whatever you think p(doom) is since it varies so much.)
It’s also plausible to me we won’t have property rights in the future, but I’ve found it more useful to ask it about such scenarios directly and in detail, since the results are so uncertain—example prompt fragment:
the power of ASI is largely controlled by whoever physically controls the system prompt and post-training data, whoever controls *them*, etc, with these people having near-absolute power to optimize the universe (modulo any ability of other actors to make contracts beforehand that will actually be honored) once nanotech is reached in the mid-2040s
The issue with this is that we can’t fully eliminate competition because in the real world, resources are finite and status is zero-sum.
Resources: Millions of people will probably want to maximize their control over the universe’s (extremely large but finite) resources. So the Authority needs to ration the universe’s resources between millions of people who ask for millions of galaxies each. It seems natural for the Authority to either enforce equality, or maximize utility by giving people with authentic preferences for resources more galaxies (as long as they don’t violate others’ rights).
Status: This seems super messy. Social power and status only exist with your followers’ consent, so people would self-modify to be supercharismatic to attract followers, and the Authority would have to lay ground rules to ensure people don’t mind-hack others. We could potentially get weird trades like resource-seeker A self-modifying to be socially subservient to status-seeker B, in exchange for getting access to B’s resources.
It’s not necessarily the case that code and training compute are complements in an economic sense, even if compute is the reason why Mythos > Opus. Compute-efficiency has increased at something like 10x/year so more code can absolutely substitute for compute—except to the extent that code is bottlenecked on experiment compute.
reducing the weight of a car all the way to 0kg would only save a third of its fuel consumption.
I actually think this is true, for a different reason.
Say you could make an F-150 truck with all of its components massless. It still has the same air resistance, which uses maybe 1⁄3 of the energy as trucks aren’t very aerodynamic. But to match the performance of the original truck, you need:
wind stability (a massless truck just blows away)
cornering stability (all weight comes from passengers or the bed, which means a very high center of mass and therefore low stability)
towing (any tongue weight would flip the truck backwards)
All of these mean you need to add ballast. Suppose you add lead plates to the bottom of the truck totaling half the mass of the original. Then your energy consumption compared to the original is (1/3 from air resistance) + 0.5 * (2/3 from rolling resistance and braking) = 2⁄3 already! I’d guess this would still have significantly worse towing than an F-150 because getting enough friction to tow something uphill basically requires a minimum truck:trailer mass ratio.
With a smaller car towing isn’t a concern, but then you have safety issues with such a light car, so you’re probably still limited to half the mass of the original. To get better than 2⁄3 the fuel consumption of the original your massless components would need to magically provide downforce only when cornering or something.
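The arithmetic above can be written as a small sketch. It assumes, as the comment does, that air resistance accounts for about 1/3 of energy use and does not depend on mass, and that the remaining 2/3 (rolling resistance and braking) scales linearly with mass. The function name `relative_energy` and its parameters are made up for illustration.

```python
def relative_energy(mass_fraction, aero_share=1 / 3):
    """Energy use relative to the original truck.

    Assumptions (from the comment above, not measured data):
    - aero_share of the energy goes to air resistance, independent of mass
    - the rest (rolling resistance and braking) scales linearly with mass
    """
    mass_dependent_share = 1 - aero_share
    return aero_share + mass_fraction * mass_dependent_share


# Truly massless truck: only air resistance remains, so about 1/3.
print(round(relative_energy(0.0), 3))

# Massless components plus ballast at half the original mass: about 2/3.
print(round(relative_energy(0.5), 3))
```

Under these assumptions, getting below 2/3 requires either less ballast or a lower aerodynamic share, which is the point of the comment.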
The types of batteries used for grid-scale energy storage are not important to drones. FPV drones use high-density, high-power batteries that owe their development to mobile phones. And Shaheds, powered by piston engines, don’t have batteries at all.
It might be easier to condition on comprehension questions. “How much money do you get if the wizard predicts you will take only the opaque box, and you take the opaque box?” Probably under 25% would get all four comprehension questions right.
What I do mentally is:
If
If
If
Examples
a=84, b=69. Average is 77, times 1.4 is 108, actual value is 108.7
a=48, b=18. b is maybe 35% of a, and 35%^2 is about 12%; half of that is 6%, so the estimate is a * 1.06 = 51 or so. Actual value is 51.26
Not sure how they compare in accuracy but it seems like your method is simpler, at least if you remember that they cross when b is 20% of a
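A quick check of the two worked examples, as a sketch. It assumes the quantity being estimated is sqrt(a^2 + b^2), which is inferred only from the "actual value" figures matching it. The conditions for when to use each rule are missing from the text above and are not reconstructed here. The function names are hypothetical.

```python
import math


def estimate_near_equal(a, b):
    # First example's rule: average of a and b, times 1.4 (roughly sqrt(2)).
    return (a + b) / 2 * 1.4


def estimate_small_b(a, b):
    # Second example's rule: a * (1 + (b/a)^2 / 2),
    # the first-order expansion of a * sqrt(1 + (b/a)^2).
    ratio = b / a
    return a * (1 + ratio * ratio / 2)


for a, b, rule in [(84, 69, estimate_near_equal), (48, 18, estimate_small_b)]:
    exact = math.hypot(a, b)  # assumed target: sqrt(a^2 + b^2)
    estimate = rule(a, b)
    error_pct = 100 * abs(estimate - exact) / exact
    print(f"a={a}, b={b}: estimate={estimate:.2f}, exact={exact:.2f}, error={error_pct:.1f}%")
```

The code uses unrounded inputs, so its estimates differ slightly from the mental versions (for example, the average of 84 and 69 is 76.5 rather than 77).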
It is in principle possible to 1000x the economy or to defeat humanity using only interpolation, depending on data efficiency. At high data efficiency a human just needs to do something once, and that mental or physical motion is instantly scaled to the entire economy, as well as interpolation between it and anything else a human has done. Likewise you get at minimum robot armies 1000x the size of humanity that can follow routine orders.
Not going to engage on this point. If it does turn out to be say 1.5:1, do you think replacing infantry with ground drones is important, or does the highest value drone capability shift somewhere else?
I acknowledge there is high uncertainty in casualty ratios. 4:1 is my educated guess based on the fact that offensives typically result in 2:1 or 3:1 and Russia is using especially perilous assault tactics against prepared Ukrainian defenses. This is higher than some estimates but the absolute level really doesn’t matter for my point—UGVs are just as big a deal if the ratio is 2:1 now and would become 3:1.
As for why I mentioned the benefits to Ukraine, it’s just because they benefit more from UGVs than Russia. Russia would benefit from better FPVs and Shaheds, but it’s widely known they’re less sensitive to casualties: they seem to have an endless supply of contract soldiers, while Ukraine has a lower population and a huge desertion problem among conscripts.
The ability to have more integrity than the average politician is a luxury of being in a community that has the institutions and norms to reward it. IMO one can only be competitive as a high-integrity politician if one has a super weird type of charisma compatible with integrity.