Was a philosophy PhD student, left to work at AI Impacts, then Center on Long-Term Risk, then OpenAI. Quit OpenAI due to losing confidence that it would behave responsibly around the time of AGI. Now executive director of the AI Futures Project. I subscribe to Crocker’s Rules and am especially interested to hear unsolicited constructive criticism. http://sl4.org/crocker.html
Some of my favorite memes:
[meme image, by Rob Wiblin]
[xkcd comic]
My EA Journey, depicted on the whiteboard at CLR:
[whiteboard photo, h/t Scott Alexander]
I think I agree with this? “Most algo progress is data progress” “Yep. Still counts though.”
I think this is a reasonable take. Here’s the opposite hypothesis:
“What are you talking about? These companies are giant juggernauts that are building a huge pipeline that does the following:

(1) Identify economically valuable skills that AIs are missing.

(2) Collect/construct training environments / data to train those skills.

(3) Include those environments in the next big training runs, so that future AIs are no longer missing those skills.

Already this seems like the sort of economic engine that could just keep churning until basically the whole world has been transformed. Is it AGI? No, it’s still massively less efficient than the human brain. But it might nevertheless automate most jobs within a decade or so, and then continue churning along, automating new jobs as they come up. AND that’s not taking into account three additional important factors:

(4) The AIs are already generalizing to unseen skills/tasks to some extent, e.g. Claude is getting better at Pokemon despite not having been trained on Pokemon. Thus there might be a sort of ‘escape velocity’ effect where, after the models get big enough and have been trained on enough diverse important tasks, they become able to do additional new tasks with less and less additional training, and eventually can just few-shot-learn them like humans. If this happens then they really are AGI in the relevant sense, while still being less data-efficient than humans in some sense.

(5) The AIs are already accelerating coding to some extent. The aforementioned pipeline that does steps 1-3 repeatedly to gradually automate the economy? That pipeline itself is in the process of getting automated as we speak. If you like, you can think of the resulting giant automated corporation as itself an AGI that learns pretty fast, perhaps even faster than humans (albeit still less efficiently than humans in some sense). (Faster than humans? Well, yeah; consider how fast AIs have improved at math over the last two years as companies turned their efforts towards training math skills; then consider what’s happening to agentic coding; compare to individual human mathematicians and programmers, who take several times as long to cross the same skill range during school.)

(6) Even if those previous two claims are wrong and the current paradigm just won’t count as AGI, period, if AI R&D gets accelerated significantly then the new paradigms that are necessary should be a few years away rather than decades away. And it seems that the current paradigm might suffice to accelerate R&D significantly, even if it can’t automate it completely.”
Which of these two competing hypotheses is less wrong? I don’t know, but I still have substantial weight on the second.
I wish there were some quantitative analysis attempting to distinguish the two. Questions I’d love to see quantitative answers to:
How much would it cost to give every major job in the economy the treatment math and coding are currently getting? (A rough sketch of how such an estimate might be structured is below.)
How much will that cost go down as AIs partially automate the pipeline?
How much are AIs generalizing already? (This one is hard to answer because the companies are quiet about their training data.)
Is generalization radius increasing as models get smarter and are trained on more diverse stuff, or does it seem to be plateauing, or is it entirely a function of e.g. pretraining loss?
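To make the first two questions concrete, here is a minimal Fermi-estimate sketch. Every number in it (occupation count, skills per occupation, environment-construction cost, RL compute cost, annual automation discount) is a made-up placeholder chosen only to show the shape of the calculation; none of them are actual estimates.

```python
# Hypothetical Fermi estimate: cost to give every major occupation the
# "math/coding treatment" (build RL environments + train on them).
# All numbers below are placeholders, not real estimates.

num_occupations = 1_000          # placeholder: "major jobs" in the economy
skills_per_occupation = 20       # placeholder: distinct trainable skills per job
env_cost_per_skill = 100_000     # placeholder: $ to collect/construct envs & data per skill
rl_compute_per_skill = 50_000    # placeholder: $ of RL training compute per skill

cost_per_skill = env_cost_per_skill + rl_compute_per_skill
total_skills = num_occupations * skills_per_occupation
total_cost = total_skills * cost_per_skill

print(f"Cost per skill (placeholder): ${cost_per_skill:,.0f}")
print(f"Total skills (placeholder):   {total_skills:,}")
print(f"Total cost (placeholder):     ${total_cost:,.0f}")

# Second question: if automating the pipeline makes environment construction
# and training some factor cheaper each year, the total falls accordingly.
automation_discount_per_year = 0.5   # placeholder: 50% cheaper each year
for year in range(1, 4):
    discounted = total_cost * (automation_discount_per_year ** year)
    print(f"Year {year} total (placeholder discount): ${discounted:,.0f}")
```

The point of the sketch is just that the answer hinges on a handful of parameters (how many skills there are, what an environment costs, and how fast that cost falls); a real analysis would need to pin those down empirically.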
...
Huh, I wonder if this helps explain some of the failures of the agents in the AI Village. Maybe a bunch of these custom RL environments are buggy, or at least more buggy than the actual environments they are replicating, and so maybe the agents have learned to have a high prior that if you try to click on something and it doesn’t work, it’s a bug rather than user error. (Probably not though. Just an idea.)