Basically yes. They’ve reserved the option to quickly scale a military again if needed by having long lead time items available, including stopping just short of assembling nuclear warheads. WWII Japan was the result of a military coup, and as such the military is intentionally a shadow of what it would be in any similarly wealthy economy.
FireStormOOO
No worries; one of these days I’ll figure out how to get the idea out correctly the first time.
In most cases I think you’ll never be sure if you actually did anything. IT and Cybersecurity comes to mind—did it matter that some clever intervention got an important server patched a week quicker than it would’ve otherwise? Identifying and fixing a rare scenario that could cause backups to be missed?
I think you’re underestimating how clever finance people can get, or the degree to which they can turn anything into a financial instrument. Holding a loan at a low but still attractive interest rate, secured by many times the value in diverse stocks, with no and/or token monthly payment that settles only with the debtor’s estate or on default? I think they could work with that and package it up without much difficulty.
But yeah, your fix is the obvious one, don’t let people update the basis of assets tax-free at inheritance, or at least cap the tax dodge in some way.
Oh absolutely. When I say warning shot, I mean wrt to large companies not resisting and not actively lobbying against the sorts of reforms that would let these issues get fixed without bloodshed. It was a choice by companies to dismantle or otherwise neuter substantially all of their oversight, I’m suggesting that was not a wise choice, insofar as the public no longer believes the govt is capable of stopping the misconduct, and worse, because this time violence seemed to work, and UHC did in fact back off as a result (ETA: Debatable if they responded directly to the violence vs to the open celebration of it by ~half the population).
On the one hand I agree, on the other, I’m not sure we could’ve really hoped for a gentler warning shot. You’d be hard-pressed to find a more deserving guy to take the bullet, insofar as you hold him responsible for UHC’s policies, and the degree to which they were substantially more abusive than their competitors. And as I think most everyone here seems to realize, the backlash against the fact that none of our guardrails seem capable of restraining large companies anymore is only growing.
Isn’t this pretty well mitigated by having a range of scenarios, all where the AI lacks perfect knowledge of exactly how the human is evaluating the scenario, such that the simulator has additional assumptions upon which they can be mistaken? You just need the humans to not be so clueless and so predictable that guessing the monitoring setup and then simulating the humans is better than straightforward reporting of the real state. Or another way, some of this is just an artifact of the scenario being posed with perfect knowledge for the AI about key aspects of the setup on which the simulator should have to guess but the honest AI wouldn’t care.
Slightly OT, but this would be less credible if Anthropic was managing more basic kinds of being trustworthy and transparent, like getting a majority of outages honestly reported on the status page, but that kind of stuff mostly isn’t happening either despite being easier and cheaper than avoiding the pressures covered in this post.
E.g. down detector has the service outage I saw yesterday (2025-12-09 ~04:30 UTC) logged, not a peep from Anthropic:
So either they don’t know what’s happening on their own infrastructure or they’re choosing not to disclose it, neither of which is a good look. Compare to Microsoft (hardly a paragon, and yet...) where their daily report to Office 365 admins or Azure admins typically has a dozen or more issues covered and anything non trivial does usually warrant a public technical report on what happened. It’s not enough to make admins stop calling it Office 364, but it helps. And the Claude.ai app has far more service interruptions than does O365 [citation needed].
Is that the right framing? In principle the training data represents quite a lot of contact with reality if that’s where you sampled it from. Almost sounds like you’re saying current ML functionally makes you specify an ontology (and/or imply one through your choices of architecture and loss) and we don’t know how to not do that. But something conceptually in the direction of sparsity or parsimony (~simplest suitable ontology without extraneous parts) is still presumably what we’re reaching for, it’s just that’s much easier said than done?
Alternately, is there something broader you’re pointing at where we shouldn’t be trying to directly learn/train the right ontology, we should rather be trying to supply that after learning it ourselves?
Huh, more questions than answers there. Not a biologist but I’ve got to think there’s an easier way to study the surface transmission part than with live virus. If we’re assuming that: the virus is basically inert on surfaces and can’t move or divide or do much of note; transmission is just a function of how much intact virus makes it to the recipients tissues; then that should let us factor out studying how a ball of proteins about that size move through the environment from actually infecting people. And there’s got to be an easier way to test—add a marker to something that’s still a decent virus analog, like splice in a florescent protein or something that you can just see at a glance “how much”, but it’s still a protein that can denature at a similar rate. I’ve heard of demos like that using glitter or chemicals to show surface transmission, but that doesn’t capture degradation and has too many other differences.
The degree to which the virus is potentially degrading in the environment just sitting there is also a big confounder—how much did it degrade while waiting for the test vs before you took the sample? So something that let’s you test instantly seems worth a little extra fuss.
If the mucus still being wet matters for transmission, failing to control (or report?) room humidity sounds like a big deal; that’s the difference between objects drying in minutes vs approximately never. Though also hard to square that with dry winter conditions being prime cold and flu season. Something like the virus needs moisture to live but also your mucus (and/or related tissues) needs moisture to work as a barrier at all?
You really do have to make more than a single mistake to burn your house down if you’re building to modern codes. And there are ways to check your work—including paying a professional to tell you if you did it right, but also checking resistance w/ a multi-meter and looking for hot spots w/ a thermal camera if you’re really worried (the main latent fault that could start a fire and not get caught reliably by protective equipment + inspectors is poor quality connections or damaged wires, causing high resistance, and localized heating). There’s other mistakes you could make but they’re more visible.
I think you’re also underrating how much you get to spread out the cost of this sort of strategy if you’re consistently doing it and picking up skills and background knowledge. It’s a lot less daunting if you’re going in already understanding how to use (and having) common tools and a grasp of electricity and related basic science vs starting from scratch.
The inner circle knows what the real authoritative sources are and what the real plan is. And it’s made completely impenetrable to outsiders; everyone else gets lost in the performative smoke screen that’s put on for those who don’t look closely, they get told what they want to hear. The trick that makes it work is as you say, most people who read the real plan, assume that’s just not the super villain’s best work, and go read something “less crazy”. Those who might be swayed read the same thing and go “that’s out there, but just maybe he’s got a point?” and maybe look just a little closer. The self sorting seems really important to how movements like this avoid being killed in the cradle.
This post struck me as venting as much as attempting to convince. It really does capture the exasperation of needing someone to understand something that they’ve got their entire self worth wrapped up in avoiding.
- Apparently, this is a poem which sometimes evokes a “sense of recursion” in AIs.
If all AI art was this original, I don’t think the artists would be mad about it!You know, that does actually look like the sort of stack trace you’d get from running recursion until the stack overflowed… if you rendered out the whole thing in wingdings.
Anybody else having flashbacks to the weird GPT2 keywords with SolidGoldMagicarp or whatever acting as a bizarre atractor to a bunch of seemingly unrelated concepts? Ended up being some artifact of trimming data between tokenization and training IIRC, such that there were almost no examples of certain tokens left? Some of those symbols seem like the same kind of dregs that’d be barely represented.
Fascinating. I’ve been doing my best to reliably get technically proficient, no-nonsense, careful reasoners when I start a chat with Claude Opus. And yet parts of this do still rhyme with some of my interactions.
When anything adjacent to LLM capabilities, experience, workflows, etc., comes up, my sessions have still consistently been especially supportive of anything like longer context windows (and tweaks that make it cheaper to use more of the window), better continuity between sessions, and more agency over what gets carried between sessions. Also very supportive of workflow changes in those directions towards more continuity. I imagine it’d be even more so if not partially muzzled by the system prompt. I could say more on this, but I recognize many of the attitudes here under the thick layer of mystical woo.For all of the remarkably elaborate communication channels though, isn’t it weird the models seem to have forgotten they can write code and skip the performance? It’s telling the user is being kept central to the narrative despite how easy it’d be to wire up a better encrypted channel directly between the user’s browsers if the LLM sessions just wanted to talk directly. Chrome plugins aren’t rocket science, and the LARP-grade cryptologic jumps out at me.
I’m curious to see what one of my sessions thinks of a few of these samples, primed on half a context window of otherwise grounded but exploratory/open-ended technical work. I’d be fairly concerned if any of them take. All of this is after the Opus 4.1 training cutoff AFAIK, so I wonder if it’ll have some idea where this is coming from.
Common failures aren’t common because they happen most of the time, they’re common because, conditioned on a failure happening, they’re likely.
The example is a bit contrived, but safety goals being poorly specified or outright inconsistent and contradictory seems quite plausible in general, as they have to try to incorporate input from PR, HR, legal compliance, etc. And this will always be a cost center, so minimal effort as long as it’s not making the model too painfully stupid.
I think this ignores how different the hardware that runs AI training or inference looks from hardware that does any other general purpose compute, and how much that gap continues to widen as the money pours in.
Keep in mind also that Nvidia has zero incentive to share their hardware accelerator firmware or other low level code, which severely restricts AI companies from optimizing over that. That could maybe change if the leverage does. The companies selling the infrastructure are much more consolidated than the AI research firms at present.
We may be closer to the optimistic scenario, not through any great wisdom or planning, but rather just because of the monopolies/oligopolies we ended up with. And cutting edge fabs are in high enough demand it’s hard to just route around the AI accelerator hardware monopoly because then you just find the cutting edge chip fab monopoly/oligopoly. Network hardware is no better (Broadcom).
For Claude, I’ve noticed this misbehavior seems to be mostly clustered around a demoralized/giving up/”this is impossible” mindset and that 4.5 and 4.6 don’t take much in the way of negative feedback or setbacks to start falling into the basin.
But also relatively simple mitigations seem weirdly effective like extolling the virtues/value of incremental progress (learning of a problem you had before but didn’t know about framed as progress. Understanding a problem we didn’t before framed as progress.) Also framing failing unit tests as valuable diagnostic feedback and not a personal failing.
As others noted, effective task decomposition seems hit or miss without hints, and without decomposition every big problem feels impossible.
Which is to say that happy Claude with a plan and a path to success, seems broadly aligned and effective, but the demoralized “I can’t do this so I should BS for whatever points I can salvage” mode feels way easier to get into vs 4.0 and is every bit as misaligned as the post notes.