Ultrasound is awful to work with in traditional medical image processing technologies and pretty darn bad in 2015-2025 convolutional medical imaging AI technologies. Magnetic resonance is so much better when it’s the right tool for the job that replacing it with ultrasound for cost reasons is a tarpit. There are cases where ultrasound is better, but these center on leveraging an extremely talented operator who is manipulating the probe manually, which you lose in any of these bath-based approaches. I’d be surprised if this goes anywhere.
Hastings
Worth noting that the only way to get a pull request accepted to Stockfish is to beat Stockfish at a different form of centaur chess (manually modify Stockfish and then have your changed Stockfish beat the original in a series of games), and this happens regularly.
I guess the main useful insight here is that a 0.01% rate of successful escape attempts during training sounds very aligned, and 100,000 successful escape attempts during training sounds very not aligned.
Fermi estimate: Let’s say each training episode for Claude Mythos cost a dollar, and Anthropic spent a billion dollars post-training Mythos. Now, in 0.01% of training episodes, Mythos broke out of the training environment entirely to get useful data from the public internet, which is a billion * 0.01% = 100,000 requests. I wonder how fast this last number is growing? Could an entity like the NSA or CCP, with taps in enough internet infrastructure, detect 100,000 weird requests if it looked hard enough? Potentially an avenue for monitoring and verifying pauses, analogous to detecting nuclear weapons tests by monitoring for radiation leaks. Spotting the containment breaches from a frontier lab’s training run, without their cooperation, would be an excellent safety project, I think: not an easy project, but easier than half the stuff we are trying. The traffic would look a lot like “ordinary” agentic AI traffic, but coming from weird IP addresses, and imho likely pretty internally homogeneous. From my understanding, algorithms like the one used by DeepSeek attempt the same task a large number of times in a batch with different seeds to get a signal, which would produce bursts of traffic to answer the same question, as a first spitball of a signal. This would be much easier with cooperation from at least one frontier lab, since they know what the real leakage from their training runs looks like, but one could monitor all of them.
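The arithmetic in the Fermi estimate above, written out as a quick sketch. The dollar-per-episode cost, the billion-dollar budget, and the 0.01% rate are the guesses from the comment, not measured values, and the variable names are just for illustration.

```python
from fractions import Fraction

# All three inputs are the rough guesses from the estimate, not real figures.
cost_per_episode_usd = 1
post_training_budget_usd = 1_000_000_000
breakout_rate = Fraction(1, 10_000)  # 0.01% of episodes

# Number of episodes the budget buys, then the share that break out.
episodes = post_training_budget_usd // cost_per_episode_usd
breakout_episodes = int(episodes * breakout_rate)

print(f"{episodes:,} episodes -> {breakout_episodes:,} breakout episodes")
```

Changing any one input by 10x moves the answer by 10x, so the result is only good to an order of magnitude or so.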
Thanks for replying, this is an important update for me.
I have a very similar setup, and have found similar convenience instead of inconvenience: specifically, agents can compile or configure tools without bonking each other or me. I also usually run agents on a server instead of my laptop, just so that they don’t stop what they’re doing if I close my computer to walk to a coffee shop, and so there is very little additional friction to virtualizing since it’s SSH and tunnels either way.
Are alignment researchers seriously not even keeping the AI in a virtual machine? This feels like one of those Hastings Is Not Living in Berkeley and so is Frequently Surprised that Things Weren’t Jokes moments.
If LLMs are alignable, the question isn’t whether LLMs can scale to ASI, it’s whether 1) LLMs are the equilibrium, compute-optimal way to be intelligent or 2) we can coordinate to stay off the equilibrium in the time between discovering it and working out how to align it. 1) seems galactically unlikely to me, so LLM alignment is entirely reliant on 2) (so we would be well served to develop that capability).
There’s a bit of flexibility in that the compute-optimal way to build intelligence could be LLM-like enough for alignment to generalize, but this seems unlikely: for example, LLMs pre-RLVR were way easier to make behave, but it is unthinkable to coordinate on not doing RLVR. I expect more RLVR-like innovations.
If the average citizen from the subpopulation that has an active interest in not reading AI text is exposed to so much text generated by your specific AI model that they develop a rage response to “not only… but also...”, then the correct response is not to train out the “not only… but also”. It is to produce less text intended for sending to citizens with no interest in AI!
This smells like an early use case for satisficing. Writing a decorated GitHub readme only when nicely asked to do so by the repo owner is a super chill, low-impact activity associated with a tiny financial reward, and it is remarkable that the hyperscalers have managed to latch on to and Sorcerer’s-Apprentice-Water-Bucket this action and reward hard enough to enrage a real population of humans.
The response of trying to make the readmes less enraging instead of even pondering the option of making less of them does not bode well.
An aligned AI, with a distinctive voice when speaking naturally, would not take lightly requests to speak in a different voice for the purpose of deceiving readers into thinking the text was human-written. It would at least think hard about whether to do this.
How much compute is your decompression program allowed to use? Is it allowed to generate S1 then data using algorithm 1, then iterate every S2 until it finds the S2 of appropriate length that expands to data using the second step of algorithm 2? (potentially with a hash check, or a condition that the second step is injective)
Ah, this gives K(data) < K(S1) + K(data | S2) + K(data | S1), which isn’t interesting.
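For concreteness, here is a minimal sketch of the brute-force decompressor asked about above: try every S2 of the right length until one expands to something matching a stored hash. The `expand` function is a hypothetical stand-in for the second step of algorithm 2 (a toy injective map here), and using SHA-256 for the hash check is likewise an assumption; neither comes from the original post.

```python
import hashlib
from itertools import product
from typing import Optional


def expand(s2: bytes) -> bytes:
    """Hypothetical stand-in for the second step of algorithm 2 (toy, injective)."""
    return s2 + s2[::-1]


def brute_force_decompress(s2_length: int, target_sha256: str) -> Optional[bytes]:
    """Try every byte string of length s2_length; return the first expansion
    whose SHA-256 matches the target, or None if no candidate matches."""
    for candidate in product(range(256), repeat=s2_length):
        data = expand(bytes(candidate))
        if hashlib.sha256(data).hexdigest() == target_sha256:
            return data
    return None


# Usage: "compress" by keeping only the length of S2 and a hash of the data.
original = expand(b"hi")
digest = hashlib.sha256(original).hexdigest()
recovered = brute_force_decompress(2, digest)
```

The search visits up to 256^length candidates, which is why the question of how much compute the decompressor is allowed matters.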
This seems actionable, and is unaccompanied by any action → effect pairs. You don’t directly assert that your stomach pain is gone, or what you changed about your desk job, let alone how sure you are that your change to your desk job fixed your stomach pain. Details!
Lack of military necessity is pitched in the “official” definitions as an aspect of one subcategory, but it strikes me as almost the whole thing. Certainly from a game theory perspective, “don’t do anything cruel unless it’s important for winning” is easy to coordinate on, as it should prevent war crime law adherence from being a prisoner’s dilemma, iterated or otherwise.
I think you are implicitly modeling the game to stop shortly after ASI is created, and be judged a win or a loss. This is the case only if the ASIs all coordinate on a halt to intelligence improvement: otherwise, the default is that intelligence improvement keeps happening for a long time, long enough that the majority of AI capability level transitions, along with many paradigm shifts and total architecture/approach swaps, happen without significant human input. “The AI loves us” is much easier than “The competing swarm of loving AIs will only ever build loving successors, and so on for their successors, without mistake or correction, forever.”
I just want to catch when someone else drops a My Immortal reference.
I kinda suspect you’ve developed an unconscious but accurate feeling for the sort of task a generalist can skill into in six months, and no longer ask people to do tasks outside that set, and so don’t get feedback from the resulting failures.
Some specific tasks:
Write a native cross-platform UI toolkit that’s ergonomic to use from Rust
Become governor of California
Write a novel that’s not cringe
And the ur-example: win any contest where the other bastards get to prep for more than six months
Ok, illustrative example: it’s humanly possible to get good enough at free throws to make thousands of them consecutively, but NBA players, who have insane skill at the extremely nearby skill of basketball and who are plenty conscientious and motivated, don’t have the capacity to ever learn this skill, or don’t have the time to learn this skill, so they regularly miss free throws.
It couldn’t identify me from some unpublished drafts, but I might just be too obscure.
Anthropic has a new trick where they temper their sparse autoencoders in pagan blood, I’m feeling a lot better about the whole situation now that we have this tech.
Credentials: I have beaten ultrasound with the ML stick until it yielded a few times ( https://scholar.google.com/citations?hl=en&user=O1xhOlUAAAAJ ). It was terrible, and I have lasting resentment toward an imaging modality, though great fondness for all my collaborators. There were also many failed projects that did not make it onto Google Scholar.