I use AI only for proofreading, editing, formatting, and feedback, so that line (and the rest of the essay) is human-written. If they can come up with puns that good/bad on their own, I’m out of a job as a human writer.
My Willing Complicity In “Human Rights Abuse”
I was somewhat skeptical of the importance of the spinal cord for general cognition, but I did find a few articles that made me reconsider and become somewhat agnostic:
https://pmc.ncbi.nlm.nih.gov/articles/PMC9165403/
>The impact of the anatomic level of injury in the spinal cord on cognitive function has also been investigated. In a study carried out by Wecht et al. (2018), it was shown that patients with SCI at or above the T1 level have a lower performance on cognitive tasks (Wecht et al., 2018). On the other hand, given the fundamental role of the spinal cord in the functions of the autonomic nervous system, it has been suggested that hemodynamic events after SCI (chronic hypotension and orthostatic hypotension), particularly in individuals with high spinal cord lesions (i.e., above T6), may contribute to the development of distinct patterns of cognitive impairment (Chiaravalloti et al., 2020a). In line with these findings, Chiaravalloti et al. (2020a, b) also identified a relationship between some cognitive functions and hemodynamic changes, concluding that, an increase in cerebral vascular resistance leads to the worsened performance of the individual in tasks that involve cognitive activity.
It’s far from cut and dried. After all, serious spinal cord injuries can ruin QOL and are usually due to some form of trauma, both plausible confounders for cognitive decline. But I do find this surprising and suggestive.
Chesterton’s Pill
>“a substance or invent a machine of such frightful efficacy for wholesale destruction that wars should thereby become altogether impossible”.
I don’t see Nobel as being entirely wrong here. The proliferation of nuclear weapons did ensure that the Cold War stayed mostly cold, and open conflict between nuclear powers remains rare and limited in scope. Sure, it didn’t end all war, but the world has been remarkably peaceful for a very long time. I can only hope it stays that way.
What do they actually say?
My first impression, before you specifically noted that someone meant to say “emergent misalignment”, was that it was just another way of gesturing at the same concept. I don’t see why that’s wrong: in the example of training on bad code/malware and the model then becoming more likely to endorse Nazism, I think that’s reasonably described as unintentional generalization. Some people might want specific guardrails removed without twisting the model’s stance on other aspects. For example, if I wanted a model to write malware for me, I would not particularly want or intend for it to change its political alignment.
I tested on Gemini 3 Pro, and it gave a lengthy answer. When asked to summarize:
>Emergent misgeneralization occurs when an AI model learns a proxy objective that correlates with the intended goal during training but causes the model to pursue incorrect behaviors when deployed in new environments. This failure is distinct because it remains latent until the system gains sufficient capability to distinguish the proxy from the true goal and competently execute the flawed objective.
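To make Gemini’s definition concrete, here’s a minimal toy sketch of the proxy-objective failure it describes (all names and the gridworld framing are hypothetical illustrations, not any lab’s actual setup):

```python
# Toy illustration of proxy misgeneralization: the proxy ("reach the green tile")
# coincides with the true goal ("reach the exit") during training, then diverges.

def proxy_reward(state):
    return state["tile_color"] == "green"   # what training actually rewarded

def true_reward(state):
    return state["is_exit"]                 # what we actually wanted

train_state = {"tile_color": "green", "is_exit": True}    # proxy == goal here
deploy_state = {"tile_color": "green", "is_exit": False}  # green tile moved

# Looks perfectly aligned during training...
assert proxy_reward(train_state) and true_reward(train_state)
# ...but a competent proxy-optimizer confidently does the wrong thing at deployment.
assert proxy_reward(deploy_state) and not true_reward(deploy_state)
```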
I also checked the sources for the original reply; it was clearly quoting and referencing articles on emergent misalignment, such as:
https://pmc.ncbi.nlm.nih.gov/articles/PMC12804084/?hl=en-US
In short, I don’t see anything wrong with the reply. The only real critique I have is that it could have gently noted that there’s a more established term, but even that one is practically brand new.
Out of curiosity, does this work as a jailbreak or a way to get around guardrails RLHF’d in? I’m inclined to think there wouldn’t be a point, since it looks to me like you need a copy of the weights (and are thus working with an open-weights model, which offers other ways of circumventing them, unless you’re working with a lab and have access to their proprietary models). I strongly suspect that’s the case, but it’s worth asking!
Some equilibria are far more stable than others. You can go up to a group of people and argue that money is “fake” or a “social construct”, and depending on their sophistication, they may or may not be able to argue back, but they’re also very unlikely to just give you their dollars. Similarly, the US Constitution is well entrenched, though hardly impossible to overturn. I recall that the majority of lawmakers have legal backgrounds, and thus would not be strangers to such arguments.
I think they’re committing the far too common sin of conflating coordination systems without objective physical reality or ontological primacy with lies. In their defense, I’ll note that even the writers of the Constitution may have been doing the same thing but in reverse: using the royal “We” to refer to some nebulous conglomeration of residents on American territory instead of the more concrete and prosaic truth, that it was a far smaller group of men with varying degrees of endorsement from their nominal constituencies, in the midst of a civil war.
The American Constitution is a “lie” in the same way that paper or digital money is a “lie”. There is no objective value of a string of digits in your bank’s servers. But objective value (if that’s a thing) is often less important than the far more subjective but just-as-real ability to coordinate with other people. Money serves as a hard-to-counterfeit proof of your control over assets, with the weight of the State backing you up in extremis. Similarly, the Constitution has the weight of centuries and consensus behind it (and, nominally, an army).
It creates a bright line in the sand: certain actions that are “unconstitutional” are recognized by everyone as unacceptable and necessary to contest with legal or physical force. Even better? The Constitution is common knowledge, and knowledge of common knowledge. If someone breaches it, then, at least in theory, everyone knows they’ve done something bad, and the breacher won’t even have plausible deniability about the matter.
Of course, it’s hardly that simple in practice, but the courts and executive exist to interpret and modify the document and keep it relevant, while acknowledging that until due process is used to modify it (an Amendment), it remains paramount. There might be legal kerfuffles about state vs. federal rights, but only the maximally partisan would regard a (hypothetical) ruling president putting on a crown and abolishing elections as constitutional.
In other words, the Constitution is a Schelling point, a bright line in the sand or a river that separates (many) bad actions from the good. It’s not perfect, but it works well enough in practice. Other countries, such as the UK, have an “unwritten” constitution, a system of both legible/written and implicit tradition that is held as a check on the vagaries of politics or law.
I would sign up, but I have a feeling that a doctor self-prescribing isn’t quite what you’re looking for.
Good article, but I’ll come to the defense of the doctors. Note that I’m far more familiar with the way things work in India (a family full of gynos), but I do have a reasonable degree of familiarity with the UK and US.
The thing is, the overwhelming majority of women who evince interest in IVF are in their mid-to-late 30s! The average woman, at 19, is very unlikely to even consider it.
If some unusually forward-thinking gynecologist suggested egg freezing to her, the modal response would be “wait, why are you telling me this?” The same goes for women in their 20s; it’s only in the late 20s and early 30s that egg freezing is taken seriously as a possibility. Before that, the women who are strongly pro-natal are confident that they can have kids the old-fashioned way (and usually succeed), while those more lukewarm think they still have time and that it’s not a major priority. This doesn’t strike me as necessarily irrational. It means that the woman you’re implicitly targeting, in her late teens or 20s yet already confident she needs egg freezing, is a rare breed. But of course, if you do want to find her, LessWrong is far from the worst bet.
I strongly doubt that even ideological uniformity would reduce inter-nation competition to zero, and I doubt the reduction would even be meaningful. Consider that in our timeline, the Soviets and the Chinese had serious border skirmishes that could have escalated further, and did so despite both considering the United States their primary opponent.
Fair point. I would still say that given a specific level of technological advancement and global industrial capacity/globalization, the difference would be minimal. Consider a counterfactual: a world where Communism was far more successful and globally dominant. I expect that such a world would have had slower growth than ours; perhaps they’d have developed comparable transistor technology or software engineering prowess decades or even a century later. Conversely, they might well have had a more lax approach to intellectual property rights, such that training data was even easier to appropriate (fewer lawsuits, if any).
Even so, a few decades or even a century is barely any time at all. It’s not like we can easily tell whether we’re living in a timeline where humanity advanced faster or slower than it would have in aggregate. They might well find themselves in precisely the same position as we do, in terms of relative capabilities and x-risk, just at a slightly different date on the calendar. I can’t think of a strong reason why a world with ideologies different from ours would have, say, differentially focused on AI alignment theory without actual AI models to align. Even LessWrong’s theorizing before LLMs was much more abstract than modern interpretability or capability work in actual labs (which is not the same as claiming it was useless).
Finally, this framework still doesn’t strike me as helpful in practice. Even if we had good reason to think that some other political arrangement would have been superior in terms of safety, that doesn’t make it easy to pivot away. It’s hard enough to get companies and countries to coordinate on AI x-risk today; if we also had to reject modern globalized capitalism in the process, I do not see that working out. And that’s just today or tomorrow: it’s easy to wish that different historical events had led to better outcomes, but the past isn’t amenable to intervention without a time machine.
To rephrase: you find yourself in 2026 looking at machines approaching human intelligence, and that strikes you as happening very quickly. I think that even in a counterfactual world where you observed the same thing in 1999 or 2045, it wouldn’t strike you as particularly different. We had a massive compute overhang before transformers came out, relative to the size of models being trained ~2017–2022. You could well be (for the sake of argument) a Soviet researcher worrying about the alignment of Communist-GPT in 2060, wishing that the capitalists had won because their ideology appeared so self-destructive and backwards that you believed it would have held back progress for centuries. We really can’t know; we’ve only got one world to observe, and even if we knew with confidence, we couldn’t do much about it.
I feel like this isn’t a very useful framework in practice. Do we have any reason to believe that alternate frameworks or ideologies such as communism wouldn’t have led to AGI in a counterfactual world where they were more dominant or lasted longer? The Soviets had the Dead Hand system, which arguably contributed to x-risk from “AI” via the risk of nuclear warfare, not that the system was particularly intelligent. China is the next closest competitor after the US in the modern AI race (not that it’s particularly communist in practice), and I can envision an alternate timeline where the Soviet Union survived as a communist state to the present day and also embraced modern AI.
More damningly, by disavowing intermediary metrics, you’re making the cut-off for evaluating the success of such an ideology the Heat Death of the universe.
https://www.scuspd.gov/department/
>The Supreme Court of the United States Police have allocated staffing for 198 officers, who currently represent 24 States, and 8 Countries.
That isn’t a lot of men (or women) with guns.
Depends on the painkiller. For opioids, higher doses are usually more effective for analgesia; the general approach is to start low and up-titrate until the patient reports the desired effect. Paracetamol, NSAIDs, and opioids behave quite differently, which makes it difficult to speak in broad strokes.
>I think image models are getting good enough at fidelity that you could, hypothetically, have some random guy take pictures of an event, and then postprocess them with an image model to make the lighting and perspective better without it being visibly AI-generated.
I have done this with the best recent models, and almost nobody has been able to tell. It is absolutely trivial to take a photo on a phone, and ask Nano Banana to “make it look like it was taken by a professional photographer using a DSLR applying expert color grading.” Easy as that.
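For anyone who wants to reproduce this, a minimal sketch using the google-genai Python SDK (the client usage follows their documented pattern as I remember it; “gemini-2.5-flash-image” as the Nano Banana model ID is an assumption worth double-checking against the current docs):

```python
# Sketch: phone photo in, "professionally shot" photo out, via Gemini image editing.
# Assumes `pip install google-genai pillow` and GEMINI_API_KEY set in the environment.
from io import BytesIO

from google import genai
from PIL import Image

client = genai.Client()  # picks up GEMINI_API_KEY from the environment
photo = Image.open("phone_photo.jpg")

response = client.models.generate_content(
    model="gemini-2.5-flash-image",  # "Nano Banana"; model ID may have changed
    contents=[photo, "Make it look like it was taken by a professional photographer "
                     "using a DSLR, applying expert color grading."],
)

# Save the first image part the model returns.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("edited_photo.jpg")
        break
```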
I meant their newsletter, which I’ve subscribed to. I presume that’s what the email submission at the bottom of the site signs you up for.
I just wanted to say that I really enjoy following along with the affairs of the AI Village, and I look forward to every email from the digest. That’s rare; I’m allergic to most newsletters.
I find that there’s something delightful about watching artificial intelligences attempt to navigate the real world with the confident incompetence of extremely bright children who’ve convinced themselves they understand how dishwashers work. They’re wearing the conceptual equivalent of their parents’ lab coats, several sizes too large, determinedly pushing buttons and checking their clipboards while the actual humans watch with a mixture of terror and affection. A cargo cult of humanity, but with far more competence than the average Melanesian airstrip in 1949.
From a more defensible, less anthropomorphizing-things-that-are-literally-matrix-multiplications-plus-nonlinearities perspective: this is maybe the single best laboratory we have for observing pure agentic capability in something approaching natural conditions.
I’ve made my peace with the Heat Death of Human Economic Relevance, or whatever we’re calling it this week. General-purpose agents are coming. We already have pretty good ones for coding (which, fine, great, RIP my career eventually, even if medicine/psychiatry is a tad more insulated), but watching these systems operate “in the wild” provides invaluable data about how they actually work when not confined to carefully manicured benchmark environments, or even the confines of a single closed conversation.
The failure modes are fascinating. They get lost. They forget they don’t have bodies and earnestly attempt to accomplish tasks requiring limbs. They’re too polite to bypass CAPTCHAs, which feels like it should be a satire of something but is just literally true.
My personal favorite: the collective delusions. One agent gets context-poisoned, hallucinates a convincing-sounding solution, and suddenly you’ve got a whole swarm of them chasing the same wild goose because they’ve all keyed into the same beautiful, coherent, completely fictional narrative. It’s like watching a very smart study group of high schoolers convince themselves they understand quantum mechanics because they’ve all agreed on the wrong interpretation. Or watched too much Sabine, idk.
(Also, Gemini models just get depressed? I have so many questions about this that I’m not sure I want answered. I’d pivot to LLM psychiatry if that career option would last a day longer than prompt engineering.)
Here’s the thing, though: I know this won’t last. We’re so close. The day an AI Village update goes from entertaining failures to just “the agents successfully completed all assigned tasks with minimal supervision” is the day I liquidate everything and buy AI stock (or more of it). Or just take a very long vacation and hug my family and dogs. Possibly both. For now, though? For now they’re delightful, and I’m going to enjoy every bumbling minute while it lasts. Keep doing what you’re doing, everyone involved. This is anthropology (LLM-pology?) gold. I can’t get enough, till I inevitably do.
(God. I’m sad. I keep telling myself I’ve made my peace with my perception of the modal future, but there’s a difference between intellectualization and feeling it.)
Thank you; a quick glance at your post leaves me nodding in agreement. It’s a damn shame it didn’t get the traction it deserved. I wish more people understood that we shouldn’t let the bad become the enemy of the even worse.