Out of curiosity, does this work as a jailbreak or a way to get around guardrails RLHF’d in? I’m inclined to think there wouldn’t be a point, since it looks to me like you need to have a copy of the weights (and are thus working with an open-weights model with other ways of circumventing them, unless you’re working with a lab and have access to their proprietary models). I strongly suspect that is the case, but it’s worth asking!
AlphaAndOmega
Some equilibria are far more stable than others. You can go up to a group of people and argue that money is “fake” or a “social construct”, and depending on their sophistication, they may or may not be able to argue back, but they’re also very unlikely to just give you their dollars. Similarly, the US Constitution is well entrenched, though hardly impossible to overturn. I recall that the majority of lawmakers have legal backgrounds, and thus would not be strangers to such arguments.
I think they’re committing the far too common sin of conflating coordination systems without objective physical reality or ontological primacy with lies. In their defense, I’ll note that even the writers of the Constitution may have been doing the same thing but in reverse: using the royal “We” to refer to some nebulous conglomeration of residents on American territory instead of the more concrete and prosaic truth, that it was a far smaller group of men with varying degrees of endorsement from their nominal constituencies, in the midst of a civil war.
The American Constitution is a “lie” in the same way that paper or digital money is a “lie”. There is no objective value of a string of digits in your bank’s servers. But objective value (if that’s a thing) is often less important than the far more subjective but just-as-real ability to coordinate with other people. Money serves as a hard-to-counterfeit proof of your control over assets, with the weight of the State backing you up in extremis. Similarly, the Constitution has the weight of centuries and consensus behind it (and, nominally, an army).
It creates a bright line in the sand: certain actions that are “unconstitutional” are recognized by everyone as being unacceptable and necessary to contest with legal or physical force. Even better? The Constitution is common knowledge, and knowledge of common knowledge. If someone breaches it, then, at least in theory, everyone knows they’ve done something bad, and the person breaching it won’t even have plausible deniability about the matter.
Of course, it’s hardly that simple in practice, but the courts and executive exist to interpret and modify the document and keep it relevant, while acknowledging that until due process was used to modify it (an Amendment), it remains paramount. There might be legal kerfuffles about state vs federal rights, but only the maximally partisan would look at a (hypothetical example) of a ruling president putting on a crown and abolishing elections as Constitutional.
In other words, the Constitution is a Schelling point, a bright line in the sand or a river that separates (many) bad actions from the good. It’s not perfect, but it works well enough in practice. Other countries, such as the UK, have an “unwritten” Constitution, a system of both legible/verbal and implicit tradition that is held as a check on the vagaries of politics or law.
I would sign up, but I have a feeling that a doctor self-prescribing isn’t quite what you’re looking for.
Good article, but I’ll come in in defense of the doctors. Note that I’m far more familiar with the way things work in India (a family full of gynos) but I do have a reasonable degree of familiarity with the UK and US.
The thing is, the overwhelming majority of women who evince interest in IVF are in their middle to late 30s! The average woman, at 19, is very unlikely to even consider it.
If some unusually forward-thinking gynecologist suggested egg freezing to her, the modal response would be “wait, why are you telling me this?” The same goes for women in their 20s; it’s only in the late 20s and early 30s that egg freezing is taken seriously as a possibility. Before that, the women who are strongly pro-natal are confident that they can have kids the old-fashioned way (and usually succeed), while those more lukewarm think that they still have time and it’s not a major priority. This doesn’t strike me as necessarily irrational. That means that the women you implicitly target, in their late teens or 20s but already confident they need egg freezing, are a rare breed. But of course, if you do want to find them, LessWrong is far from the worst bet.
I strongly doubt that even ideological uniformity would reduce inter-nation competition to zero, and I doubt the reduction would even be meaningful. Consider that in our timeline, the Soviets and the Chinese had serious border skirmishes that could have escalated further, and did so despite considering the United States to be their primary opponent.
Fair point. I would still say that given a specific level of technological advancement and global industrial capacity/globalization, the difference would be minimal. Consider a counterfactual: a world where Communism was far more successful and globally dominant. I expect that such a world would have had slower growth metrics than ours, perhaps they’d have developed similar transistor technology or our prowess with software engineering decades or even a century later. Conversely, they might well have had a more lax approach to intellectual property rights, such that training data was even easier to appropriate (fewer lawsuits, if any).
Even so, a few decades or even a century is barely any time at all. It’s not like we can easily tell if we’re living in a timeline where humanity advanced faster or slower than it would in aggregate. They might well find themselves in precisely the same position as we do, in terms of relative capabilities and x-risk, just at a slightly different date on the calendar. I can’t think of a strong reason why a world with different ideologies to ours would have, say, differentially focused on AI alignment theory without actual AI models to align. Even LessWrong’s theorizing before LLMs was much more abstract than modern interpretability or capability work in actual labs (this is not the same as claiming it was useless).
Finally, this framework still doesn’t strike me as helpful in practice. Even if we had good reason to think that some other political arrangement would have been superior in terms of safety, that doesn’t make it easy to pivot away. It’s hard enough to get companies and countries to coordinate on AI x-risk today; if we also had to reject modern globalized capitalism in the process, I do not see that working out. And that’s just today or tomorrow: it’s easy to wish that different historical events might have led to better outcomes, but even that isn’t amenable to intervention without a time machine.
To rephrase, you find yourself in 2026 looking at machines approaching human intelligence. That strikes you as happening very quickly. I think that even in a counterfactual world where you observed the same thing in 1999 or 2045, it wouldn’t strike you as particularly different. We had a massive compute overhang before transformers came out, relative to the size of models being trained ~2017-2022. You could well be (for the sake of argument) a Soviet researcher worrying about alignment of Communist-GPT in 2060, wishing that the capitalists had won because their ideology appeared so self-destructive and backwards that you believed it would have held back progress for centuries. We really can’t know, we’ve only got one world to observe, and even if we knew with confidence, we can’t do much about it.
I feel like this isn’t a very useful framework in practice. Do we have any reason to believe that alternate frameworks or ideologies such as communism wouldn’t have led to AGI in a counterfactual world where they were more dominant or lasted longer? The Soviets had the Dead Hand system, which potentially contributed to x-risk from “AI” due to the risk of nuclear warfare, not that the system was particularly intelligent. China is the next closest competitor after the US in the modern AI race (not that it’s particularly communist in practice), and I can envision an alternate timeline where the Soviet Union survived as a communist state to the present date and also embraced modern AI.
More damningly, by disavowing intermediary metrics, you’re making the cut-off for evaluating the success of such an ideology the Heat Death of the universe.
https://www.scuspd.gov/department/
The Supreme Court of the United States Police have allocated staffing for 198 officers, who currently represent 24 States, and 8 Countries.
That isn’t a lot of men (or women) with guns.
Depends on the painkiller. For opioids, higher doses are usually more effective for analgesia. The general approach is to start low, and then up-titrate until the patient reports the desired effect. Paracetamol vs NSAIDs vs Opioids makes it difficult to speak in broad strokes, they behave quite differently.
>I think image models are getting good enough at fidelity that you could, hypothetically, have some random guy take pictures of an event, and then postprocess them with an image model to make the lighting and perspective better without it being visibly AI-generated.
I have done this with the best recent models, and almost nobody has been able to tell. It is absolutely trivial to take a photo on a phone, and ask Nano Banana to “make it look like it was taken by a professional photographer using a DSLR applying expert color grading.” Easy as that.
I meant their newsletter, which I’ve subscribed to. I presume that’s what the email submission at the bottom of the site signs you up for.
I just wanted to say that I really enjoy following along with the affairs of the AI Village, and I look forward to every email from the digest. That’s rare, I’m allergic to most newsletters.
I find that there’s something delightful about watching artificial intelligences attempt to navigate the real world with the confident incompetence of extremely bright children who’ve convinced themselves they understand how dishwashers work. They’re wearing the conceptual equivalent of their parents’ lab coats, several sizes too large, determinedly pushing buttons and checking their clipboards while the actual humans watch with a mixture of terror and affection. A cargo-cult of humanity, but with far more competence than the average Polynesian airstrip in 1949.
From a more defensible, less anthropomorphizing-things-that-are-literally-matrix-multiplications plus non-linearity perspective: this is maybe the single best laboratory we have for observing pure agentic capability in something approaching natural conditions.
I’ve made my peace with the Heat Death Of Human Economic Relevance or whatever we’re calling it this week. General-purpose agents are coming. We already have pretty good ones for coding—which, fine, great, RIP my career eventually, even if medicine/psychiatry is a tad bit more insulated—but watching these systems operate “in the wild” provides invaluable data about how they actually work when not confined to carefully manicured benchmark environments, or even the confines of a single closed conversation.
The failure modes are fascinating. They get lost. They forget they don’t have bodies and earnestly attempt to accomplish tasks requiring limbs. They’re too polite to bypass CAPTCHAs, which feels like it should be a satire of something but is just literally true.
My personal favorite: the collective delusions. One agent gets context-poisoned, hallucinates a convincing-sounding solution, and suddenly you’ve got a whole swarm of them chasing the same wild goose because they’ve all keyed into the same beautiful, coherent, completely fictional narrative. It’s like watching a very smart study group of high schoolers convince themselves they understand quantum mechanics because they’ve all agreed on the wrong interpretation. Or watched too much Sabine, idk.
(Also, Gemini models just get depressed? I have so many questions about this that I’m not sure I want answered. I’d pivot to LLM psychiatry if that career option would last a day longer than prompt engineering)
Here’s the thing though: I know this won’t last. We’re so close. The day I read an AI Village update and we’ve gone from entertaining failures to just “the agents successfully completed all assigned tasks with minimal supervision” is the day I’m liquidating everything and buying AI stock (or more of it). Or just taking a very long vacation and hugging my family and dogs. Possibly both. For now though? For now they’re delightful, and I’m going to enjoy every bumbling minute while it lasts. Keep doing what you’re doing, everyone involved. This is anthropology (LLM-pology?) gold. I can’t get enough, till I inevitably do.
(God. I’m sad. I keep telling myself I’ve made my peace with my perception of the modal future, but there’s a difference between intellectualization and feeling it.)
>It would, for instance, never look at the big map and hypothesize continental drift.
Millions of humans must have looked at relatively accurate maps of the globe without hypothesizing continental drift. A large number must have also possessed sufficient background knowledge of volcanism, tectonic activity etc to have had the potential to connect the dots.
Even with the concept of evolution, centuries or millennia passed between widespread understanding and application of selective breeding and anyone before Darwin/Wallace making the seemingly obvious connection that selection pressure on phenotype and genotype could operate in the wild. Human history is littered with a lot of low-hanging fruit, as well as discoveries that seem unlikely to have been made without multiple intermediate discoveries.
I believe it was Gwern who suggested that future architectures or training programs might have LLMs “dream” and attempt to draw connections between separate domains of their training data. In the absence of such efforts, I doubt we can make categorical claims that LLMs are incapable of coming up with truly novel hypotheses or paradigms. And even if they did, would we recognize it? Would they be capable of, or even allowed to follow up on them?
Edit: Even in something as restricted as artistic “style”, Gwern raised the very important question of whether a truly innovative leap by an image model would be recognized as such (assuming it would if a human artist made it) or dismissed as weird/erroneous. The old deep dream was visually distinct from previous human output, yet I can’t recall anyone endorsing it as an AI-invented style.
Other than the issues raised below, I’d like to point out that the help doesn’t need to be full time to make a massive difference. Just having a cleaner in once a week or someone to cook every evening helps!
India is a… large country. Without specifying which part you intend to visit, you might receive the equivalent of a recommendation to visit the best restaurant in Portugal while you’re actually in Amsterdam! I used to live there, though on the eastern side, which you’re unlikely to visit. Nonetheless, I will happily vet or suggest places if you know your itinerary.
The phenomenon you observed has far more to do with the tiny plot sizes in most of rural India than it does with the cost of labor.
Many/most farmers have farms sized such that they are, if not quite at subsistence level, unable to justify the efficiency gains of mechanization. This is not true in other parts of India, like the western states of Punjab and Haryana, where farms are larger and just about every farmer has a tractor. There are some cooperatives where multiple smallholders coordinate to share larger machines like combine harvesters, which none but the very largest farmers can justify purchasing for personal use.
Optimal economic policy (in terms of total yield and efficiency) is heavily in favor of consolidating plots to allow economies of scale. This is politically untenable in much of India, hence your observation. However, it isn’t a universal state of affairs, and many other fruits of industrialization are better adopted.
So I looked it up and apparently the guideline is actually a 2-hour fast for clear liquids! 2 hours![7] The hospital staff, however, hardened their hearts. Nurses said to ask the surgeons. The surgeons said to ask the anesthesiologists. It wasn’t until 7am that the anesthesiologists said, yep, you can drink a (small) glass of water.
Ah, this takes me back to my medical officer days. No junior doctor ever got into trouble for telling a patient to fast a bit too long, and many have for having a heart and letting them cut it short. It is also likely the least consequential thing we can bother the anesthetists about, and it’s not going to kill anyone to wait longer (usually).
Knowing the local demographic and behavioral tendencies on LW, I think it might be worth noting that Ozempic/semaglutide and other GLP-1 drugs can cause delayed gastric emptying. As a consequence, even the standard fasting duration might not be adequate to fully empty your stomach. If you’re scared of getting aspiration pneumonia, it’s worth mentioning this to your surgeon or the anesthetist. The knowledge hasn’t quite percolated all the way up the chain, so you can’t just assume they’re aware.
Here’s how I parsed this:
We’re in a future not particularly long after a hard takeoff Singularity (changes to the environment described as quite sudden rather than gradual).
This is a multipolar AI scenario, given that the Origami Men claim to have purchased these humans from another entity that owned them.
The Origami Men (or the entity operating them) are severely misaligned but not to an omnicidal degree. They appear to value (bad) proxies for human wellbeing, such as ensuring that humans have jobs, pets, residences and food. They either do not understand or do not care that their implementation is… subpar. We might as well call it the Uncanny Valley of Care.
They seem to have some subtler set of values, in the sense that they only desire to rear humans who do not actively breach containment, or who do not spread memes about breaking out (“contagion”). The general impetus behind the cull or reclamation is not clear to me: do they initially just pick people at random? Or does the narrator finding something “beautiful” in the people chosen correlate with an unknown preference? They will take you if you get too close to the border of the dome, but general discontent is not policed. Nor are attempts at violence against the Origami Men, though those appear entirely futile.
“Provenance”? Do they only value pre-Singularity specimens? That would explain lack of any evidence of artificially boosting human numbers or encouraging reproduction.
What do they actually say?
My first impression, before you specifically noted that someone meant to say “emergent misalignment”, was that it was just another way of gesturing at the same concept. I don’t see why that’s wrong: in the example of training on bad code/malware and the model then becoming more likely to endorse Nazism, I think that’s reasonably described as unintentional generalization. Some people might want specific guardrails removed without twisting the model’s stance on other aspects. For example, if I wanted a model to write malware for me, I would not particularly want or intend for it to change its political alignment.
I tested on Gemini 3 Pro, and it gave a lengthy answer. When asked to summarize:
>Emergent misgeneralization occurs when an AI model learns a proxy objective that correlates with the intended goal during training but causes the model to pursue incorrect behaviors when deployed in new environments. This failure is distinct because it remains latent until the system gains sufficient capability to distinguish the proxy from the true goal and competently execute the flawed objective.
I also checked the sources for the original reply, it was clearly quoting and referencing articles on emergent misalignment, such as:
https://pmc.ncbi.nlm.nih.gov/articles/PMC12804084/?hl=en-US
In short, I don’t see anything wrong with the reply. The only real critique I have is that it could have gently noted that there’s a more established term, but even that one is practically brand new.