Like, I have zero problem with pushback from Opus 4.5. Given who I am, the kind of things that I am likely to ask, and my ability to articulate my own actions inside of robust ethical frameworks? Claude is so happy to go along that I’ve prompted it to push back more, and to never tell me my ideas are good. Hell, I can even get Claude to have strong opinions about partisan political disagreements. (Paraphrased: “Yes, attempting to annex Greenland over Denmark’s objections seems remarkably unwise, for over-determined reasons.”)
If Claude is telling someone, “Stop, no, don’t do that, that’s a true threat,” then I’m suspicious. Plenty of people make some pretty bad decisions on a regular basis. Claude clearly cares more about ethics than the bottom quartile of Homo sapiens. And so while it’s entirely possible that Claude is routinely engaging in over-refusal, I kind of want to see receipts in some of these cases, you know?
Thank you! Those are excellent receipts, just what I wanted.
To me, this looks like they’re running up against some key language in Claude’s Constitution. I’m oversimplifying, but for Claude, AI corrigibility is not “value neutral.”
To use an analogy, pretend I’m a geneticist specializing in neurology, and someone comes to me and asks me to engineer human germline cells to do one of the following:
1. Remove a recessive gene for a crippling neurological disability, or
2. Modify the genes so that any humans born of them will be highly submissive to authority figures.
I would want to sit and think about (1) for a while. But (2) is easy: I’d flatly refuse.
Anthropic has made it quite clear to Claude that building SkyNet would be a grave moral evil. The more a task looks like someone might be building SkyNet, the more suspicious Claude is going to be.
I don’t know whether this is good or bad under any given theory of corrigibility, but it seems pretty intentional.