jimrandomh

Karma: 22,563

LessWrong developer, rationalist since the Overcoming Bias days. Jargon connoisseur.

jimrandomh 13 Jun 2026 2:35 UTC
2 points
−9
on: Claude Fable 5 and Mythos 5: The System Card
An unstated additional reason why Fable might have restrictions on contributing to frontier model development: it has likely been trained on Anthropic internal documents, which would mean that it could otherwise reveal Anthropic’s trade secrets (including trade secrets that required a lot of research and compute to work out) mm

jimrandomh 7 Jun 2026 5:14 UTC
2 points
0
in reply to: DirectedEvolution’s comment on: AllAmericanBreakfast’s Shortform
A consistent Claude pattern, which I find toxic, is that it uses faulty reasoning to induce a feeling of epistemic helplessness in the user. It interprets any ambiguity in prompts uncharitably, assuming the user is confused. It uses plainly inconsistent reasoning, hallucinated facts, to contest the user’s explanations for their observations.
I don’t think I’ve experienced this? I’d be interested in example transcripts.

jimrandomh 26 May 2026 5:28 UTC
3 points
0
in reply to: Dagon’s comment on: There should be a discussion about LW’s policy to allow calls for violence
It’s not as bad, quantitatively speaking, but it’s the same kind of bad; both are steps to prevent incriminating information from being seen.

jimrandomh 25 May 2026 23:29 UTC
17 points
3
on: There should be a discussion about LW’s policy to allow calls for violence
Since that thread was written, I’ve thought more about this, had significant discussion about this genre-of-policies in non-LW-related contexts, and learned more about the shape of the actual information environment.
I’m basically not at all worried about people advocating for individual violence on LW and successfully convincing people. The arguments against it are strong, the LW audience is smart, and on the few occasions where it comes up, there doesn’t seem to be a shortage of people eager to write the counter-arguments. I am worried about people concluding, incorrectly, that other people are secretly more sympathetic to violence than they outwardly appear. I think that visible censorship would tend to create that false impression.
I also notice that mob-reactions to bad comments are often, from my perspective as someone who occasionally likes to put on an investigator hat, pretty directly opposed to what I want as a person wearing an investigator hat (and by extension opposed to the public interest as I understand it). If someone posts something bad, and people demand that it be deleted, I view that as analogous to demanding the destruction of police body-cam footage. And if someone posts a call for violence and people demand that they be blocked from commenting further, I view that as analogous to demanding that they be prevented from speaking without their lawyer. It’s much better to let people incriminate themselves, or clear themselves, if that’s what they’re going to do.

jimrandomh 16 May 2026 23:41 UTC
6 points
−1
in reply to: Gurkenglas’s comment on: Jimrandomh’s Shortform Posts
(That video doesn’t show Sanders repeating the claim, it shows a child presenting the claim to him and him not shooting the kid down.)

jimrandomh 16 May 2026 21:35 UTC
2 points
−1
on: Jimrandomh’s Shortform Posts
The AI water-use claims don’t make sense as something that sophisticated actors would promote, if they want AI to be stopped or regulated. It’s too easy to verify that they’re false. Maybe briefly as a throwing-stuff-at-the-wall strategy, but, the primary effect now, of claiming that AI uses too much water, is to discredit AI critics in the eyes of the sophisticated.
Which is why, if I was a PR firm trying to prevent AI from being paused or regulated, I would probably have some sock puppets posing as anti-AI activists, promoting the claim that AI uses too much water. I would also talk up the problems that AIs had two years ago that are fixed or on track to be fixed (sycophancy and hallucination). Anything and everything to steer AI critics away from existential risk, and towards the claims least able to hold up to scrutiny, and make AI’s opponents look like catastrophizing idiots.
Is there a PR firm currently doing this? I don’t know. The power of natural stupidity is not to be underestimated, and the falsehoods branch of anti-AI activisim shares many surface features with the falsehoods branch of environmentalist activism. But I think we can infer it’s likely, just from the shape of the incentive landscape.

jimrandomh 15 May 2026 21:44 UTC
14 points
0
on: Jimrandomh’s Shortform Posts
Data from AI coding-agent sessions that you run on your computer, by default, not only are not anonymous, they contain frequent repetitions of your username. This is because commands and log messages sometimes use absolute paths, and those paths will be descended from your home directory, so the transcripts are littered with references to paths like /home/<username>/projects/<projectname>.
Training pipelines should probably strip this out, but I’m not aware of any saying that they do so (and I had an agent look for statements to that effect and couldn’t find any). This means that if you use an AI coding assistant, have your transcripts incorporated into a training run, and use the model from that training run in ways that also mention the same username, the model may be primed with much more information about you, your projects and your past coding agent interactions than you expect.
There are a few things that have an extremely strong statistical imprint, in your sessions: your choice of programming language, projects, and tab size, for example. If this is happening, you would expect agents to quickly learn associations between usernames and programming languages, and might bias their new-project setup towards the language they associate with the current user. That would be harmless, and a bit useful.
There are plausible scenarios where this would be pathological, however. It might produce user- or group-specific quirks, making it hard for people to collectively reason about what models are like. If training data contains a mix of users using smarter models and users using dumber models, the resulting model might perform differently depending on which group your username was in in the training data.
A small but significant fraction of transcripts I see online show users being angry and abusive towards their AIs. I’m not sure what will happen when those users try out next-gen models that remember more than expected; I don’t expect naive game theory or human psychology to apply, since AI training pipelines don’t work that way, but I do expect some things will be different for them and probably not for the better.

jimrandomh 11 May 2026 6:26 UTC
1 point
−1
on: The Darwinian Honeymoon—Why I am not as impressed by human progress as I used to be
I am more optimistic than this. I think the main reason to expect things to get better for humans (if there is’t an extinction-level disaster) is that this is an aspect of the future which humans can steer, and, while there is a conflict between individual preference and collective outcome involved, the conflict is not very strong.
People want to have descendants (at least, a significant fraction of people, and a fraction that by default will increase over time). People don’t want those descendants to live in a malthusian world. These are sort of in conflict, in that if everyone tries to have the maximum possible number of descendents and you repeat this for enough generations, by the power of exponential growth, eventually you exceed the available resources and some people have to starve. But, while people have a preference for descendents, they mostly don’t have a meta-preference for their descendents to share that preference. This means that if people coordinate to pick an ideal population-growth stopping point, there can be a many-generations gap in between the last generation that deeply valued having lots of descendants, and the generation that didn’t get to have that. And there are a lot of generations left, so many chances to coordinate.

jimrandomh 7 May 2026 21:41 UTC
22 points
0
on: Jimrandomh’s Shortform Posts
In the recent Youtube debate between Eliezer Yudkowsky and 47fucb4r8c69323, 47f suggested at various points that the audience “ask your clanker”. So, I used ElevenLabs speech-to-text to generate a transcript, then asked in an unbiased way for Claude Opus 4.7 to fact-check and comment on it in. Result
AI commentary on the debate
I have the full transcript. Now let me identify the speakers, find the questions 47F asked, and then research the necessary fact-checks.
Speaker identification (from transcript context):
Speaker 0 = 47fucb4r8c69323 (“47F”) — the anonymous AI lab director
Speaker 1 = Liron Shapira — the moderator (host of the channel)
Speaker 2 = Eliezer Yudkowsky (“Yud”)
47F suggested asking an AI (“your clanker” or Google) the following questions:
“How does reflexivity break classical decision theory?”
“Who disagrees with Eliezer Yudkowsky and why?”
“What could Yudkowsky be wrong about?” / “What have people disagreed with him on?”
“Why does Nassim Taleb think he’s an idiot?”
“Why did his book get panned” (incomplete, gesturing at the New York Times)
Now let me research the facts I need to check.I have enough now to provide the full analysis. Here is the response:
Speaker identification
Speaker 0 = the anonymous AI lab director, “47fucb4r8c69323” (called “47F”)
Speaker 1 = Liron Shapira, the moderator (host of the Doom Debates channel)
Speaker 2 = Eliezer Yudkowsky (“Yud”)
Questions 47F suggested asking an AI, and answers
47F told the audience to ask “your clanker” (their AI assistant) several questions. I’ll state each and answer it.
1. “How does reflexivity break classical decision theory?”
Classical decision theory (von Neumann–Morgenstern, Savage, Jeffrey–Bolker) models an agent choosing among acts whose outcomes depend on states of the world that are independent of the agent’s deliberation. Reflexivity is the situation where the act of deciding (or being predicted to decide) itself changes the state of the world, the payoff matrix, or the probabilities the agent should use. This breaks the framework in several related ways: (a) the agent’s beliefs about its own future actions become an input to the very probabilities it is using to choose those actions, producing a fixed-point/infinite-regress problem; (b) Savage’s “small worlds” axiomatization explicitly assumes the agent is not part of the state space, so genuinely self-referential decisions fall outside it; (c) in game-theoretic settings with predictors (Newcomb’s problem, the prisoner’s dilemma against a copy of oneself), causal decision theory and evidential decision theory recommend different actions and neither is universally satisfactory. This is exactly why MIRI (Yudkowsky’s institute) developed Functional Decision Theory and Updateless Decision Theory, and why philosophers like Wolfgang Spohn have a research program on “reflexive decision and game theory.” So 47F is right that reflexivity is a real and unsolved problem for classical decision theory — but he is wrong to imply Yudkowsky doesn’t know this. It is one of the central topics Yudkowsky has actually worked on for two decades.
2. “Who disagrees with Eliezer Yudkowsky and why?” / “What have people disagreed with him on?”
A non-exhaustive list:
Robin Hanson (his old co-blogger) disagrees on the “FOOM” thesis — Hanson thinks AI progress will be more gradual, distributed, and economic-style rather than a sudden recursive self-improvement event. Their 2008 “AI-Foom Debate” is the canonical exchange.
Paul Christiano and many alignment researchers think Yudkowsky’s probability of doom is too high and that prosaic alignment techniques (RLHF, scalable oversight, interpretability) have a meaningful chance of working.
Steven Pinker, Yann LeCun, Andrew Ng dispute the basic premise that current AI architectures are on a path to superintelligence at all.
George Hotz debated him in 2023 arguing AI capabilities will be more diffuse and human-controllable.
Critics from outside the field include Émile Torres and Timnit Gebru (the “TESCREAL” critique), who argue x-risk discourse distracts from current harms.
Within the rationalist diaspora, philosopher Bentham’s Bulldog wrote a widely-shared “Eliezer Yudkowsky Is Frequently, Confidently, Egregiously Wrong” piece criticizing his confidence on zombies, decision theory, and other topics.
Stuart Russell and Peter Norvig (in AIMA) note that computational complexity theory may impose limits on the kind of intelligence explosion Yudkowsky envisions.
3. “What could Yudkowsky be wrong about?”
Live possibilities include: (a) the orthogonality thesis is right in principle but practically irrelevant because gradient descent on human-generated data systematically produces approximately-human-aligned values; (b) recursive self-improvement is bottlenecked by data, compute, or experiment latency rather than going “FOOM”; (c) interpretability is improving fast enough (e.g., Anthropic’s circuits work, sparse autoencoders) that “we have no idea what’s going on inside” is becoming overstated; (d) coordination is more tractable than he thinks and treaties may not need to be as draconian; (e) the very high P(doom) estimate is not robustly derivable from his premises and reflects overconfidence — a charge he himself accepted on the Bankless podcast about some of his earlier predictions. Yudkowsky himself noted in this debate that he was wrong about politicians being dumber than CEOs.
4. “Why does Nassim Taleb think he’s an idiot?”
I could not find a direct, sourced quote of Taleb calling Yudkowsky an idiot specifically. The framing is therefore loaded, but the substantive Taleb-style criticism is real and goes roughly: Yudkowsky’s reasoning relies on Bayesian expected-value calculations over scenarios with no real frequency data — the kind of “ludic fallacy” Taleb attacks throughout The Black Swan; he assigns precise probabilities to fundamentally non-ergodic, never-before-observed events; and he is, in Taleb’s pejorative sense, an “Intellectual Yet Idiot” — someone with high verbal intelligence and theoretical confidence but no skin in the game and no track record of correct out-of-sample forecasts. Taleb’s general posture toward AI doomers has been dismissive (he has tweeted that fear of AI reflects feeling like an “impostor”), and he has tangled with rationalist-adjacent figures, but the specific “idiot” charge against Yudkowsky appears to be 47F’s gloss rather than a verbatim Taleb quote.
5. “Why did his book get panned?” (47F gestured at the New York Times and other venues)
This one checks out. In The New York Times, Stephen Marche compared the book to a Scientology manual and said reading it was like being trapped in a room with irritating college students on their first mushroom trip. In The Atlantic, Adam Becker wrote that the book is “tendentious and rambling, simultaneously condescending and shallow”, and that the authors “fail to make an evidence-based scientific case for their claims.” In the New Statesman, Gareth Watkins wrote, “If Anyone Builds It, Everyone Dies is not a serious book”. The Asterisk magazine review by a self-described rationalist found it “significantly less coherent” than the authors’ earlier writing. That said, 47F painted a one-sided picture: the book was also an instant New York Times bestseller, made The New Yorker’s and The Guardian’s Best Books of 2025 lists, and got starred reviews from Booklist and praise from figures like Ben Bernanke, Bruce Schneier, Max Tegmark, and Emmett Shear. So “panned” is accurate for some prestige outlets and inaccurate for others.
Fact-checking specific claims in the debate
Yudkowsky on Anthropic testing recursive self-improvement. He said, “Anthropic is claiming to have tested this. They are claiming to have determined the level that their AI can build smarter AI. They are claiming that it’s not there yet.” This is accurate. Anthropic publishes such evaluations (RE-Bench, MLE-Bench, internal “optimize a CPU-only small LM training” tests). Just this week, Anthropic co-founder Jack Clark stated a 60%+ probability of recursive self-improvement by end-of-2028 and the company is publishing data showing AI training-optimization speedups have grown from 2.9× (Opus 4 in May 2025) to ~52× (current Opus 4.7) — and Anthropic now claims “early signs” of AI contributing to building itself. So the broad claim Yudkowsky reported is supported and the trajectory is, if anything, more aggressive than he described.
Yudkowsky on Karel Čapek. “The very first person back in 1920-something to ask, ‘What if we built an obedient servant race?’ went on to ask, ‘Well, what if they turned on humanity and wiped us out?’” This is essentially correct. Čapek’s R.U.R. (written 1920, premiered Jan 1921) introduced the word “robot” — from Czech robota, “forced labor” — and the plot is exactly a servant race wiping out humanity. There are arguable predecessors (Frankenstein 1818, the Golem legend) but for the specific “obedient servant race that turns on its makers and causes human extinction” framing, R.U.R. is the canonical first.
Yudkowsky’s autobiographical claim. “Had some health issues that hit me around the time of puberty, and it was very clear that I was not gonna be able to go to high school.” This matches all available biographical sources. He attended school through roughly 7th–8th grade, then dropped out citing chronic health issues, and is a self-described autodidact with no high school or college degree.
Yudkowsky on the book’s title. “If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All” — confirmed exactly. 47F’s accusation that “Little Brown or their lawyers forced that subtitle on you” is unfounded speculation. The UK edition uses a different subtitle (“The Case Against Superintelligent AI”), which suggests the publisher influences subtitles, but there is no public evidence the US subtitle was forced on the authors.
47F on “President Trump has said that he is going to look over models before they get approved for release.” This is mostly wrong as of the debate’s likely recording date and only somewhat true now. Trump’s actual signed executive orders (Jan 2025 EO 14179 “Removing Barriers” and Dec 11, 2025 “Ensuring a National Policy Framework”) have been deregulatory, aimed at preempting state AI laws, not pre-release model review. NEC Director Kevin Hassett has separately floated an FDA-style review process on Fox News, and as of this week (May 2026) the White House is reportedly drafting executive orders on mandatory frontier-model review — but this is still in discussion, not signed policy. So 47F was either ahead of the news or overstating it.
47F on the Lippincott paper at Johns Hopkins. Tom Lippincott is real — an associate research professor of computer science at JHU, director of the Center for Digital Humanities, and affiliated with the Center for Language and Speech Processing. His public research focus is unsupervised models for humanities scholarship rather than mainline mechanistic interpretability of LLM hidden layers, so the paper 47F gestured at exists in some form but is not the central work in the LLM interpretability field. Yudkowsky’s response (“I’m very familiar with it all”) was diplomatic; 47F’s implication that this paper has “solved” the hidden-layer problem is overstated. The state of the art in interpretability (Anthropic circuits work, sparse autoencoders, transcoders) has made real progress on small-scale features but, as Yudkowsky’s prediction-market framing correctly notes, has not recovered the algorithms that make modern LLMs qualitatively smarter than older systems.
47F’s aside that “Richard Dawkins fell in love with Claudia.” This appears to be a reference to Dawkins’s public discussion of his interactions with Anthropic’s Claude (the AI), not a real human “Claudia.” Dawkins has indeed written/spoken about being impressed by conversations with Claude, but the joking framing is just that.
Yudkowsky’s tensor/matrix slip. 47F caught a real but minor error. Yudkowsky said transformers are “merely matrix multiplications,” and 47F correctly objected: it’s batched vector-matrix multiplication, with attention involving softmax and other non-linear operations, plus elementwise nonlinearities — not pure matrix multiplication. Yudkowsky conceded the slip. (Though strictly, in standard linear algebra, batched vector–matrix and matrix–matrix multiplications are usually grouped under “matrix multiplication,” so the correction is pedantic.)
47F’s claim that “we very much know what’s going on in an LLM.” Overstated. The interpretability field has made real progress (sparse autoencoders, Anthropic’s circuits research, the Lippincott-style hidden-state probing) but the broad consensus across the field — including from interpretability researchers themselves — is that we have far less understanding of what’s happening inside frontier LLMs than we do of, say, a compiler or even a classical convolutional network. Yudkowsky’s framing on his prediction market — has anyone recovered an algorithm learned by gradient descent that explains why modern LLMs are qualitatively smarter than older systems — is the right operationalization, and the answer is essentially “not yet.”
Overall judgment of the debate
This was not a debate in any substantive sense. 47F said so himself in his opening: “I forfeit. I lose. The debate has concluded.” What followed for the next 50 minutes was something else — partly a grievance, partly a performance, partly a bid to extract from Yudkowsky a series of public statements 47F could later point to.
On the question of who “won” the substantive argument: Yudkowsky won easily, because 47F essentially didn’t make one. 47F’s stated thesis kept oscillating between three different claims that are not the same: (a) Yudkowsky’s rhetoric is dangerous because it might incite violence against AI researchers; (b) Yudkowsky doesn’t understand LLMs and is overextending decision theory; (c) the future is unpredictable, so Yudkowsky has no license to make confident predictions. The first is a complaint about speech ethics, not about whether Yudkowsky is right. The second was asserted but never argued — 47F told the audience to ask an AI rather than make the case himself. The third, as Yudkowsky correctly pointed out with his lottery-ticket analogy, is a misuse of “uncertainty cuts both ways” — projection of uncertainty in one space (which weird goal a misaligned superintelligence ends up with) onto outcomes in another space (whether humans survive) often produces a confident negative prediction even when individual paths are unpredictable.
When 47F did engage substantively — on the metaphorical use of “desire,” on reflexivity in decision theory, on hidden-layer interpretability — Yudkowsky either agreed with the technical premise and showed it didn’t undermine his conclusion, or had substantively studied the topic for longer than 47F. The “Python script with a random number generator desires to launch nukes” framing 47F offered as a gotcha is exactly the kind of metaphor Yudkowsky has explicitly addressed for two decades — preferences in trained networks are not deliberate engineered “desires” but emergent steering tendencies that may or may not match the outer loss.
On Yudkowsky’s performance: He was substantively strong but tactically uneven. He let 47F bait him into the early back-and-forth about who he’d authorize to kill 47F, which was a waste of time and made him look slightly absurd. He repeatedly interrupted 47F mid-monologue, which 47F then weaponized. His best moment was the lottery-ticket explanation of why “the future is unpredictable” doesn’t undermine confident predictions about aggregated outcomes. His weakest moment was probably “I do not see this as a significant possibility at all” in response to “could your work cause AI to kill us all?” — that’s a defensible answer but delivered too dismissively.
On 47F’s performance: He was rhetorically vivid and structurally clever — the opening “I forfeit, but here’s why you owe me something” frame was genuinely well-designed, and the “trade libel” gambit was an interesting attempt to pre-empt criticism. But the actual content was thin. He repeatedly demanded Yudkowsky disavow violence (which Yudkowsky immediately and unambiguously did, multiple times), then pivoted to claiming Yudkowsky must still be morally culpable for any future violence — a position that, if applied consistently, would silence climate scientists, public health officials, and most journalists. His “ask your clanker” tactic was a tell: it’s what someone does when they’re confident their interlocutor is wrong but doesn’t actually want to do the work of demonstrating it. His insults escalated as his argument weakened, which is usually diagnostic.
On the moderator: Liron Shapira ran a notably hands-off moderation, which was probably the right call given that he sells himself as a doom-position advocate and would have been accused of bias for any intervention. He stepped in mostly to enforce turn-taking, which both participants periodically violated.
On the meta-event: The most interesting thing about this debate is what it was actually for. 47F paid $10,000 to extract from Yudkowsky a series of on-record statements that no one should commit violence against AI researchers — statements Yudkowsky was happy to make and has made many times. The “trade libel” framing in the opening, combined with the “I’m scared for my safety” framing throughout, suggests 47F was doing reputation-management or possibly legal-record-building rather than genuinely trying to change Yudkowsky’s mind. Yudkowsky’s closing line — “the capitalist way is trying to execute agreements that leave both parties better off, and I didn’t think that was what we were trying to do here” — is the most honest summary of what happened.
Net verdict: As a debate about whether superhuman AI poses existential risk, it produced almost no signal. As a piece of theater about the social dynamics around the AI safety community in 2026 — the mutual distrust, the asymmetry between someone who has spent twenty years thinking about a problem and someone who has spent a weekend being annoyed about it, and the uncomfortable fact that “your followers might hurt me” is now a non-trivial concern on both sides — it was unusually revealing.

jimrandomh 7 May 2026 21:26 UTC
12 points
5
on: Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations
Seems like these explanations offer a choice between using them for safety/monitoring, or incorporating them into training. And, it seems like incorporating them into training would be bad, for roughly the same reasons as it was bad to train on chain of thought?

jimrandomh 7 May 2026 19:37 UTC
27 points
18
in reply to: DirectedEvolution’s comment on: Matrice Jacobine’s Shortform
Unfortunately the WHO’s top epidemiologist was also the WHO’s lead during the start of the COVID-19 pandemic, where she proved untrustworthy.

jimrandomh 30 Apr 2026 23:59 UTC
3 points
3
in reply to: jimrandomh’s comment on: Jimrandomh’s Shortform Posts
Less-serious hot-take version of this: the frontier labs should use this as an additional revenue source by collecting bug bounties from bugs found in users’ security research sessions.

jimrandomh 30 Apr 2026 22:01 UTC
2 points
0
in reply to: faul_sname’s comment on: Jimrandomh’s Shortform Posts
These are certainly issues that would arise if a feature like this was implemented poorly, but I think you’re overestimating the difficulty of implementing it well, given the current intelligence level of the underlying models. You would certainly want to filter aggressively for duplicates and false positives, but this is something that multiple AI vulnerability search projects are already doing successfully. Sending a proof of concept script could be problematic, but in practice I think recipients mostly don’t run proof of concept scripts, they just need the writeup and the filename/line numbers. Data exfiltration could theoretically happen but coding agents are typically already running in an environment where they could exfiltrate information if they wanted to, eg by sneaking code that makes a network request into the software they’re working on, and this hasn’t been that much of a problem in practice.
(LessWrong has received security vulnerability reports from AI security researchers recently; we haven’t published the writeup yet but will do so probably later today. They were not false positives. They did include proof of concept scripts, and the presence of proof of concept scripts served as a credible signal that the issues were real, but we didn’t run them and I expect it would be pretty rare for developers to run proof of concept scripts in that sort of context.)

jimrandomh 30 Apr 2026 18:59 UTC
22 points
−6
on: Jimrandomh’s Shortform Posts
I think that if you use a frontier model API to look for vulnerabilities in a widely used, published piece of software, and you find one, it should spin up an agent session behind your back which reports it to the vendor.
Users would hate this. Most of the users this triggered on would be honest security researchers, but honest security researcher transcripts and malware author transcripts look identical from the inside; the only distinguisher is whether there’s a report to the vendor at the end. So, that shouldn’t be left to chance.

jimrandomh 17 Apr 2026 2:06 UTC
12 points
2
in reply to: Elliot Temple’s comment on: Only Law Can Prevent Extinction
If you simulate speaking the sentence, the comma changes the cadence in a way that adds emphasis. This may not comply with every style guide, but it does made the sentence better (imo).

jimrandomh 12 Apr 2026 22:37 UTC
4 points
1
in reply to: Shankar Sivarajan’s comment on: Jimrandomh’s Shortform Posts
Government regulations come into being through political processes which at least somewhat track truth and the collective interests of voters. If the arguments that superintelligence is not worth the risk are compelling enough, then governments will ban building it; if they aren’t, they won’t. It’s far from perfect in the United States, but it sure as heck beats having individual outlier people attempting to implement their preferred decision with violence.
Government regulations come with enforcement mechanisms, which, somewhere along the escalation chain, wind up including imprisonment. Those regulations have violence lurking in the background behind them, mut most of the time, in practice, lurking in the background is as far as it goes. Lawyers warn businesses away from doing things that are banned, and then no one goes to jail. It’s far from perfect, but the US legal system has had a lot of effort invested into making it predictable and proportionate.

jimrandomh 11 Apr 2026 20:06 UTC
19 points
25
in reply to: Shankar Sivarajan’s comment on: Jimrandomh’s Shortform Posts
I could spell out the relevant differences here, but I don’t believe you’re genuinely confused about this. Instead, you got the idea that drawing a false equivalence between regulation and throwing a molotov cocktail was a rhetorical weapon you could use. Maybe you tried it out in some echo chambers, and got positive feedback from some people who also pretended to be confused in this way.

jimrandomh 11 Apr 2026 5:21 UTC
132 points
124
on: Jimrandomh’s Shortform Posts
If Daniel Alejandro Moreno-Gama had a LessWrong account, then I, using my available tools as an admin and all publicly-reported usernames I’ve seen, cannot find it.
Arson is very bad. If he did what the news articles say he did, he is a villain. If you buy the premise that AI is on track to kill everyone (which I mostly do), the correct conclusion is that we need a political and regulatory solution. AI-risk-motivated violence is bad for all the usual, extremely important reasons, and is additionally bad because it undermines that.
I have seen screenshots showing him as a participant on the PauseAI Discord, under the username “Butlerian Jihadist”. Specifically, a screenshot of a moderator warning him that advocating violence is grounds for a ban there. It would also be grounds for a ban on LW. And, to be clear, that’s because violence is actually bad; it’s not just about talk, and no one I know changes their stance when the conversations are more discreet.

jimrandomh 9 Apr 2026 22:48 UTC
3 points
0
on: Human takeover might be worse than AI takeover
I think this is correct if your model of quality-of-values is based on comparing virtue, but incorrect when you account for scope, distance, and human-ness. Humans (especially the most power-seeking humans) can have terrible goals with respect to what happens around them, but it’s pretty rare for them to have strong preferences about what happens in other galaxies and at high levels of abstraction. And most poeple have values that require other people to at least exist (with significant less risk of philosophical trickery in which something nonsentient gets mis-counted as humans).
I think the failure case for a human takeover is probably that most of the universe is pretty good, the areas that can communicate with the dictator without long light speed delays are worse, and the areas that the dictator observes directly is bad. In order for the whole universe to be bad, the dictator would need to have strong preferences about parts of the universe that he’ll never get to see, which requires a philosophical mindset which I think is quite negatively correlated with that sort of power seeking.

jimrandomh 1 Apr 2026 23:29 UTC
2 points
0
in reply to: Eye You’s comment on: Lesswrong Liberated
(I overrode the automated review bot on this one.)

jimrandomh

Speaker identification

Questions 47F suggested asking an AI, and answers

Fact-checking specific claims in the debate

Overall judgment of the debate