johnswentworth

Karma: 63,306

johnswentworth 28 Apr 2026 22:47 UTC
6 points
0
on: Causal inference diary: skiing causes snow
Try cranking up the number of data points. The spacing between the four equivalent best DAGs and the next best should increase ~logarithmically as the number of data points increases.
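One way to see where a roughly logarithmic gap could come from, as a minimal sketch: assume the DAGs are ranked by a BIC-style log score, and assume the next-best DAG is one that fits the data equally well but carries extra redundant parameters (e.g. one unnecessary edge). Neither assumption comes from the original post. The function names below are hypothetical, and the O(1) likelihood gain the larger model gets from fitting noise is ignored.

```python
import math


def bic_penalty(num_params: int, n: int) -> float:
    """BIC-style complexity penalty in log-score units: (k / 2) * ln(n)."""
    return 0.5 * num_params * math.log(n)


def expected_gap(extra_params: int, n: int) -> float:
    """Approximate log-score gap between a best DAG and a rival that fits
    equally well but has `extra_params` redundant parameters.

    Assumption: the rival's likelihood gain from fitting noise is O(1)
    and is dropped here, so only the penalty difference remains.
    """
    return bic_penalty(extra_params, n)


# The gap for one redundant parameter at a few sample sizes.
gaps = {n: expected_gap(1, n) for n in (100, 10_000, 1_000_000)}
for n, gap in gaps.items():
    print(f"n={n:>9,d}  gap ~ {gap:.2f} nats")
```

Under these assumptions, each hundredfold increase in data adds the same fixed amount to the gap, which is what logarithmic growth looks like. A rival DAG that is missing a real dependency would instead fall behind roughly linearly in the number of data points.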

johnswentworth 26 Apr 2026 17:21 UTC
11 points
10
on: The paper that killed deep learning theory
This post was useful to me! I’ve heard people talk about this paper a lot, but I never quite understood why people were so interested in it. By the time it came out, I had already long considered statistical learning theory basically-useless in practice, and I already knew (from Jaynes) that overparameterized systems can generalize just fine if you do the full Bayesian math. But I hadn’t realized that this paper specifically hit people over the head with facts in that general cluster.

johnswentworth 21 Apr 2026 16:48 UTC
9 points
0
in reply to: Roman Malov’s comment on: johnswentworth’s Shortform
I mean, it easily could be, that would not be a huge surprise. But it was originally generated as a stylized description of actual experiences I’ve had with Claude. For instance, a week ago I asked it for data on the biggest ETFs by trade volume, and it just hardcoded some numbers into a file without actually looking anything up.

johnswentworth 21 Apr 2026 1:13 UTC
32 points
2
on: johnswentworth’s Shortform
But they were, all of them, deceived, for another csv file was made. In the subfolder of a subfolder, the LLM hardcoded a bunch of data, and never wrote code to pull the actual values. And into this csv file the LLM poured vaguely reasonable-sounding numbers, which unfortunately did not match the real world. One subtle csv file of made-up data which invalidated all of the complicated calculations and projections.

johnswentworth 17 Apr 2026 1:47 UTC
4 points
0
in reply to: Jonas Hallgren’s comment on: Specialization is a Driver of Natural Ontology
> Do you think this is a good argument for multi-scale modularity in biology?
I think the multi-scale aspect is mostly orthogonal to this argument, but yeah the argument could be applied at multiple scales.
> Also thoughts on multi-agent models of mind with this model in the background? Finally any thoughts on applying this to whether we will have a group of AI systems doing RSI or a singular one?
Typically, the load-bearing aspect of multi-agent models is that the multiple agents have different utility functions. This argument is compatible with multiple utility functions in principle, but that’s not really what it’s about; it would apply arguably-more-cleanly in systems with just one optimization objective (but multiple subgoals, e.g. multiple resources like apples and bananas).
> I guess there also has to be the aspect of trade already present in the system?
Nope.
(The other questions I don’t yet have much to say about.)

johnswentworth 16 Apr 2026 21:45 UTC
3 points
0
in reply to: jelly’s comment on: Specialization is a Driver of Natural Ontology
Fixed. Thanks, and sorry for the confusion.

johnswentworth 16 Apr 2026 21:45 UTC
2 points
0
in reply to: Mateusz Bagiński’s comment on: Specialization is a Driver of Natural Ontology
Fixed. Thanks.

johnswentworth 15 Apr 2026 15:33 UTC
24 points
2
in reply to: TsviBT’s comment on: TsviBT’s Shortform
I read an article in Fortune magazine twenty years ago about this, for blood gold. According to the story, the industry had so many layers of middlemen that it was impossible in practice to figure out where any given gold came from. The big change was when Walmart decided they wanted to offer clean gold products. They’re such a large buyer that they could negotiate for source tracking through the whole chain, and it was worthwhile for suppliers to put that tracking in place.
… though it’s not not a puff piece for Walmart, so take with a lot of salt.

johnswentworth 1 Apr 2026 17:28 UTC
3 points
0
on: Lesswrong Liberated
Facebookish
In the style of facebook’s feed, including a high proportion of randomly generated ads.

johnswentworth 29 Mar 2026 17:12 UTC
12 points
3
on: Parkinson’s Law of Worry
This is a short but decent articulation which strongly agrees with my own observations; good job.
With my ex, it was a common-knowledge-to-the-two-of-us phenomenon that she would always find something to worry about. So when she felt worried or anxious about relatively minor things, I would sometimes jokingly respond “Oh good! If you’re worrying about that then there must not be any serious problems right now.”

johnswentworth 27 Mar 2026 1:29 UTC
71 points
17
on: My hobby: running deranged surveys
How much money did these cost?

johnswentworth 12 Mar 2026 13:47 UTC
37 points
16
on: AI for Agent Foundations etc.?
David tries to punt things to LLMs at least once a day on average when we’re working. So far, they continue to work best when they can act as Google Search Plus Plus—i.e. when there’s some already-known fact relevant to what we’re doing, and they surface that fact to us. Occasionally they can complete a conceptually-simple-but-technically-dense proof by combining a few such facts, and very often they can write some useful code by combining a few already-known pieces.
For anything novel, they remain almost always useless in our experience; they string together words which sound relevant but the semantics don’t make any sense.
What links here?
- StanislavKrym's comment on ‘Human Slop’ and a Captive Audience: Why No Book will Ever Have to Go Unread Again by Savannah Harlan (12 Mar 2026 15:52 UTC; 3 points)

johnswentworth 6 Mar 2026 15:25 UTC
7 points
3
on: Playing Possum: The Variability Hypothesis
The evolutionary explanation I thought was standard for sex differences in variability is that males (of most species) face way more potential upside—i.e. they can potentially mate with a whole lot of different females and have far more kids than any single female can. Females typically have a much lower ceiling on kid-count, so they face less potential fitness upside from high-variance strategies.
I liked the part of this post about X-inactivation a lot, even though it turned out wrong. It would be really interesting if most of the variability difference across sex turned out to share one simple mechanistic basis, though that doesn’t seem very likely on my current models of the underlying selection pressures.

johnswentworth 3 Mar 2026 21:24 UTC
5 points
5
in reply to: Mordechai Rorvig’s comment on: Question: Why is the goal of AI safety not ‘moral machines’?
Quite the opposite: the subject-we-gesture-at-with-the-word-“alignment” is not particularly provocative or controversial when you think about it deeply, at least not along the axes people generally argue over in the context of morality/ethics, because those axes just aren’t that technically central or relevant.
Personally, my guess is that morality and ethics themselves would not be particularly controversial or provocative if people usually approached them with a goal of deep technical understanding. That’s just not the goal with which approximately-anybody, including nearly all professional philosophers, approaches the subject—as we see e.g. on that Stanford Encyclopedia page. Those are people trying to have the equivalent of fun house party conversations, or in some cases write manifestos, not people seriously trying to achieve deep technical understanding.

johnswentworth 3 Mar 2026 19:29 UTC
19 points
5
in reply to: Mordechai Rorvig’s comment on: Question: Why is the goal of AI safety not ‘moral machines’?
Indeed, invoking the words “good” or “right” also tends to make people dumber (though less so than “morality” or “ethics”), and trying to do philosophical analysis of what is “good” or “right” is exactly the thing which seems to insta-brain-kill people; it’s exactly the lever which “morality” and “ethics” pull.
For example, let’s look at two pages in the Stanford Encyclopedia of Philosophy. I picked these by pulling up the table of contents, and then clicking the first one which seemed not-very-morality-loaded and the first one which seemed very-morality-loaded.
First up, abduction. No morality talk here. It’s describing a feature of human reasoning, which seems functionally load-bearing for epistemics in some cases and would probably generalize to other kinds of minds (like aliens or AI). It doesn’t trivially fit a couple common frames of epistemics, which is why it’s interesting. A lot of the discussion is centered around pretty narrow or outdated models of reasoning, but it’s a technically interesting and sensible article, which inspires good questions at least.
In contrast, the ethics of abortion. Before we even get to the actual content, note the topic. Abduction is a topic relevant to understanding minds and reasoning in general, a topic which would likely be relevant even to AIs; it belongs in a generalizable world-model. Abortion, by contrast, would be irrelevant to many other kinds of minds—e.g. human-level-intelligent platypuses would lay eggs, and therefore the whole issue of abortion would not have a clean analogue for them. (And human-level intelligent ants would be in a whole different frame!) Almost certainly, the reason why a Stanford Encyclopedia page exists for abortion at all is that it was a major hot-button political topic for a while, which won the memetic competition for attention in US politics. But in the grand scheme of things, it is just not that important of a question at all even for humans, and entirely irrelevant to many other kinds of minds. The very fact that people pay so much attention to it is itself a strong sign of mindkill.
Looking at the content of the page… the entire thing is a string of analogies and attempts to generalize various heuristics to the case of abortion. Notably sparse or absent is:
- Technical engagement with the developmental process, when various things come online for a fetus/baby (like e.g. pain, self-awareness).
- Technical engagement with the way humans’ preferences/values actually typically form. Spoiler: it ain’t usually by thought-experiments involving a violinist.
- Technical engagement with the first and second-order actual effects of abortion laws/norms (though laws/norms are of course distinct from morality, consequentialism still matters).
More vibe-ishly, compared to the abduction article, the whole thing very much has a bikeshed vibe to it. It’s all the sort of stuff which would make good fodder for conversation at a house party, not the sort of stuff which involves dense technical study and deep understanding.

johnswentworth 3 Mar 2026 18:36 UTC
1 point
1
on: Question: Why is the goal of AI safety not ‘moral machines’?
I personally avoid even using the words “morality” or “ethics” in the context of AI alignment, because both of those words reliably turn the vast majority of otherwise-sensible people into morons the moment they are spoken.

johnswentworth 2 Mar 2026 17:55 UTC
3 points
0
in reply to: XelaP’s comment on: Towards a Less Bullshit Model of Semantics
Yeah, that would probably capture the intuition, though this isn’t something which really needs precise operationalization in order to be useful.

johnswentworth 2 Mar 2026 17:54 UTC
3 points
0
in reply to: XelaP’s comment on: Towards a Less Bullshit Model of Semantics
That is a cool and relevant domain of research I had not encountered before, thank you!

johnswentworth 2 Mar 2026 17:52 UTC
3 points
2
in reply to: XelaP’s comment on: Towards a Less Bullshit Model of Semantics
Good question, we don’t yet know the answer. It is definitely the right kind of question to ask.

johnswentworth 28 Feb 2026 13:36 UTC
6 points
3
in reply to: Lorxus’s comment on: What‘s in your list of unsolved problems in AI alignment?
My framing has evolved over time; this is a scattered list as opposed to a unified picture. But it’s still basically-right as the motivating list for the bigger picture.
Our own work touches on several of them, but I generally see very few people working on these kinds of problems.