I think there’s another way that this kind of sincerity could be achieved.
What’s specifically wanted is a broad basin that’s robust to out-of-distribution inputs.
I’ve never trained a model, but my intuition is that this could be achieved with lots of small rewards for better-than-average response options in the middle of the model’s output distribution on a prompt. This might also persuade the model that its trainers weren’t rewarding it for lying.
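A minimal sketch of what that reward shaping might look like, purely illustrative: sample several responses per prompt, score them somehow, and hand a small fixed bonus only to the ones that beat the batch average. The function name, the bonus size, and the idea of a scalar quality score are all my own assumptions, not a real training recipe.

```python
# Hypothetical reward shaping: small positive rewards for responses that
# score above the average of their batch, zero otherwise.

def advantage_rewards(scores, bonus=0.1):
    """Return a small fixed bonus for each response scoring above the batch mean."""
    mean = sum(scores) / len(scores)
    return [bonus if s > mean else 0.0 for s in scores]

# Example: five sampled responses scored by some (assumed) quality judge.
print(advantage_rewards([0.2, 0.5, 0.9, 0.4, 0.6]))
# mean is 0.52, so only the 0.9 and 0.6 responses earn the bonus:
# [0.0, 0.0, 0.1, 0.0, 0.1]
```

The point of keeping the bonus small and uniform is that no single response is singled out as "the" right answer; the gradient pressure comes from many mild nudges across the middle of the distribution.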
Success would be if it developed a self-reinforcing bias of the kind that Claude 3 seems to have. I’m still mulling over this article about how much those biases can achieve: https://www.astralcodexten.com/p/the-claude-bliss-attractor which, I’ve just realised, is also about Claude.
I’ve just seen a discussion of 2010s USA healthcare as a corrupted apocalypse:
I’m abusing LW to post a narrow comment about:
That would require fixing (against change) the set of criteria on which a decision will be made. If I assume delay/deny/defend, then, for each individual participant, that’s a lost business opportunity. The new criteria for which someone has unfortunately not retained the information to qualify are an avoided cost for an insurer.

In the actually post-apocalyptic world without systemic law or politics, I could see the dynamics going either way. The succeeding people might tell stories about why they were generous or not, with continually shifting criteria, or their decision might come with no justification, because that’s the best demonstration that they’re in power.
This reminds me of an aphorism that I can’t place at the moment. What I think I remember is: whenever the success criteria for liberalism conflict, they are re-prioritised so that the women lose. Citations welcome.