khafra

Karma: 3,829

khafra 18 Mar 2026 7:34 UTC
0 points
0
on: Requiem for a Transhuman Timeline
Tangentially, this song is done very well in the Return to Moria game; I recommend looking up the OST on your music provider of choice.

khafra 20 Feb 2026 13:25 UTC
2 points
0
in reply to: Matt Vincent’s comment on: Post-AGI Economics As If Nothing Ever Happens
Do any of these jobs survive in the “no humans have any other jobs than this” environment? E.g., marketing research is not valuable when AIs do ~all economic consumption.

khafra 20 Feb 2026 6:14 UTC
2 points
0
on: Methodology of unbounded analysis
The agent then uses an unbounded proof search, which no current AI algorithm could tackle in reasonable time (albeit a human engineer would be able to do it with a bunch of painstaking work)
“Current,” here, is indexed to a decade ago, and can no longer be claimed confidently.

khafra 6 Feb 2026 8:49 UTC
2 points
0
in reply to: Satya Benson’s comment on: Goodfire and Training on Interpretability
decomposing the gradient into semantic components and choosing which components to apply.
This isn’t a strong argument, but that (and the Goodfire essay describing their approach) strongly remind me of a really clever design for a perpetual motion machine. I don’t see any description of a technique like soft alignment (https://www.lesswrong.com/posts/9fL22eBJMtyCLvL7j/soft-optimization-makes-the-value-target-bigger), or a similar technique that acknowledges catastrophic goodhart (https://www.lesswrong.com/posts/fuSaKr6t6Zuh6GKaQ/when-is-goodhart-catastrophic) and Garrabant’s Goodhart taxonomy, and works within those restrictions.

khafra 24 Nov 2025 8:57 UTC
12 points
0
in reply to: Adrià Garriga-alonso’s comment on: Varieties Of Doom
I am also now haunted by the “humanism is dead” take. I guess I believe it, but what killed it is the internet, and I think we could bring it back.
I don’t think that’s it. I think he meant that humanism was created by incentives: e.g., ordinary people becoming economically and militarily valuable in a way they hadn’t historically been. The spectre, and now rising immanentization, of full automation is reversing those incentives.
So, it’s less a problem with the attitudes of our current elites or the memes propagated on the Internet. It’s more a problem with the context in which anybody achieving the rank of elite, and any meme on human value which goes viral, is shaped by the evolving incentive structure in which most humans are not essential to the success of a military or economic endeavor.

khafra 4 Nov 2025 8:29 UTC
21 points
18
in reply to: Raemon’s comment on: The Tale of the Top-Tier Intellect
I feel like this was a sort of fractal parable, where the first two paragraphs should be enough to convey the point; but for readers who don’t get it by then, it keeps beating you over the head with successively longer, more detailed, and more blatant forms of the point until the final denouement skips the “parable” part altogether.

khafra 4 Nov 2025 7:26 UTC
5 points
0
in reply to: Mitchell_Porter’s comment on: LLM robots can’t pass butter (and they are having an existential crisis about it)
We need names for this phenomenon, in which the excess cognitive capacity of an AI, not needed for its task, suddenly manifests itself
It is so much like absurdist SF that absurdist SF is the perfect source for the name. The Marvin Problem: “Here I am, brain the size of a planet and they ask me to take you down to the bridge. Call that job satisfaction? ’Cos I don’t.”

khafra 29 Aug 2025 6:56 UTC
2 points
0
on: Zetetic explanation
There’s an article type called “You Could Have Invented” that I became aware of on reading Gwern’s You Could Have Invented Transformers.
This type dates back to at least 2012. I believe they’re usually good zetetic explanations.

khafra 29 Aug 2025 5:43 UTC
1 point
−1
in reply to: dawnstrata’s comment on: Underdog bias rules everything around me
In a stereotypical old-west gunfight, one fighter is more experienced and has a strong reputation; the other fighter is the underdog and considered likely to lose. But who’s the underdog of a grenade fight inside a bank vault? Both sides are overwhelmingly likely to lose.
At least one side of many political battles believes they’re in a grenade fight, where there’s little or nothing they can do to prevent the other side from destroying a lot of value, and could reasonably feel like an underdog even if they have a full bandolier of grenades and the other side has only one or two.

khafra 18 Jul 2025 7:42 UTC
2 points
0
in reply to: TAG’s comment on: A Simple Explanation of AGI Risk
I don’t think “perfect” is a good descriptor for the missing solution. The solutions we have lack (at least) two crucial features:
1. A way to get an AI to prioritize the intended goals, with high enough fidelity to work when AI is no longer extremely corrigible, as today’s AIs are (because they’re not capable enough to circumvent human methods of control).
2. A way that works far enough outside of the training set. E.g., when AI is substantially in charge of logistics, research and development, security, etc.; and is doing those things in novel ways.

khafra 28 May 2025 6:22 UTC
5 points
1
on: Colonialism in space: Does a collection of minds have exactly two attractors?
Robin Hanson’s model of quiet vs loud aliens seems fundamentally the same as this question, to me.

khafra 28 May 2025 5:28 UTC
6 points
2
in reply to: Rauno Arike’s comment on: It’s hard to make scheming evals look realistic for LLMs
Linear probes give better results than text output for quantitative predictions in economics. They’d likely give a better-calibrated probability here, too.

khafra 21 May 2025 8:03 UTC
3 points
0
in reply to: dschwarz’s comment on: Thomas Kwa’s Shortform
I, too, would like to know how long it will be until my job is replaced by AI; and what fields, among those I could reasonably pivot to, will last the longest.

khafra 20 May 2025 9:00 UTC
2 points
1
in reply to: BryceStansfield’s comment on: One Year in DC
I think it’s especially true for the type of human that likes LessWrong. Using Scott’s distinction between metis and techne, we are drawn to techne. When a techne-leaning person does a deep dive into metis, that can generate a lot of value.

More speculatively, I feel like often—as in the case of lobbying for good government policy—there isn’t a straightforward way to capture any of the created value; so it is under-incentivized.

khafra 22 Apr 2025 5:06 UTC
1 point
0
in reply to: Viliam’s comment on: Pablo’s Shortform
Well, that was an interesting top-down processing error.

khafra 21 Apr 2025 11:39 UTC
0 points
0
in reply to: Viliam’s comment on: Pablo’s Shortform
Note that Alexander Kruel still blogs regularly on axisofordinary.blogspot.com, and from his Facebook account; he just doesn’t say anything directly about rationalists. He mostly lists recent developments in AI, science, tech, and the Ukraine war.

khafra 15 Apr 2025 6:33 UTC
4 points
0
on: Unbendable Arm as Test Case for Religious Belief
I’ve done some Aikido and related arts, and the unbending arm demo worked on me (IIRC, it was decades ago). But learning the biomechanics also worked. More advanced, related skills, like relaxing while maintaining a strongly upright stance, also worked best by starting out with some visualizations (like a string pulling up from the top of my head, and a weight pulling down from my sacrum).

But having a physics-based model of what I was trying to do, and why it worked, was essential for me to really solidify these skills—and incorrect explanations, which I sometimes got at first, did not help me. Could just be more headology, though—other students seemed to be able to do well based off the visualizations and practice.

https://www.lesswrong.com/posts/rZX4WuufAPbN6wQTv/no-really-i-ve-deceived-myself seems relevant.

khafra 10 Apr 2025 10:25 UTC
2 points
0
in reply to: Noosphere89’s comment on: LLM AGI will have memory, and memory changes alignment
Good timing: the day after you posted this, a round of new Tom & Jerry cartoons swept through Twitter, fueled by transformer models whose layers include MLPs that can learn at test time. GitHub repo here: https://github.com/test-time-training (The videos are more eye-catching, but they’ve also done text models.)

khafra 12 Mar 2025 6:54 UTC
4 points
0
in reply to: RobinHanson’s comment on: Detached Lever Fallacy
It may be time to revisit this question. With Owain Evans et al. discovering a generalized evil vector in LLMs, and older work like [Pretraining Language Models with Human Preferences](https://www.lesswrong.com/posts/8F4dXYriqbsom46x5/pretraining-language-models-with-human-preferences) that could use a follow-up, AI in the current paradigm seems ripe for some experimentation with parenting practices in pretraining: perhaps something like affect markers for the text that goes in, or pretraining on children’s literature before going on to the more technically and morally complex text?
I haven’t run any experiments of my own, but this doesn’t seem obviously stupid to me.

khafra 11 Feb 2025 15:21 UTC
4 points
0
in reply to: Shankar Sivarajan’s comment on: Wired on: “DOGE personnel with admin access to Federal Payment System”
When there’s little incentive against classifying harmless documents, and immense cost to making a mistake in the other direction, I’d expect overclassification to be rampant in these bureaucracies.
Your analysis of the default incentives is correct. However, if there is any institution that has noticed the mounds of skulls, it is the DoD. Overclassification and classification for inappropriate reasons (explicitly enumerated in written guidance: avoiding embarrassment, covering up wrongdoing) are not allowed, and the DoD carries out audits of classified data to identify and correct overclassification.

It’s possible they’re not doing enough to fight against the natural incentive gradient toward overclassification, but they’re trying hard enough that I wouldn’t expect positive EV from disregarding all the rules.