> Sure, give me meta-level feedback.
> If it has an integral gain, it will notice this and try to add more and more heat until it stops being wrong. If it can’t, it’s going to keep asking for more and more output, and keep expecting that this time it’ll get there. And because it lacks the control authority to do it, it will keep being wrong, and maybe damage its heating element by asking more than they can safely do. Sound familiar yet?
From tone and context, I am guessing that you intend for this to sound like motivated reasoning, even though it doesn’t particularly remind me of motivated reasoning. (I am annoyed that you are forcing me to guess what your intended point is.)
I think the key characteristic of motivated reasoning is that you ignore some knowledge or model that you would ordinarily employ while under less pressure. If you stay up late playing Civ because you simply never had a model saying that you need a certain amount of sleep in order to feel rested, then that’s not motivated reasoning, it’s just ignorance. It only counts as motivated reasoning if you yourself would ordinarily reason that you need a certain amount of sleep in order to feel rested, but you are temporarily suspending that ordinary reasoning because you dislike its current consequences.
(And I think this is how most people use the term.)
So, imagine a scenario where you need 100J to reach your desired temp but your heating element can only safely output 50J.
If you were to choose to intentionally output only 50J, while predicting that this would somehow reach the desired temperature (contrary to the model you regularly employ in more tractable situations), then I would consider that a central example of motivated reasoning. But your model does not seem to me to explain how this strategy arises.
Rather, you seem to be describing a reaction where you try to output 100J, meaning you are choosing an action that is actually powerful enough to accomplish your goal, but which will have undesirable side-effects. This strikes me as a different failure mode, which I might describe as “tunnel vision” or “obsession”.
I suppose if your heating element is in fact incapable of outputting 100J (even if you allow side-effects), and you are aware of this limitation, and you choose to ask for 100J anyway, while expecting this to somehow generate 100J (directly contra the knowledge we just assumed you have), then that would count as motivated reasoning. But I don’t think your analogy is capable of representing a scenario like this, because you are inferring the controller’s “expectations” purely from its actions, and this type of inference doesn’t allow you to distinguish “the controller is unaware that its heating element can’t output 100J” from “the controller is aware, but choosing to pretend otherwise”. (At least, not without greatly complicating the example and considering controllers with incoherent strategies.)
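For concreteness, here is a minimal sketch of the saturation scenario under discussion (the plant constants are invented purely for illustration; this is not from the original analogy). The integral term keeps accumulating while the element is pinned at its safe limit, so the demanded output grows without bound:

```python
# Toy integral-windup demo: the room needs more heat than the element
# can safely deliver, so the error never reaches zero and the integral
# term (and hence the demanded output) keeps growing.
TARGET = 30.0      # desired temperature (deg C)
MAX_POWER = 50.0   # heating element's safe limit (the "50J")
KI = 0.5           # integral gain

temp, integral = 20.0, 0.0
for _ in range(100):
    error = TARGET - temp
    integral += error                     # accumulates while we fall short
    demanded = KI * integral              # what the controller asks for
    delivered = min(demanded, MAX_POWER)  # the element saturates
    # Toy plant: losses balance MAX_POWER before TARGET is reached,
    # so the controller can never actually get there.
    temp += 0.01 * (delivered - 6.0 * (temp - 20.0))

print(f"temp={temp:.1f}, demanded={demanded:.0f} (limit {MAX_POWER:.0f})")
```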
Meta-level feedback: I feel like your very long comment has wasted a lot of my time in order to show off your mastery of your own field in ways that weren’t important to the conversation; e.g. the stuff about needing to react faster than the thermometer never went anywhere that I could see, and I think your 5-paragraph clarification that you are interpreting the controller’s actions as implied predictions could have been condensed to about 3 sentences. If your comments continue to give me similar feelings, then I will stop reading them.
At some point, a temperature control system needs to take actions to control the temperature. Choosing the correct action depends on responding to what the temperature actually is, not what you want it to be, or what you expect it to be after you take the (not-yet-determined) correct action.
If you are picking your action based on predictions, you need to make conditional predictions based on different actions you might take, so that you can pick the action whose conditional prediction is closer to the target. And this means your conditional predictions can’t all be “it will be the target temperature”, because that wouldn’t let you differentiate good actions from bad actions.
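A minimal sketch of what that could look like (the thermal model here is invented purely for illustration):

```python
# Prediction-based control: make a *conditional* prediction for each
# candidate action, then pick the action whose predicted outcome is
# closest to the target.
def predict_temp(temp, heater_on):
    """Toy conditional model: temperature one step ahead."""
    heat_in = 2.0 if heater_on else 0.0
    heat_out = 0.1 * (temp - 20.0)  # drift toward 20 C ambient
    return temp + heat_in - heat_out

def choose_action(temp, target):
    # The two conditional predictions must differ; if both were simply
    # "it will be the target temperature", there would be no basis for
    # preferring one action over the other.
    return min([True, False],
               key=lambda on: abs(predict_temp(temp, on) - target))
```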
It is possible to build an effective temperature control system that doesn’t involve predictions at all; you can precompute a strategy (like “turn heater on below X temp, turn it off above Y temp”) and program the control system to execute that strategy without it understanding how the strategy was generated, and in that case it might not have models or make predictions at all. But if you were going to rely on predictions to pick the correct action, it would be necessary to make some (conditional) predictions that are not simply “I will succeed”.
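And a sketch of the prediction-free version described above, with arbitrary illustrative thresholds X and Y:

```python
# Precomputed bang-bang strategy with hysteresis: no model, no
# predictions, just "heater on below X, off above Y".
X, Y = 19.0, 21.0  # on-threshold and off-threshold

def thermostat_step(temp, heater_is_on):
    if temp < X:
        return True          # turn (or keep) the heater on
    if temp > Y:
        return False         # turn (or keep) the heater off
    return heater_is_on      # inside the band: leave it as it is
```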
Your explanation about the short-term planner optimizing against the long-term planner seems to suggest we should only see motivated reasoning in cases where there is a short-term reward for it.
It seems to me that motivated reasoning also occurs in cases like gamblers thinking their next lottery ticket has positive expected value, or competitors overestimating their chances of winning a competition, where there doesn’t appear to be a short-term benefit (unless the belief itself somehow counts as a benefit). Do you posit a different mechanism for these cases?
I’ve been thinking for a while that motivated reasoning sort of rhymes with reward hacking, and might arise any time you have a generator-part Goodharting an evaluator-part. Your short-term and long-term planners might be considered one example of this pattern?
I’ve also wondered if children covering their eyes when they get scared might be an example of the same sort of reward hacking (instead of eliminating the danger, they just eliminate the warning signal from the danger-detecting part of themselves by denying it input).
… except that you have a natural immunity (well, aversion) to adopting complex generators, and a natural affinity for simple explanations. Or at least I think both of those are true of most people.
It seems pretty important to me to distinguish between “heuristic X is worse than its inverse” and “heuristic X is better than its inverse, but less good than you think it is”.
Your top-level comment seemed to me like it was saying that a given simple explanation is less likely to be true than a given complex explanation. Here, you seem to be saying that a simple explanation is more likely to be true, but that people have a preference for simplicity that is stronger than the actual effect, and so you want to push people back to having a preference that is weaker but still in the original direction.
“Possible” is a subtle word that means different things in different contexts. For example, if I say “it is possible that Angelica attended the concert last Saturday,” that (probably) means possible relative to my own knowledge, and is not intended to be a claim about whether or not you possess knowledge that would rule it out.
If someone says “I can(not) imagine it, therefore it’s (not) possible”, I think that is valid IF they mean “possible relative to my understanding”, i.e. “I can(not) think of an obstacle that I don’t see any way to overcome”.
(Note that “I cannot think of a way of doing it that I believe would work” is a weaker claim, and should not be regarded as proof that the thing is impossible even just relative to your own knowledge.)
If that is what they mean, then I think the way to move forward is for the person who imagines it impossible to point out an obstacle that seems insurmountable to them, and then the person who imagines it possible to explain how they imagine solving it, and repeat.
If someone is trying to claim that their (in)ability to imagine something means that the laws of the universe (dis)allow it, then I think the person imagining it is impossible had better be able to point out a specific conflict between the proposal and known law, and the person imagining it is possible had better be able to draw a blueprint describing the thing’s composition and write down the equations governing its function. Otherwise I call bullshit. (Yes, I’m aware I am calling bullshit on a number of philosophers, here.)
I interpreted the name as meaning “performed free association until the faculty of free association was exhausted”. It is, of course, very important that exhausting the faculty does not guarantee that you have exhausted the possibility space.
> Alas, unlike in cryptography, it’s rarely possible to come up with “clean attacks” that clearly show that a philosophical idea is wrong or broken.
I think the state of philosophy is much worse than that. On my model, most philosophers don’t even know what “clean attacks” are, and will not be impressed if you show them one.
Example: Once in a philosophy class I took in college, we learned about a philosophical argument that there are no abstract ideas. We read an essay where it was claimed that if you try to imagine an abstract idea (say, the concept of a dog), and then pay close attention to what you are imagining, you will find you are actually imagining some particular example of a dog, not an abstraction. The essay went on to say that people can have “general” ideas where that example stands for a group of related objects rather than just for a single dog that exactly matches it, but that true “abstract” ideas don’t exist.[1]
After we learned about this, I approached the professor and said: This doesn’t work for the idea of abstract ideas. If you apply the same explanation, it would say: “Aha, you think you’re thinking of abstract ideas in the abstract, but you’re not! You’re actually thinking of some particular example of an abstract idea!” But if I’m thinking of a particular example, then there must be at least one example to think of, right? So that would prove there is at least one member of the class of abstract ideas (whatever “abstract ideas” means to me, inside my own head). Conversely, if I’m not thinking of an example, then the paper’s proposed explanation is wrong for the idea of abstract ideas itself. So either way, there must be at least one idea that isn’t correctly explained by the paper.
The professor did not care about this argument. He shrugged and brushed it off. He did not express agreement, he did not express a reason for disagreement, he was not interested in discussing it, and he did not encourage me to continue thinking about the class material.
On my model, the STEM fields usually have faith in their own ideas, in a way where they actually believe those ideas are entangled with the Great Web. They expect ideas to have logical implications, and expect the implications of true ideas to be true. They expect to be able to build machines in real life and have those machines actually work. It’s something like taking ideas seriously, and something like taking logic seriously, and taking the concept of truth seriously, and seriously believing that we can learn truth if we work hard. I’m not sure if I’ve named it correctly, but I do think there’s a certain mental motion of genuine truth-seeking that is critical to the health of these fields and that is much less common in many other fields.
Also on my model, the field of philosophy has even less of this kind of faith than most fields. Many philosophers think they have it, but actually they mostly have the kind of faith where your subconscious mind chooses to make your conscious mind believe a thing for non-epistemic reasons (like it being high-status, or convenient for you). And thus, much of philosophy (though not quite all of it) is more like culture war than truth-seeking (both among amateurs and among academics).
I think if I had made an analogous argument in any of my STEM classes, the professor would have at least taken it seriously. If they didn’t believe the conclusion but also couldn’t point out a specific invalid step, that would have bothered them.
I suspect my philosophy professor tagged my argument as being from the genre of math, rather than the genre of philosophy, then concluded he would not lose status for ignoring it.
[1] I think this paper was clumsily pointing to a true and useful insight about how human minds naturally tend to use categories, which is that those categories are, by default, more like fuzzy bubbles around central examples than they are like formal definitions. I suspect the author then over-focused on visual imagination, checked a couple of examples, and extrapolated irresponsibly to arrive at a conclusion that I hope is obviously-false to most people with STEM backgrounds.
> An awful lot of people, probably a majority of the population, sure do feel deep yearning to either inflict or receive pain, to take total control over another or give total control to another, to take or be taken by force, to abandon propriety and just be a total slut, to give or receive humiliation, etc.
This is rather tangential to the main thrust of the post, but a couple of people used a react to request a citation for this claim.
One noteworthy source is Aella’s surveys on fetish popularity and tabooness. Here is an older one that gives the % of people reporting interest, and here is a newer one showing the average amount of reported interest on a scale from 0 (none) to 5 (extreme), both with tens of thousands of respondents.
Very approximate numbers that I’m informally reading off the graphs:
- Giving pain: 30% of people interested (first graph); 2/5 average interest (second graph)
- Receiving pain: 35% and 2/5
- Being dominant: 30% and 3/5
- Being submissive: 40% and 3/5
- Rapeplay: >10% giving and 20% receiving (first graph); the second graph combines these at 2/5
- Humiliation: 25% for “slut humiliation” (first graph); 2/5 for humiliation generally (second graph)
Note that a 3/5 average interest could mean either that 60% of people are extremely into it or that nearly everyone is moderately into it (or anything in between). Either way, it seems to imply that the survey used in the more recent graph got significantly kinkier answers overall, unless I’m misunderstanding something. (I’m fairly certain that people with zero interest ARE being included in the average, because several other fetishes have average interest below 1, which would be impossible if they were excluded.)
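A quick back-of-envelope check of that comparison, using the rough percentages above:

```python
# If only a fraction f of respondents have any interest at all, the
# highest possible average on a 0-5 scale is 5*f (every interested
# person answering 5). The rough percentages from the older graph all
# cap the average below the 2-3 range shown in the newer graph.
for f in (0.30, 0.35, 0.40):
    print(f"{f:.0%} interested -> max possible average = {5 * f:.2f}")
# 30% -> 1.50, 35% -> 1.75, 40% -> 2.00
```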
If we believe this data, it seems pretty safe to guess that a majority of people are into at least one of these things (unless there is near-total overlap between them). The claim that a majority “feel a deep yearning” is not strongly supported but seems plausible.
(I was previously aware that BDSM interest was pretty common for an extremely silly reason: I saw some people arguing about whether or not Eliezer Yudkowsky was secretly the author of The Erogamer, one of them cited the presence of BDSM in the story as evidence in favor, and I wanted to know the base rate to determine how to weigh that evidence.
I made an off-the-cuff guess of “between 1% and 10%” and then did a Google search with only mild hope that this statistic would be available. I wasn’t able to re-find those pages today, but according to my recollection, my first search result was a page describing a survey of ~1k people claiming a ~75% rate of interest in BDSM, and my second search result was a page describing a survey of ~10k people claiming ~40% had participated in some form of BDSM and an additional ~40% were interested in trying it. I was also surprised to read (on the second page) that submission was more popular than dominance, masochism was more popular than sadism, and masochism remained more popular than sadism even if you only looked at males. Also, bisexuality was reportedly something like 5x higher within the BDSM-interested group than outside of it.)
> If you’re a moral realist, you can just say “Goodness” instead of “Human Values”.
I notice I am confused. If “Goodness is an objective quality that doesn’t depend on your feelings/mental state”, then why would the things humans actually value necessarily be the same as Goodness?
What would you want such a disclaimer or hint to look like?
(I am concerned that if a post says something like “this post is aimed at low-level people who don’t yet have a coherent foundational understanding of goodness and values” then the set of people who actually continue reading will not be very well correlated with the set of people we’d like to have continue reading.)
> A smart human-like mind looking at all these pictures would (I claim) assemble them all into one big map of the world, like the original, either physically or mentally.
On my model, humans are pretty inconsistent about doing this.
I think humans tend to build up many separate domains of knowledge and then rarely compare them, and even believe opposite heuristics by selectively remembering whichever one agrees with their current conclusion.
For example, I once had a conversation about a video game where someone said you should build X “as soon as possible”, and then later in the conversation they posted their full build priority order and X was nearly at the bottom.
In another game, I once noticed that I had a presumption that +X food and +X industry are probably roughly equally good, and also a presumption that +Y% food and +Y% industry are probably roughly equally good, but that these presumptions were contradictory at typical food and industry levels (because +10% industry might end up being about 5 industry, but +10% food might end up being more like 0.5 food). I played for dozens of hours before realizing this.
I don’t think Eliezer’s actual real-life predictions are narrow in anything like the way Klurl’s coincidentally-correct examples were narrow.
Also, Klurl acknowledges several times that Trapaucius’ arguments do have non-zero weight, just nothing close to the weight they’d need to overcome the baseline improbability of such a narrow target.
Thank you for being more explicit.
> If you write a story where a person prays and then wins the lottery as part of a demonstration of the efficacy of prayer, that is fictional evidence even though prayer and winning lotteries are both real things.
In your example, it seems to me that the cheat is specifically that the story presents an outcome that would (legitimately!) be evidence of its intended conclusion IF that outcome were representative of reality, but in fact most real-life outcomes would have supported the conclusion much less than that. (i.e. there are many more people who pray and then fail to win the lottery, than there are people who pray and then do win.)
If you read a story where someone tried and failed to build a wooden table, then attended a woodworking class, then tried again to build a table and succeeded, I think you would probably consider that a fair story. Real life includes some people who attend woodworking classes and then still can’t build a table when they’re done, but the story’s outcome is reasonably representative, and therefore it’s fair.
Notice that, in judging one of these fair and the other unfair, I am relying on a world-model that says that one (class of) outcome is common in reality and the other is rare in reality. Hypothetically, someone could disagree about the fairness of these stories based only on having a different world-model, while using the same rules about what sorts of stories are fair. (Maybe they think most woodworking classes are crap and hardly anyone gains useful skills from them.)
But I do not think a rare outcome is automatically unfair. If a story wants to demonstrate that wishing on a star doesn’t work by showing someone who needs a royal flush, wishes on a star, then draws a full house (thereby losing), the full house is an unlikely outcome, but since it’s unlikely in a way that doesn’t support the story’s aesop, it’s not being used as a cheat. (In fact, notice that every exact set of 5 cards they might have drawn was unlikely.)
If your concern is that Klurl and Trapaucius encountered a planet that was especially bad for them in a way that makes their situation seem far more dangerous than was statistically justified based on the setup, then I think Eliezer probably disagrees with you about the probability distribution that was statistically justified based on the setup.
If, instead, your concern is that the correspondence between Klurl’s hypothetical examples and what they found when reaching the planet was improbably high, then I agree that is very coincidental, but I do not think that coincidence is being used as support for the story’s intended lessons. The story is not trying to convince you that Klurl can narrowly predict exactly what they’ll find, and in fact Klurl denies this several times.
The coincidence could perhaps cause some readers to conclude a high degree of predictability anyway, despite lack of intent. I’d consider that a bad outcome, and my model of Eliezer also considers that a bad outcome. I’m not sure there was a good way to mitigate that risk without some downside of equal or greater severity, though. I think there’s pedagogical value in pointing out a counter-example that is familiar to the reader at the time the argument is being made, and I don’t think any simple change to the story would allow this to happen without it being an unlikely coincidence.
I notice I am confused about nearly everything you just said, so I imagine we must be talking past each other.
On the contrary: This is perhaps the only way the story could avoid generalizing from fictional evidence. Your complaint about Klurl’s examples is that they are “coincidentally” drawn from the special class of examples that we already know are actually real, which makes them not fictional. Any examples that weren’t special in this way would be fictional evidence, and readers could object that we’re not sure if those examples are actually possible.
If you think that the way the story played out was misleading, that seems like a disagreement about reality, not a disagreement about how stories should be used. Any given story must play out in one particular way, and whether that one way is representative or unrepresentative is a question of how it relates to reality, not a question of narrative conventions. If Trapaucius had arrived at the planet to find Star Trek technology and been immediately beamed into a holding cell, would that somehow have been less of a cheat, because it wasn’t real?
I would agree that, while reality-in-general has a surprising amount of detail, some systems still have substantially more detail than others, and this model applies more strongly to systems with more detail. I think of computer-based systems as being in a relatively-high-detail class.
I also think there are things you can choose to do when building a system to make it more durable, and so another way that systems vary is in how much up-front cost the creator paid to insulate the system against entropy. I think furniture has traditionally fallen into a high-durability category, as an item that consumers expect to be very long-lived...although I think modernity has eroded this tradition somewhat.
I have a tentative model for this category of phenomenon that goes something like:
- Reality has a surprising amount of detail. Everyday things that you use all the time and appear simple to you are actually composed of many sub-parts and sub-sub-parts all working together.
- The default state of any sub-sub-part is to not be in alignment with your purpose. There are many more ways for a part to be badly-aligned than for it to be well-aligned, so in order for it to be aligned, there has to be (at some point) some powerful process that selectively makes it be aligned.
- Even if a part was aligned, the general nature of entropy means there are many petty, trivial reasons that it could stop being aligned with little fanfare. (Though the mean-time-to-misalignment can vary dramatically depending on which part we’re talking about.)
So, it shouldn’t be surprising when you find that a complex system is broken in seven different ways for trivial and banal reasons. That’s the default outcome if you just put a system in a box and leave it there for a while.
OK, but if that’s the default state, then how do I explain the systems that aren’t like that?
Suppose we have a system that is initially working perfectly until, one day, one tiny thing goes wrong with the system.
If people use the system frequently and care about the results, then someone will promptly notice that there is one tiny thing wrong.
If the person who discovers this expects to continue using the system in the future, they have an incentive to fix the problem.
If there is only one problem, and it is tiny, then the cost to diagnose and fix the problem is probably small.
So, very often, the person will just go ahead and fix it, immediately and at their own expense, just to make the problem go away.
No one keeps careful track of this—not even the person performing the fix. So this low-level ongoing maintenance fades into the background and gets forgotten, creating the illusion of a system that just continues working on its own.
This is especially true for multi-user systems where no individual user does a large percentage of the maintenance.
I don’t think this invisible-maintenance situation describes the majority of systems, but I think it does describe the majority of user-system interactions, because the systems that get this sort of maintenance tend to be the ones that are heavily used. This creates the illusion that this is normal.
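A toy simulation of this model (all of the rates here are invented) shows how a steady trickle of quiet one-at-a-time fixes separates the heavily-used system from the neglected one:

```python
import random

PARTS, STEPS, FAULT_RATE = 50, 1000, 0.005  # per-part fault chance per step

def simulate(maintained):
    broken = set()
    for _ in range(STEPS):
        for p in range(PARTS):
            if random.random() < FAULT_RATE:
                broken.add(p)          # a small, banal thing goes wrong
        if maintained and broken:
            broken.pop()               # some user quietly fixes one problem
    return len(broken)

random.seed(0)
print("maintained:", simulate(True), "parts broken")
print("neglected: ", simulate(False), "parts broken")
```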
Some of the ways this can fail include:
- Users cannot tell that the system has developed a small problem
  - Maybe the system’s performance is too inconsistent for a small problem to be apparent
  - Maybe the operator is not qualified to judge the quality of the output
  - Maybe the system is used so infrequently that there’s time for several problems to develop between uses
- For an individual user, the cost (to that user) of fixing a problem is higher than the selfish benefits to that particular user of the problem being fixed
  - Maybe no single user expects to use the system very many times in the future
  - Maybe users lack the expertise or the authority to perform the fix (and there is no standard channel for maintenance requests that is sufficiently cheap and reliable)
  - Maybe the system is just inherently expensive to repair or to debug (relative to the value the system provides to a single user)
On my reading, most of Klurl’s arguments are just saying that Trapaucius is overconfident. Klurl gives many specific examples of ways things could be different than Trapaucius expects, but Klurl is not predicting that those particular examples actually will be true, just that Trapaucius shouldn’t be ruling them out.
> “I don’t recall you setting an exact prediction for fleshling achievements before our arrival,” retorted Trapaucius.

> “So I did not,” said Klurl, “but I argued for the possibility not being ruled out, and you ruled it out. It is sometimes possible to do better merely by saying ‘I don’t know.’”
Eliezer chooses to use many specific examples that do happen to be actually true, which makes Klurl’s guesses extremely coincidental within the story. This is bad for verisimilitude, but it makes the examples easier for the reader to follow, and it makes a clearer and more watertight case that Trapaucius’ arguments are logically unsound.
I think you do a good job of arguing (in the earlier part of the article) that it is logically possible to drop the independence axiom without being money-pumped, by giving up logical consequentialism while keeping dynamic consistency. However, I think you do a poor job of arguing (in the later parts) that we should give up consequentialism.
You examine 3 in-depth examples to try to show that we’d be fine if we dropped independence: ergodicity economics, the Allais Paradox, and the Ellsberg Paradox. In all 3 cases, I think your argument is missing a critical step that is required for its validity.
1.
In the section on ergodicity economics, you claim the ergodicity-economics approach follows resolute choice because it forms a plan based on the entire decision tree and then sticks to that plan. But this isn’t sufficient to carry your point, because agents that obey the independence axiom can also be described as sticking to their original plan. (In fact, any agent with dynamic consistency can be described this way, and you agreed we need dynamic consistency.)
What you’d need to show in order to carry your point is that ergodicity economics violates consequentialism. For example, you could show this by constructing an example where a local re-evaluation would deviate from the original plan, but the ergodicity-economics agent follows the original plan anyway. Without showing that, this example fails to support your case.
2.
In the section on the Allais Paradox, you give the following reasoning for why the common human answer is rational:
But this reasoning seems to be exactly backwards from the actual result: When component C provides a safety net of $1M, humans choose the lower-risk option A, but when component C provides nothing, humans choose the higher-risk option B. Your argument in this paragraph undermines, rather than supports, the rationality of the choice you are defending.
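(For readers without the article at hand, here is the standard Allais setup, which I’ll assume even though the article’s exact numbers may differ. Pair 1 offers A = $1M with certainty versus B = 89% $1M, 10% $5M, 1% nothing; pair 2 offers A′ = 11% $1M, 89% nothing versus B′ = 10% $5M, 90% nothing. Each pair decomposes into the same 11% sub-lottery plus an 89% “component C”: A = 0.89·C + 0.11·($1M for sure) and B = 0.89·C + 0.11·(10/11 chance of $5M, 1/11 chance of nothing), where C = $1M in pair 1 and C = nothing in pair 2. Independence says your preference within the 11% sub-lottery shouldn’t depend on C, yet most people choose A from pair 1 and B′ from pair 2.)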
And aside from this one backwards paragraph, you don’t seem to offer any basis at all for how the context ought to change the answer. You have several paragraphs of philosophical hand-waving about how it is good and appropriate that it should, but don’t appear to offer anything like an algorithm saying how we should take it into account. Without a model predicting the preference for A over B, you fail to win any Bayes points.
Nothing in this section sounds like a logical reason to consider the common human choice in the Allais Paradox more rational than I previously did.
Sidenote: Empirical Money Pumps?
This discussion also raises the question: Can you actually, in real life, use the Allais Paradox to money-pump humans? If you can, then the behavior of humans does not provide evidence of the rationality of their choices in this scenario, regardless of any theoretical arguments about how we could avoid money pumps while keeping this preference. My brief Google search failed to immediately turn up any experiments involving actual money pumps, but I haven’t done a careful literature review.
Sidenote: Can the Allais Paradox result be justified in other ways?
There are two defenses of this that I somewhat credit:
A.
Eliminating a possible outcome makes it cognitively cheaper to plan for what happens after the lottery, because you don’t need to consider as many distinct cases.
Notice this reasoning only applies if there is an “after”, which is usually true in real life but usually false in abstract formal examples.
B.
Suppose you are living among a population of similar agents that compete for resources, and all of the other agents get to make a similar choice between lotteries. Then, the outcome where you get nothing is always the same in terms of absolute resources, but not in terms of relative resources when comparing to other people.
If you choose between a 1% chance and a 0% chance of getting nothing, then the few agents who end up with nothing will be out-competed by almost everyone around them. They will lose approximately all competitions and will be the obvious choice for predators to target.
If you choose between a 90% chance and an 89% chance of getting nothing, then agents who win millions will still out-compete the ones who get nothing, but they’ll have a harder time monopolizing all opportunities because there won’t be as many winners. Many of the “losers” will still have a decent relative standing.
This reasoning doesn’t apply if you somehow know this lottery is a special one-time opportunity for you only, but it seems plausible that our instincts evolved mostly to deal with non-unique opportunities.
However, notice that these two reasons justify different things. Reason A justifies zero-risk bias, i.e. paying a premium to reduce a risk to ~zero; it produces a sharp change in your preferences at a specific probability. Contrariwise, reason B would remain nearly as strong if we changed “1% or 0%” to “2% or 1%”.
3.
In the section on the Ellsberg Paradox, I think you make some clever points about why the standard human answer might be rational, but I don’t see how any part of this section ties into logical consequentialism or the violation thereof. For example, you have not explained how a money pump could be constructed based on this scenario.
4.
The argument in favor of logical consequentialism is obvious: If you violate it, you are leaving money on the table. (Violating it implies that you are making a choice that satisfies your own preferences less than another choice you could have made in the current circumstances.)
In fact, this is essentially the same reason that we think that vulnerability to money pumps is bad (you end up with less money than you predictably could have). So it seems pretty weird to argue that we need to keep all the axioms that prevent money pumps but it’s somehow ok to drop consequentialism. I’m not sure what set of assumptions would validly lead to that combination of conclusions.