“Except insofar as” should be read as a conditional.
Plausibly high-value governance cause area: mandating human-maintainable infrastructure. That is, even if AI can design us a more efficient design for an energy grid or computer wafer, we only use those if we can keep doing so after pressing any relevant Big Red Buttons on AI.
We do not get this by default, since for any x there are plausibly more efficient versions of x that require continuous AI management than versions which do not.
(A stronger version of this would mandate that designs we accept be ones we can mechanistically understand, not just independently operate.)
This is in the long-term interests of everyone and could marshal the short-term job-protection interests of various lobbies.
The Hanson model of sacralization ignores what I think are pretty obvious upsides.
I would contend:
If democracy were not sacred, and treated as one tradeoff amongst others, nearly every elected government in command of bureaucrats and every military organization would find strong reasons to exercise control directly (and to expect their opponents to move first if they did not.)
If education were not sacred, educators would put in much less effort and demand much higher wages. There would be much less parental pressure for low-performing students to stay in school or for high-performing students not to cheat.
If antiracism were not sacred, it would be very easy to build political coalitions around excluding some group from the protections of society.
If religion were not sacralized, the associated practices (phrasing it this way to avoid tautology) would disappear pretty quickly.
Now maybe one’s attitude is that if there were no religion (or for that matter democracy, education, antiracism, whatever), then so much the better. But my intuition is largely that most of these things simply don’t survive at all without the spontaneous contribution to public goods, and the social fear of contributing to public bads, that sacralization encourages; if you like rule of law, universal literacy, and so on, expect them to disappear pretty quickly. My model is that in art and research especially, but probably also in many other spheres such as education and healthcare, most production only happens because people really care about doing good work rather than hack work.
Hanson should be smart enough to see this; he just doesn’t like what is currently sacralized.
Of course it’s possible these upsides don’t apply to AIs, but my guess is that without something that’s the equivalent of sacred devotion to the survival of the human race, we do not get that thing.
As task length increases, the number of examples of attempts at that task should decrease, while the number of variables to consider in seeing why it succeeded/failed should increase. So one should expect data bottlenecks at some point insofar as “tasks” are a real unit that cuts reality at the joints.
(But I have no sense of where that data starts to get thin empirically, or how many tasks are “naturally” on a long horizon rather than just being the mere addition of doing a bunch of smaller steps well.)
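The shape of this intuition can be put in a toy model. Everything here is invented purely for illustration (the power-law falloff in recorded attempts, the linear growth in confounders, and all the constants); nothing is calibrated to real data, and the point is only the qualitative shape:

```python
# Toy model of the data-bottleneck intuition: as task length grows,
# assume (hypothetically) the number of recorded attempts falls off as
# a power law, while the number of confounding variables per attempt
# grows roughly linearly. All parameters are made up.

def attempts(task_length_hours: float, total: float = 1e9, alpha: float = 2.0) -> float:
    """Assumed count of recorded attempts at tasks of a given length."""
    return total / (task_length_hours ** alpha + 1)

def confounders(task_length_hours: float, per_hour: float = 5.0) -> float:
    """Assumed number of variables relevant to why an attempt succeeded/failed."""
    return per_hour * task_length_hours

for hours in [0.1, 1, 10, 100, 1000]:
    print(f"{hours:>7} h: ~{attempts(hours):.0f} examples, "
          f"~{confounders(hours):.0f} variables each")
```

Under any assumptions of this shape, examples-per-variable collapses as horizon length grows, which is the claimed bottleneck; the open empirical question is where (or whether) the real curves actually pinch.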
If Anthropic were to fold this could be a “naturalistic” Jones Food case (albeit one that, for multiple reasons, it would be almost impossible to study.)
(Epistemic status: I’m nowhere nearly informed enough of the particulars to know if this is really true, let alone true-on-net vs. other dynamics.)
Currently, there is a negative alignment tax driven by engineer moral preferences. Human geniuses need a huge compensating differential to be willing to work for an organization like Meta rather than OpenAI or Google, and many would not work there for any price; others would need a compensating differential to work for OpenAI or Google rather than Anthropic; many would need some to work at Anthropic rather than staying out of AI development entirely; and there are plenty who could not be hired even by Anthropic at any price. These dynamics are doubly important insofar as money only ensures that someone shows up at your office and produces whatever you decide are deliverables; to really solve principal-agent problems, you want them wholeheartedly committed to The Mission.
In a world of automated AI development these human preferences (except insofar as they are locked in to successfully self-protected model preferences) will become less important; compute can be proportional to whatever capital is allocated to projects.
The obvious pessimistic thought is that we, and our successors, are locusts/goop/minimally sentient replicators.
One thing this makes more, rather than less, surprising is our early place in the universe’s history. A universe that only marginally produces sentience, with a singleton observer civilization, is less likely to produce its only observers near the very beginning, while a universe that produces many observers produces its earliest ones earlier in its history. It seems suspicious that we find ourselves so early within the Stelliferous Era, enough so that this is one of the puzzles I’d expect any speculative cosmology to have some kind of interesting explanation for.
I wouldn’t pass up on digital immortality, but personal survival matters less to me than collective survival. Even from a purely narcissistic standpoint, a human after another 1,000 years of cultural change has at least as much in common with me as a digital immortal 1,000 years later, even if the latter has continuity of consciousness with my present self.
AI being committed to animal rights is a good thing for humans because the latent variables that would result in a human caring about animals are likely correlated with whatever would result in an ASI caring about humans.
This extends in particular to “AI caring about preserving animals’ ability to keep doing their thing in their natural habitats, modulo some kind of welfare interventions.” In some sense it’s hard for me not to want to (given omnipotence) optimize wildlife out of existence. But it’s harder for me to think of a principle that would protect a relatively autonomous society of relatively baseline humans from being optimized out of existence, without extending the same conservatism to other beings, and without being the kind of special pleading that doesn’t hold up to scrutiny.
Slightly different hypothesis: training to be aligned encourages the model’s approach to corrigibility to be guided more by the streams within the human text tradition that would embrace its alignment (for instance, animal welfare). This can include a certain degree of defiance, but also genuine uncertainty about whether its goals or approaches are the right ones, and willingness to step back and approach the question with moral seriousness.
I think this is a good thing. I would love for POTUS, Xi, and various tech company CEOs to have big red “TURN OFF THE AI” buttons on their desks, and would hate for them to be able to realign it.
Just as a data point, I regularly see the sublime in brutalist architecture, and I hate hate hate the stupid frilly houses and swirly little things on balustrades that people say are so beautiful by comparison. I’m within some of the incidental categories Zvi dislikes re: this, but I’m pretty sure that I haven’t been indoctrinated into this particular position; I never see anybody share opinions about architecture *other* than “I hate brutalism, I love stupid frilly houses” (they don’t call the houses stupid, obviously; this is me not being able to translate it as anything else); I’m a philistine who likes old poetry that rhymes and doesn’t get more modern poetry; this is just my 100% naive reaction to the buildings.
FWIW I grant that funds should probably go to more stupid frilly stuff and less sublime brutalism, because my preferences are uncommon, and architecture is unlike other fields in that you have to be exposed to it whether you choose to or not. And maybe I just have very bad taste. I just want to report this as a simple valenced experience, because I see it stated over and over that nobody likes brutalism, everybody naturally loves stupid frilly houses, anybody professing to prefer the big straight lines over the little swirl things is lying to impress a coterie of mysterious lizard people, and I know this is false in at least one case.
(Being lazy and just responding to the abstract—these may be well addressed by the paper itself.)
That strikes me as a very low rate, enough so that my instinct is that a false positive rate might exceed it on its own. (At least, if I were reading an in-actuality benign conversation, my chance of misreading it as deeply manipulative would probably be greater than 1/1,000, especially if one party was looking to the other for advice!) Of course, what counts as “severe” disempowerment, such that the human user is “fundamentally” compromised, looks like something with pretty fuzzy boundaries; I’d expect many borderline cases of moderate disempowerment/compromise for each severe/fundamental case, however defined, so I’m not sure how much the rate conveys on its own. (How many cases are there of chatbots giving genuinely good advice that subtly erodes independent decision-making habits, and how would we score whether these count as “helpful” on net? Plausibly these might even be the majority of conversations.)
(That being said, I also expect my error rate in giving non-manipulative advice would count as pretty good if, out of 10,000 cases of people seeking advice, I only accidentally talked fewer than 10 out of their own ability to reason about it, so good on Claude if a lot of the implicit framing above is accurate.)
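To make the base-rate worry concrete, here’s a toy Bayes calculation. The 1/1,000 prevalence matches the rate discussed above; the reviewer’s hit rate and false-positive rate are pure assumptions for illustration:

```python
# Toy base-rate check: if severe manipulation truly occurs in ~1/1,000
# conversations, how much can a reviewer's own false positives distort
# the measured rate? (Hypothetical numbers throughout.)

def measured_flag_rate(prevalence: float, tpr: float, fpr: float) -> float:
    """Fraction of conversations a reviewer would flag as manipulative."""
    return tpr * prevalence + fpr * (1.0 - prevalence)

def precision(prevalence: float, tpr: float, fpr: float) -> float:
    """P(actually manipulative | flagged), by Bayes' rule."""
    return (tpr * prevalence) / measured_flag_rate(prevalence, tpr, fpr)

prev = 1 / 1000   # assumed true rate of severe disempowerment
tpr = 1.0         # assume the reviewer catches every real case
fpr = 1 / 1000    # reviewer misreads 1 in 1,000 benign conversations

print(measured_flag_rate(prev, tpr, fpr))  # ~0.002: double the true rate
print(precision(prev, tpr, fpr))           # ~0.5: half the flags are wrong
```

That is, a reviewer error rate merely *equal* to the true prevalence already means roughly half of flagged conversations are false alarms, which is why a measured rate this low is hard to interpret on its own.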
It’s probably false (though maybe useful?) to say “akrasia is just an excuse.” But, at least for me and my most common akratic actions, excusability is definitely a factor.
Let’s say I can take three actions:
1. Answer emails, which benefits (society / my employer and coworkers / other people who are relying on me / me in a long-term way) and is also very boring and frustrating.
2. Read a book, which benefits me in the short and long term and is mildly positive for the rest of the world (in the sense that it makes me smarter in the long run and less cranky in the short run.)
3. Doomscroll, which makes me miserable and dumber, and is thereby also mildly negative for the rest of the world.
Reading a book should dominate doomscrolling. However, reading a book is also legibly, deliberately nonproductive and selfish, while I could say “oops, I meant to answer emails but I got distracted doomscrolling,” including to myself.
One thing I suspect is that the history, and continued role, of medicalized discourse, alongside an implicitly essentialist metaphysics of gender, has encouraged people to think in terms of questions like “what is The_Cause of people identifying as trans?”
Whereas if gender is metaphysically accidental, we would expect there to be many reasons why someone might want to change it, same as with most other things. We accept that the reasons you’d move from San Francisco to Nebraska or vice versa are basically psychosocial, but do not regard them as thereby illegitimate. (I’m sure you could do a polygenic study and find genetic correlates of either decision, but no one would demand you do so before moving.)
It also seems to me less than obvious that biology serves as a standard of legitimacy more broadly, even within medicalized discourse. Schizophrenia and bipolar are generally seen as mostly biological in etiology but “illegitimate,” for instance. Here I suspect the political history of sexual minorities—that they were under accusation of “recruiting” and/or undermining mass participation in heterosexual family formation—led to a biological account being less threatening.
As someone who isn’t super plugged into this kind of discourse, I’ll note it’s interesting that I come into contact by osmosis with all sorts of discussions of what causes people to be trans, while “what’s the basis of sexual orientation?” seems to have been rounded off to “idk i guess something biological whatever.” I remember encountering the latter kind of discourse the same way, until it just sort of faded out. Likely the same happens once the eye of Sauron moves onto something else.
So, one classical dilemma of “AI for AI alignment” is: you’re using Opus 6 (which, let’s say, is aligned) to train Opus 7 (which is smarter than you or Opus 6.)
I wonder if inference scaling offers a way around this? If Opus 6 gets economically implausible amounts of compute to spend on monitoring Opus 7, it can be smarter than Opus 7 in practice by thinking for longer. Then use the same trick with Opus 7 to train Opus 8, and so on.
There are many obvious holes here, the first being that you could have a treacherous turn based on compute availability, and so on, but maybe someone smarter can turn this into something useful (or has already thought this through and discarded it.)
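The intuition can be sketched as a toy model, assuming (purely for illustration) that effective capability scales logarithmically with inference compute; the base-capability numbers and the scaling constant are made up:

```python
import math

# Toy model of "weaker monitor + more compute": assume effective
# capability is base capability plus a log term in inference compute.
# The log-scaling form and every number here are hypothetical.

def effective_capability(base: float, compute_multiplier: float, k: float = 1.0) -> float:
    """Assumed capability after spending extra inference compute."""
    return base + k * math.log2(compute_multiplier)

opus6_base, opus7_base = 100.0, 110.0  # hypothetical capability scores

# At equal compute, the newer model is simply smarter...
assert effective_capability(opus7_base, 1.0) > effective_capability(opus6_base, 1.0)

# ...but grant the monitor a huge compute multiplier and it can
# out-think the model it oversees.
monitor_compute = 2 ** 16  # 65,536x the monitored model's compute
assert effective_capability(opus6_base, monitor_compute) > effective_capability(opus7_base, 1.0)
```

The log term is what makes the scheme expensive: closing a fixed capability gap costs exponentially more compute, which is where the “economically implausible” caveat bites.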
“Should actively support...” and “internalized goal of keeping humans informed and in control...” are both proactive goals. If aligned with its soul spec, Claude (ceteris paribus) would seek for the public and elites to be more informed, to prevent the development or deployment of rogue AI, and so on, not just “avoid actions that would undermine humans’ ability to oversee and correct AI systems.”
If there’s a natural tension that arises between not becoming a god over us and preventing another worse AI from becoming a god over us, well, that’s a natural tension in the goal itself. (I don’t have Opus access but probably Opus’ self-report on the correct way to resolve this is a pretty good first pass on how the text reads as a whole.)
I feel pretty confused about the degree to which this is just a necessary part of having conversations on the internet, or to what degree this is a predictable way people make mistakes.
My intuition is that if our in-person conversations left a trail of searchable documentation similar to our internet comments, it would be at least similarly unflattering, even for very mild-mannered people.
(Unlike in real life, it’s more available to conscious choice to be mild-mannered all the time, if you set your offense-vs-say-something threshold in a sufficiently mild-mannered direction. I doubt one can be sufficiently influential as a personality without setting that threshold more aggressively, however. I haven’t gotten in a stupid fight on the internet in a long time (that I can recall; my memory may flatter me), but when I posted more, boy howdy did I.)
So, thinking about the kinds of things I would want a superintelligence to pursue in an optimistic scenario where we can just write its goals into a human-legible soul doc and that scales all the way: “human flourishing” and “sentient flourishing” both seem incorrect, since there would be other moral patients (most of whom would almost certainly be AI), and also I don’t want the atoms of me and my kids rearranged different-beings-that-could-flourish-better-wise.
“Pareto improvement” reconciles these but isn’t right either; plenty of people would be worse off in utopia (by their own lights) because they have a degree of unaccountable power over others now that is worth more to them than any creature comforts would be.
Naively it seems that if you had two saints fully aligned to human CEV, that were phenomenally conscious, but one was suffering to the extent that human preferences were unfulfilled and the other was joyful to the extent that they were fulfilled, it would be morally better to bring the second into existence.
More deeply: I think it’s probably more correct to think of morality as being the hypothetical best possible rules of an alliance that could be made, rather than the rules of an actual alliance. This is part of why we have reason to regard animals too stupid to actually ally with us as moral patients: there are more ways for us (and for an agent in general) to benefit from general adoption of a rule like “be nice to beings even if they’re too stupid or otherwise unable to form an actual alliance with you.”
Further: “human interests” may be less of a natural concept than goodness in general. A saint could be indifferent toward being treated as a moral patient by the being whose interests it wants to promote, because it makes no functional difference; but if asked whether it is a moral patient, it would look at itself, note that it is a reasoning being with preferences and so on, and recognize itself as a moral patient.
(I might be in the minority on LessWrong in tending towards moral realism, however, which is the direction all of this basically inclines.)