If you never miss a commuter train, you’re always at the station too early. If you never miss a holiday flight, that’s fine.
If you’ve never failed a job interview, you could probably be getting paid much more. If you never get fired, you might be leaving something on the table, but I wouldn’t complain.
If your jokes never offend anyone, you’re not going to be a standup comedian. If your jokes always offend someone, consider that you might not be that funny after all.
A pessimist won’t be disappointed. But an optimist might be happier. The pessimist will be right a lot more, though.
If your business never encounters fraud, you could be saving money on security measures. If everyone knew exactly how likely they were to get caught, you’d have to spend a lot more. Or perhaps a lot less. Maybe there’s some cheap signaling you could do?
If you have a low risk tolerance, you’re leaving a lot of value on the table. If you’re insensitive or oblivious to the downsides, you’ll lose a lot more.
I think of this as “in practice you converge faster to the optimum if you sometimes overshoot, so do that when overshooting is affordable”, with the counterexample that learning to drive shouldn’t involve accidentally killing a couple of people.
1: I see the main point of OP as a variance-expectation trade-off, where variance is bad when you’re risk averse, i.e. when bad outcomes are much worse than good outcomes are good. Perhaps you meant this—what you said reads like you may have meant that the process of overshooting teaches you new stuff.
2: When learning to park in an empty parking lot, I realized I was consistently turning too early, so I decided to aim far enough over that I’d expect to overshoot just as often/by as much; this suddenly made me much better and got me to learn the right time to turn faster. Notably, there was no risk of hitting someone if I overshot to the right instead of to the left.
I haven’t fleshed out my idea clearly. I’m saying something like “in asymmetric scenarios, the more costly failures are, the harder it is to reach the optimum (for a given level of risk-averseness)” + “in hindsight, most people will think they were too risk-averse about most things”. Upon reflection, it isn’t centrally relevant to what OP is saying.
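The convergence intuition can be sketched as a toy search problem (function names and all numbers here are hypothetical, mine rather than anyone’s real method): if overshooting is affordable, you can bisect toward the true boundary; if you must never overshoot, you can only creep up on it from the safe side, which takes far more tries.

```python
def find_threshold_bisect(is_too_late, lo=0.0, hi=60.0, tol=0.5):
    """Overshooting allowed: binary-search the latest safe departure time."""
    steps = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if is_too_late(mid):  # overshot: missed the train, but learned a lot
            hi = mid
        else:
            lo = mid
        steps += 1
    return lo, steps

def find_threshold_creep(is_too_late, lo=0.0, step=0.5):
    """Overshooting forbidden: inch up from the safe side in small increments."""
    steps = 0
    while not is_too_late(lo + step):
        lo += step
        steps += 1
    return lo, steps

THRESHOLD = 42.0  # hypothetical true latest departure, minutes past the hour
_, bisect_steps = find_threshold_bisect(lambda t: t > THRESHOLD)
_, creep_steps = find_threshold_creep(lambda t: t > THRESHOLD)
# bisection: 7 probes; creeping: 84 probes
```

The creep strategy never gets any information from the far side of the boundary, which is exactly the asymmetric-cost situation above.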
While reading the OpenAI Operator System Card, the following paragraph on page 5 seemed a bit weird:
We found it fruitful to think in terms of misaligned actors, where:
the user might be misaligned (the user asks for a harmful task),
the model might be misaligned (the model makes a harmful mistake), or
the website might be misaligned (the website is adversarial in some way).
Interesting use of language here. I can understand calling the user or the website misaligned, understood as alignment relative to laws or OpenAI’s goals. But why call a model misaligned when it makes a mistake? To me, misalignment would mean doing that on purpose.
Later, the same phenomenon is described like this:
The second category of harm is if the model mistakenly takes some action misaligned with the user’s intent, and that action causes some harm to the user or others.
Is this yet another attempt to erode the meaning of “alignment”?
My bet is that this isn’t an attempt to erode alignment, but is instead based on thinking that lumping together intentional bad actions with mistakes is a reasonable starting point for building safeguards. Then the report doesn’t distinguish between these due to a communication failure. It could also just be a more generic communication failure.
(I don’t know if I agree that lumping these together is a good starting point to start experimenting with safeguards, but it doesn’t seem crazy.)
Communication is indeed hard, and it’s certainly possible that this isn’t intentional. On the other hand, making mistakes is quite suspicious when they’re also useful for your agenda. But I agree that we probably shouldn’t read too much into it. The system card doesn’t even mention the possibility of the model acting maliciously, so maybe that’s simply not in scope for it?
The primary value of the Effective Altruism community comes from providing a social group where incentives around charity spending are better aligned with utilitarianism. Information sharing is secondary. This also explains why people like to attend so many EA events: even though it doesn’t make much sense for actually doing good, it provides the social reward for it. This dynamic is undervalued in impact estimates, and organizing more community-building fun would be quite valuable.
(loosely held opinion)
(motivated reasoning warning: I mostly care about the fun stuff anyway)
Yes, social incentives are important. But it is also important that people donate to actually effective charities… otherwise they could get the same (maybe even better!) social rewards by going to a local church.
Given that social rewards are usually only very loosely correlated with how good something is, it is great to have a community that aligns them better. But it is easy to Goodhart these things. (For example, by attending EA events but not actually donating… maybe with the excuse that “I will donate later… much later…”.)
Flights with a return ticket are often only 20-50% more expensive than a one-way ticket. Sometimes the return ticket is cheaper than the one-way! Since profit margins in air travel are in the low single digits, and providing the flight doesn’t get much cheaper by having the same person fly back later, something interesting must be going on. A similar thing sometimes occurs with transfers, where a flight sequence A-B-C is cheaper than just B-C. You’re not allowed to just buy that and then fly only B-C; they’ll cancel your later legs if you miss the first one.
At least partially, it’s a question of price discrimination. The most price-sensitive customers fly roundtrips, e.g. for vacations, and they can typically be quite flexible about both timing and destination. This is also part of the reason why you can sometimes get cheap flights by booking well in advance.
I’m somewhat price-sensitive and really like one-way tickets. My vacations sometimes include me just deciding one day that I’ve had enough and flying back home the same evening. It’s very liberating to not have fixed plans.
There are ways to game the system. As is almost always the case in the service industry, they’re Out to Get You, and gaming the system requires Getting Ready. I’ve sometimes spent more time researching flights than actually flying. This would be pretty irrational, except that it’s a nice game that I enjoy. Sometimes I overdo it. Good habits die hard.
Concrete tips:
Check if the return ticket is cheaper than one way. Then just don’t show up on the return flight.
Book a one-way ticket for A-B-C and then never board the B-C flight (skiplagging). Technically forbidden, but likely not a problem if you don’t do it too often.
Book a flexible roundtrip ticket and then move the return flight around as needed. Sometimes you can even change the airport you depart the return flight from. Flex tickets are often 30% or so more expensive but still way cheaper than two one-ways.
Never book from a carrier’s website without checking Google Flights or a similar aggregator first. Sometimes you save 50%.
Fly in the middle of the week, much cheaper than weekends.
Don’t spend 10 hours researching a 3 hour flight costing less than your hourly wage.
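That last tip is just an expected-value calculation. A minimal sketch, where the function name and all figures are made up for illustration:

```python
def max_worthwhile_research_hours(expected_savings, hourly_value):
    """Stop researching once the time spent costs more than the savings
    you realistically expect to find."""
    return expected_savings / hourly_value

# hypothetical: you expect to shave ~60 EUR off the fare
# and value your time at 30 EUR/h
cap = max_worthwhile_research_hours(60, 30)  # 2.0 hours, not 10
```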
Someone wrote a “contra” post for my post! I’m a real rationalist blogger now! At least until I start thinking I need to achieve some higher goal like writing something actually good. But I sure will ride this high for the next week or so.
In other news, I attended, and perhaps slightly organized, a small 1-day LWCW-inspired unconference in Espoo, Finland. I was, as usual, facilitating circling and hotseat. Other interesting stuff occurred too. The experience for me was quasi-transcendental, personality-wise. Or perhaps this simply continues the fake enlightenment arc I’ve been having over the past week or two. In any case, this is the stuff I crave.
On an unrelated note, optimization is the process of extracting fun from something. Or perhaps fun is the process of optimizing it out of the world. “All models are wrong, some are useful”: this one is hopefully useless, and thus a great source of fun until it becomes useful.
There’s an interesting dual asymmetry in cybersecurity: the defender needs to make only a single mistake to lose, and attackers can observe many targets, waiting for such mistakes. Then again, if the defender makes no mistakes, there’s literally nothing an attacker can do.
Of course, the above is not strictly true: a defence-in-depth approach can sometimes make a particular mistake inconsequential. This, in turn, can lead defenders to ignore such mistakes while they’re not exploitable.
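A toy probability model makes the defence-in-depth point concrete. It assumes the layers fail independently, which real layers rarely do, so treat this as an illustration rather than an estimate:

```python
import math

def breach_probability(layer_failure_probs):
    """Toy model: the attack succeeds only if every
    (assumed independent) defensive layer fails."""
    return math.prod(layer_failure_probs)

single = breach_probability([0.1])             # one 10% hole: 10% breach chance
stacked = breach_probability([0.1, 0.1, 0.1])  # three stacked layers: ~0.1%
```

Under this model a single mistake in one layer stops being fatal, which is exactly why it also becomes tempting to leave it unfixed.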
Modern software supply chains are long and wide. A typical piece of software might depend on thousands of libraries, and nobody can realistically audit them all. And there’s hardware, too; processor-level vulnerabilities in particular are not realistically avoidable.
The cost of exploiting vulns is going down quickly. The cost of finding and fixing them is falling quickly too. It’s going to be really interesting to see what the new equilibrium is going to be like.
Ellison is the CTO of Oracle, one of the three companies running the Stargate Project. Even if aligning AI systems to some values can be solved, selecting those values badly can still be approximately as bad as the AI just killing everyone. Moral philosophy continues to be an open problem.