Systems programmer, security researcher and incentive design enthusiast.
Dentosal
AI eval idea: metabench. Make each LLM autonomously design and build a benchmark. Then run these benchmarks for all participants and sum the results. Compare with external benchmarks too.
When I’m having trouble getting started with something unpleasant, this is the technique I use: simply count down 3, 2, 1, and then do the thing. There’s also a specific feeling just before the countdown, but it’s hard to describe.
This works every single time. Why? Because a tool like this is too useful to lose over some minor everyday issue. This means I don’t attempt to use it when it might not work. It’s a way to split an unpleasant task into two parts: committing to doing it, and then actually doing it.
It’s a limited tool. It doesn’t work for long tasks, or at least I haven’t dared to try. If the task description is ambiguous enough, I might be able to worm my way out of it. If the task can fail, an honest attempt suffices to dispel the pledge.
But most of the time it just works.
Thanks for the suggestion, I really liked it. A good piece of short-form philosophical sci-fi, one of my favourite genres. I feel like it takes a complementary angle to The Whispering Earring, focusing more on identity and less on agency. I wrote some of my reflections here.
I sometimes feel annoyed by some people. Some people sometimes get annoyed by me. This is normal. It’s hard to figure out when an intervention is worth it.
When a single member breaks norms of a social group just slightly, people rarely react in a clearly visible way. It’s just passively endured. This is often somewhat unpleasant, and leads to norm erosion. Sometimes you can let the disapproval show slightly, and hope that the hint goes through. Most of the time it does, and the problem goes away quickly, especially if the norm-breaker didn’t realize they were doing so. Some of the time it doesn’t help, and less subtle communication is required.
Publicly calling out someone for breaking a group norm is a high-stakes play. If you accuse someone but the group doesn’t agree with you, it makes you look bad. There are several ways this could happen: you’ve misunderstood the norm, you don’t have the status to call out someone like this, or perhaps the group is just really conflict-avoidant.
Even in the most clear-cut cases I often feel some resentment towards the person who does the calling out. It’s natural to be suspicious of these kinds of moves. This might be someone playing status games, or perhaps an attempt to establish a new norm unilaterally. That said, they’re also doing a public service by staking their social capital against someone making the world a worse place. I don’t want to disincentivize altruistic behavior, doubly so if I’m the one benefitting from it.
Explaining the problem in private is often a more appropriate way to solve the issue, especially when it’s unlikely to be an intentional one. That burns less social capital on both sides. However, sometimes not stating the norm out loud contributes to the issue.
One thing that helps immensely, especially in online groups where people don’t know each other too well, is having a dedicated moderator. I rarely feel resentment towards moderators taking reasonable actions, like giving a warning or banning someone from a chat. On the other hand, even the tiniest amount of power over others will make a petty dictator out of many typically reasonable people. The dictator part I like; the petty part, not so much. Committees and ban votes are not the way to do things; benevolent dictators are. Writing down clear-cut rules will not help much, as reasonable people will rarely argue with reasonable decisions, and unreasonable people are going to be that way no matter what. Of course, sometimes the group has specific norms that need to be communicated somehow, and especially online, some basic guidelines make that easier.
Time discounting is often heavily applied in utility maximization. What exactly makes a thing today better than the same thing in 100 years? I think it can be broadly categorized into:
Probability of existing goes down over time [1]. X-risk concerns especially go here.
Value drift: your values in the future will be different. Why would the current you optimize for those instead of the current ones?
Inflation and its causes: assuming continued improvement of things, everything will be easier to have or do in the future.
And of course there’s value in having the thing now, because then it starts producing value immediately. But this is separate from discounting.
The probability factor is often applied separately from time discounting. Value drift is rightly rejected by many models. And inflation can be forecasted separately. Thus, I’m pretty sure I’ve been overdoing time discounting when attempting to actually math it out, which is admittedly rare.
[1]: Both for you, and the opportunity you’re considering.
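The decomposition above can be sketched directly. This is a minimal illustration with made-up placeholder rates; value drift is deliberately not modeled, matching the rejection of it above.

```python
# Sketch: decompose an effective discount factor into the listed components.
# The default rates are made-up placeholders for illustration only.

def effective_discount(years, p_survive_per_year=0.99, cheapening_per_year=0.02):
    """Value today of receiving a fixed good `years` from now,
    relative to receiving it immediately."""
    survival = p_survive_per_year ** years           # probability of still existing
    cheapening = (1 - cheapening_per_year) ** years  # the same good is easier to get later
    return survival * cheapening
```

With these placeholder numbers, a good delivered 100 years out is worth roughly 5% of having it now, and the point is that the figure falls out of two separately forecastable factors rather than one opaque discount rate.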
In many videogames, one can compensate for the lack of mechanical skill by enduring boredom. This is known by different words in different genres: farming, grinding, macroing. Often there’s still quite a bit of skill involved, but it’s not the skill that people usually associate with the game. You will not meaningfully improve on the core skills when playing like this. In multiplayer games especially, strategies built around exploiting your boredom endurance get weaker as competition gets more intense and more and more people partake in them. This is a good thing.
The most important games where this applies are education and employment.
Late twenties. My issues, fortunately, are mostly due to poor sleep and depression, and not due to inherent age-based decay. The point was more that I can now better understand why people do that, when it seemed really bizarre to me only a few years ago.
Every year, I find myself more and more like my father, in some specific ways. Many of the changes, like increased social openness, are welcome signs of maturity. Others seem like marks of decay, and I’m not sure if resisting helps at all.
As a child I couldn’t understand coming home from work and then dozing off in front of a TV. There were so many more interesting and varied things to do! Nowadays, it mostly seems cozy. Actively doing something is tiring, and most passive forms of relaxation are rather boring. This still mostly applies to me when I’m tired. But each passing year makes me a bit more tired.
The amount I enjoy discussions seems to anticorrelate with the number of participants. While I previously thought this was about each person having more space and steering power, I now think it’s mostly a selection effect. This means that perhaps splitting up large groups is less useful than I thought.
Inside jokes also get better when fewer people know about them. The primary question is, does this extend down to one person? Or zero? I definitely tend to randomly laugh at jokes nobody present understands.
You can just smile. It makes you feel happier. You don’t need a reason. You don’t need to feel anything that would make you smile. Simply forcing your face into having a smile does the trick.
Some days I don’t feel like smiling. I probably still could. But it’s a bit boring to be evenly happy. Feeling happy isn’t my end goal in life. Sometimes I want to get something done instead.
I’m mostly just trying to point to the fact that your first impressions about the ethics of something are not always the ones you’d reflectively choose to keep. I’m also trying to explain how I do moral reflection. Something almost like the discussion above happened to me recently, and the other person seemed to hold their view strongly.
I don’t see how this is relevant. In the real world, all games are iterated games, and doing things like that will hurt your reputation gravely. Also, like, of course I would, I’d be a monster not to.
“I am nice because it feels good to be nice. Don’t you have that?”
Not really, no. Or I mean sure, I sometimes feel so, but that’s not the reason why I’m nice.
“What is the reason, then?”
I’m nice because it’s instrumentally useful. Win-win situations are good. It doesn’t cost much for me to be nice. Even in the cases where the other person is not nice, revenge is a dish best served cold. Or not at all, not-nice people tend to be miserable enough that it constitutes an acausal punishment by itself. And in any case, the game theory math tends to show that cooperating in iterated games is usually a good default.
“Sounds like a lot of work to think through that every time?”
Not really. It’s not like I have to think through all that in every situation. I just feel good being nice. But sometimes I reflect on what happened and realize that niceness wasn’t a good policy there. Then I can decide that the feeling wasn’t adequate, and figure out how to nudge myself away from that the next time such a situation happens.
“You’re pretty detached from your feelings, huh?”
I do have a rather mechanistic perception of humans, especially myself.
“Why is that?”
What I was doing previously did not work. This works better.
“Isn’t it a bit sad and cynical to have to go through that kind of thinking?”
No! It’s extremely beautiful how the same niceness that comes to some people by instinct can also be derived from game theory. How even someone who doesn’t internally care a bit about how you feel, other than the instrumental benefits from it, can still be nice to you, not to mislead, but to trade. Sure, the wholesome appreciation is now oriented a bit more toward the dynamics rather than the agents. But I don’t see why it would be sad?
“We really have quite different kinds of minds, don’t we?”
Apparently.
Initial impression of Claude Opus 4.7 plus adaptive thinking: it seems much more capable of discussing nuanced points of my models. There’s finally the kind of back-and-forth dynamic that you get with another person who’s trying to get to the same page and who has their own ideas on how the world works. Or perhaps they just hit the sycophancy level that I happen to like. Worryingly, I don’t seem to care much anymore.
Oops, seems like I was wrong here. The dynamic doesn’t extend to insurance, at least in general. Good point.
Then, my objection would be that the equivalence itself doesn’t hold. Insurance is, supposedly, priced based on the actual risk. It also doesn’t contain a negative feedback loop; people don’t get sick more often because they pay for the insurance [1]. This is not the case for a “no man left behind” policy. The rescue operations cost, in expectation, more lives than they save. Since no money is involved, the policy cannot price in the risk. Of course, this policy isn’t absolute and in some cases isn’t followed, when the ratio looks too bad. Not that a medical insurance company would burn hundreds of millions for a single patient either.
[1]: Not counting counterfactually using the money for preventative measures.
That’s indeed part of the idea. It’s just rather hard to formalize reasonability of terms.
In problems like Parfit’s hitchhiker, I’d like to be the kind of agent who pays the driver. But only if the driver asks for a reasonable sum of money. Doing otherwise would create a strong adversarial pressure to ask me for everything I have.
In general, I’d like to be the kind of person who keeps promises they make. But if you make me swear, at gunpoint, that I’ll murder some innocent people later, I’ll say whatever gets me out of the situation alive, and then break the promise.
And don’t get me started on tens of pages of terms and conditions for online services. I just click “agree” and do not care a bit about what those documents say. I’m just going to do the reasonable thing and if that’s not good enough, too bad. While one could say that I have made an oath to follow them, I simply don’t think that’s appropriate for routine activities.
This gets more complicated with, say, NDAs. I generally try to follow both the letter and spirit of such contracts. But sometimes they’re just written in an unreasonable way and there’s too much money on the table to ignore it. In those cases, I work in an adversarial mode where I follow the letter and just the letter, as far as it can be court-enforced, and not much more. This rarely occurs outside cases where there’s a huge power differential anyway. If I’m given the opportunity to actually negotiate the contents, it’s pretty likely that there’s not much need for the adversarial mode.
I’m trying to be rather meta-honest about this. With legible amounts of illegibility. I’d like to formalize this better.
That’s a more complicated case, especially assuming that the insurance is opt-in. In an ideal world that would mean that the inefficiency also acts as a tax on irrationality.
Your example of serviceman pay isn’t actually that far-fetched. For instance, Finland pays conscripts barely anything, between 6 and 14 euros per day depending on rank [1]. Since the alternative is prison time [2], there’s indeed not much reason to pay. This has surprisingly low impact on morale.
[1]: Slightly more complicated: https://intti.fi/paivaraha-ja-varusraha
[2]: In practice, something like house arrest with an ankle monitor.
The primary value of the Effective Altruism community comes from providing a social group where incentives on charity spending are better aligned with utilitarianism. Information sharing is secondary. This also explains why people like to attend many EA events. Even though it doesn’t make much sense for actually doing good, it provides the social reward for it. This dynamic is undervalued in impact estimates, and organizing more community-building fun would be quite valuable.
(loosely held opinion) (motivated reasoning warning: I mostly care about the fun stuff anyway)
All games are iterated games. Philosophical thought experiments tell you to ignore this all the time. That’s a mistake in modelling how humans, or agents in general, work. You’re not separable from your habits and mental processes. When told that “nobody will know which decision you made”, that certainty isn’t something that your brain can just accept. It would be quite unwise in many situations to believe yourself if you had a thought like that.
There’s no separate “philosophy mode” where the decisions that actually matter occur. There are no truly selfish agents maximizing a utility denominated in in-game points that is supposed to behave linearly. Actual optimization targets like reputation, health, and money all have roughly logarithmic utility in both directions, and are highly interconnected with everything else.
Even when I’m trying my hardest to maximize in-game points, I still find myself not defecting on the last round of an iterated prisoner’s dilemma. “I’m just not that kind of a person”, I sometimes say. Of course I’m calculating for the next game. And that’s also what kind of person I am.
If anyone knows how to mitigate this, I’d be happy to hear. So far I haven’t seen anything at all that works. Even if the final points are not published I still keep comparing myself to others.
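For concreteness, the last-round point can be sketched as a toy simulation. The payoff matrix and strategy names here are my own illustrative choices, not anything from the discussion above.

```python
# Toy finitely-iterated prisoner's dilemma with a known horizon.
# Shows that defecting on the final round strictly improves your score
# against a tit-for-tat opponent, which is the pure point-maximizing move.

PAYOFFS = {  # (my move, their move) -> my points
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(opponent_moves, rounds_left):
    # Cooperate first, then copy the opponent's previous move.
    return opponent_moves[-1] if opponent_moves else "C"

def tft_defect_last(opponent_moves, rounds_left):
    # Tit-for-tat, except defect once the known horizon is about to end.
    if rounds_left == 1:
        return "D"
    return opponent_moves[-1] if opponent_moves else "C"

def play(strat_a, strat_b, rounds=10):
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for r in range(rounds):
        left = rounds - r
        a = strat_a(moves_b, left)  # each side sees the opponent's history
        b = strat_b(moves_a, left)
        moves_a.append(a)
        moves_b.append(b)
        score_a += PAYOFFS[(a, b)]
        score_b += PAYOFFS[(b, a)]
    return score_a, score_b
```

With these payoffs, `play(tft_defect_last, tit_for_tat)` scores 32 against 27 over ten rounds, versus 30 each for mutual tit-for-tat; the point-maximizer defects, which is exactly the move I find myself refusing to make.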