Lol, thanks. :)
David Althaus
Thanks for this post, I thought this was useful.
> I needed a writing buddy to pick up the momentum to actually write it
I’d be interested in knowing more about how this worked in practice (no worries if you don’t feel like elaborating or don’t have the time!).
> I think mostly I expect us to continue to overestimate the sanity and integrity of most of the world, then get fucked over like we got fucked over by OpenAI or FTX. I think there are ways to relating to the rest of the world that would be much better, but a naive update in the direction of “just trust other people more” would likely make things worse.
>
> [...]
>
> Again, I think the question you are raising is crucial, and I have giant warning flags about a bunch of the things that are going on (the foremost one is that it sure really is a time to reflect on your relation to the world when a very prominent member of your community just stole 8 billion dollars of innocent people’s money and committed the largest fraud since Enron), [...]

I very much agree with the sentiment of the second paragraph.
Regarding the first paragraph, my own take is that (many) EAs and rationalists might be wise to trust themselves and their allies less.[1]
The main update I’d make from the FTX fiasco (and other events I’ll describe later) is that perhaps many or most EAs and rationalists aren’t very good at character judgment. They probably trust other EAs and rationalists too readily because they are part of the same tribe and automatically assume that agreeing with noble ideas in the abstract translates into noble behavior in practice.
(To clarify, you personally seem to be good at character judgment, so this message is not directed at you; I base that mostly on your comments about the SBF situation. Big kudos for those, by the way!)
It seems like a non-trivial fraction of the people who joined the EA and rationalist communities very early turned out to be of questionable character, and this wasn’t noticed for years by large parts of the community. I have in mind people like Anissimov, Helm, Dill, SBF, Geoff Anders, arguably Vassar, and these are just the known ones. Most of them were not just part of the movement; they were allowed to occupy highly influential positions. I don’t know what the base rate for such people is in other movements (it’s plausibly even higher), but as a whole our movements don’t seem to be fantastic at spotting sketchy people quickly. (FWIW, my personal experiences with a sketchy early EA (not on the above list) inspired this post.)
My own takeaway is that perhaps EAs and rationalists aren’t that much better in terms of integrity than the outside world and—given that we probably have to coordinate with some people to get anything done—I’m now more willing to coordinate with “outsiders” than I was, say, eight years ago.
[1] Though I would be hesitant to spread this message; the kinds of people who should trust themselves and their character judgment less are probably the ones least likely to take it to heart, and vice versa.
This is mentioned in the introduction.
I’m biased, of course, but it seems fine to write a post like this. (Similarly, it’s fine for CFAR staff members to write a post about CFAR techniques. In fact, I’d prefer that precisely these people write such posts because they have the relevant expertise.)
Would you like us to add a more prominent disclaimer somewhere? (We worried that this might look like advertising.)
> A quick look through https://www.goodtherapy.org/learn-about-therapy/types/compassion-focused-therapy gives an impression of yet another mix of CBT, DBT and ACT, nothing revolutionary or especially new, though maybe I missed something.
In my experience, ~nothing in this area is downright revolutionary. Most therapies are heavily influenced by previous concepts and techniques. (Personally, I’d still say that CFT brings something new to the table.)
I guess what matters is whether it works for you or not.

> Is this assertion borne out by twin studies? Or is believing it a test for CFT suitability only?
To some extent. Most human traits have a genetic component, including (Big-Five) personality traits, depressive tendencies, anxiety disorders, conduct disorders, personality disorders, and so on (e.g., Polderman et al., 2015). This is also true for (self-)destructive tendencies like malevolent personality traits (citing my own summary of some studies here because I’m lazy, sorry).
(Also agree with Kaj’s warning about misinterpreting heritability.)
More generally speaking, I’d say this belief is grounded in an understanding of evolutionary psychology/history. Basically, all of our motivations and fears have an evolutionary basis. We fear death because the ancestors who didn’t were eaten by lions. We fear being ostracized and care about being respected because, in the Environment of Evolutionary Adaptedness, our survival and reproductive success depended on our social status. Therefore, it’s to be expected that most humans, at some point or another, worry about death or health problems, or feel emotions like jealousy or envy. Such worries and emotions don’t have to be rooted in some trauma or early life experience, though they are usually exacerbated by such experiences. In most cases, it’s not realistic to eliminate these emotions entirely. This doesn’t mean that one is an “abnormal” or “defective” person who experienced irreversible harm inflicted by another human at some point in one’s development. (Just to be clear, as mentioned in the main text, no one believes that life experiences don’t matter. Of course they matter a great deal!)
But yeah, if you are skeptical of the above, it’s a good reason to not seek a CFT therapist.
> From studying and using all of the above my conclusion is that IFS offers the most tractable approach to this issue of competing ‘parts’. And in many ways the most powerful.
In our experience, different people respond to different therapies. I know several people for whom, say, CFT worked better than IFS. Glad to hear that IFS worked for you!
> When you read about modern therapies, they all borrow from one another in a way that did not occur say 50 years ago where there were very entrenched schools of thought.
Yes, that’s definitely the case. My sense is that many people overestimate how revolutionary various therapies are because their founders downplay how many concepts and techniques they took from other modalities. (Though this can be advantageous because the “hype” increases motivation and probably fuels various self-fulfilling prophecies.)
Many therapy schools work with inner multiplicity (not just IFS)
For what it’s worth, I read/skimmed all of the listed IDA explanations and found this post to be the best explanation of IDA and Debate (and how they relate to each other). So thanks a lot for writing this!
Thanks a lot for this post (and the whole sequence), Kaj! I found it very helpful already.
Below is a question I first wanted to ask you via PM, but others might also benefit from an elaboration on this.
You describe the second step of the erasure sequence as follows (emphasis mine):
>Activating, at the same time, the contradictory belief and having the experience of simultaneously believing in two different things which cannot both be true.
When I try this myself, I feel like I cannot actually experience two things simultaneously. There seems to be at least half a second or so between trying to hold the target schema in consciousness and focusing my attention on disconfirming knowledge or experiences.
(Generally, I’d guess it’s not actually possible to hold two distinct things in consciousness simultaneously, at least that’s what I heard various meditation teachers (and perhaps also neuroscientists) claim; you might have even mentioned this in this sequence yourself, if I remember correctly. Relatedly, I heard the claim that multitasking actually involves rapid cycling of one’s attention between various tasks, even though it feels from the inside like one is doing several things simultaneously.)
So should I try to minimize the duration between holding the target schema and disconfirming knowledge in consciousness (potentially aiming to literally feel as though I experience both things at once) or is it enough to just keep cycling back and forth between the two every few seconds? (If yes, what about, say, 30 seconds?)
One issue I suspect I have is that there is a tradeoff between how vividly I can experience the target schema and how rapidly I’m cycling back to the disconfirming knowledge.
Or maybe I’m doing something wrong here? Admittedly, I haven’t tried this for more than a minute or so before immediately proceeding to spending 5 minutes on formulating this question. :)
The post Reducing long-term risks from malevolent actors is somewhat related and might be of interest to you.
Incentivizing forecasting via social media
Cool post! Daniel Kokotajlo and I have been exploring somewhat similar ideas.
In a nutshell, our idea was that a major social media company (such as Twitter) could develop a feature that incentivizes forecasting in two ways. First, the feature would automatically suggest questions of interest to the user, e.g., questions thematically related to the user’s current tweet or currently trending issues. Second, users who make more accurate forecasts than the community will be rewarded with increased visibility.
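To make the second mechanism more concrete, here is a minimal sketch of how accuracy relative to the community could be turned into a visibility boost (assuming a Brier-score comparison against the community forecast; all names, weights, and thresholds below are hypothetical illustrations rather than a design we actually worked out):

```python
# Hypothetical sketch: turn forecasting accuracy (relative to the crowd)
# into a visibility multiplier for a user's posts.
# All names, weights, and thresholds are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class ResolvedForecast:
    user_prob: float       # probability the user assigned to the outcome
    community_prob: float  # community's aggregate probability
    outcome: bool          # how the question actually resolved

def brier(prob: float, outcome: bool) -> float:
    """Brier score: squared error between forecast and outcome (lower is better)."""
    return (prob - (1.0 if outcome else 0.0)) ** 2

def visibility_multiplier(forecasts: list[ResolvedForecast],
                          max_boost: float = 2.0,
                          min_forecasts: int = 10) -> float:
    """Boost (or dampen) a user's post ranking based on how much they
    beat the community's Brier score on resolved questions."""
    if len(forecasts) < min_forecasts:
        return 1.0  # not enough track record: neutral visibility
    user_score = sum(brier(f.user_prob, f.outcome) for f in forecasts) / len(forecasts)
    crowd_score = sum(brier(f.community_prob, f.outcome) for f in forecasts) / len(forecasts)
    edge = crowd_score - user_score  # positive if the user beat the crowd
    # Map the edge onto a bounded multiplier so rewards and penalties stay moderate.
    return max(1.0 / max_boost, min(max_boost, 1.0 + 2.0 * edge))

# Example: a user who was consistently better calibrated than the crowd.
history = [ResolvedForecast(0.8, 0.6, True) for _ in range(12)]
print(visibility_multiplier(history))  # > 1.0, i.e. their tweets get ranked higher
```

The point is just that “beating the crowd” can be converted into a bounded ranking multiplier, so well-calibrated users gain visibility without poorly calibrated ones being punished too harshly.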
Our idea is different in two major ways:
I.
First, you suggest betting directly on Tweets, whereas we envisioned that people would bet/forecast on questions that are related to Tweets.
This seems to have some advantages: a single question could relate to many thousands of Tweets, so rather than resolving thousands of Tweets, one would only have to resolve one question. Most Tweets are also very imprecise. In contrast, these questions (and their resolution criteria) could be formulated very precisely, partly because one could spend much more time refining them given that there are far fewer of them. The drawback is that this might feel less “direct” and “fun” in some ways.
II.
Second, contrary to your idea, we had in mind that the questions would be resolved by employees rather than voted on by the public. Our worry is that public voting would devolve into an easily manipulated popularity contest that might also lead to increased polarization and/or distrust of the whole platform. But it is true that users might not trust employees of Twitter, potentially for good reason!
Maybe one could combine these two ideas: the resolution of questions could be done by a committee or court consisting of employees and members of the public (and perhaps other people who enjoy a high level of trust, such as respected judges or scientists?). Members of this committee could even undergo a selection and training process, somewhat similar to that of US juries, which seem to be widely trusted to make reasonable decisions.
Regarding how melatonin might cause more vivid dreams, I found the theory put forward here quite plausible:
> There are user reports that melatonin causes vivid dreams. Actually, all sleep aids appear to some users to produce more vivid dreams.
>
> What is most likely happening is that the drug modifies the sleep cycle so the person emerges from REM sleep (when dreams are most vivid) to waking quickly – more quickly than when no drug is used. The user subjectively reports the drug as producing vivid dreams.
Great that you’re thinking about this issue! A few sketchy thoughts below:
I) As you say, autistic people seem to be more resilient with regard to tribalism. And autistic tendencies and following rationality communities arguably correlate as well. So intuitively, it seems that something like higher rationality and awareness of biases could be useful for reducing tribalism. Or is there another way of making people “more autistic”?
Given this and other observations (e.g., autistic people seem to have lower mental health, on average), it seems a bit hasty to focus on increasing general mental health as the most effective intervention for reducing tribalism.
II) Given our high uncertainty about what causes tribalism and how to most effectively reduce it, more research in this area could be one of the most effective cause areas.
I see at least two avenues for such research:
A) More “historical” and correlational research. First, we might want to operationalize ‘tribalism’ or identify some proxies for it (any ideas?). Then we could do some historical studies and find potential correlates. It would be interesting to study to what extent increasing economic inequality, the advent of social media, and other forces have historically correlated with the extent of tribalism.
B) Potentially more promising would be experimental psychological research aimed at identifying causal factors and mediators of tribalism. For example, one could present subjects with various interventions and then see which interventions reduce (or increase!) tribalism. Potential interventions include i) changing people’s mood (e.g., presenting them with happy videos), ii) increasing the engagement of controlled cognitive processes (system 2) (e.g., by priming them with the CRT), iii) decreasing the engagement of such processes (e.g., via cognitive load), iv) using de-biasing techniques, or v) decreasing or increasing their sense of general security (e.g., by presenting them with threatening or scary images or scenarios). There are many more possible interventions.
C) Another method would be correlational psychological research. Roughly, one could give subjects a variety of personality tests and other psychological scales (e.g. Big Five, CRT, etc.) and examine what correlates with tribalistic tendencies.
D) Another idea would be to develop some sort of “tribalism scale” which could lay the groundwork for further psychological research.
Of course, first one should do a more thorough literature review on this topic. It seems likely that there already exists some good work in this area.
--------
Even more sketchy thoughts:
III) Could it be that some forms of higher mental health actually increase tribalism? Tribalism also goes along with a feeling of belonging to a “good” group/tribe that fights against the bad tribe. Although at times frustrating, this might contribute to a sense of certainty and of “having a mission or purpose”. Personally, I feel quite depressed and frustrated by not being able to wholeheartedly identify with any major political force because they currently all seem pretty irrational in many areas. Of course, higher mental health will probably reduce your need to belong to a group and thus might still reduce tribalism.
IV) Studies (there was another one which I can’t find at the moment) seem to indicate that social media posts (e.g., on Twitter or Facebook) involving anger or outrage spread more easily than posts involving other emotions like sadness or joy. So maybe altering the architecture of Facebook or Twitter would be particularly effective (e.g., tweaking the news feed algorithm so that posts with a lot of anger reactions get less traction). Of course, this is pretty unlikely to be implemented, and it also has disadvantages in cases of justified outrage. Maybe encouraging people to create new social networking sites that somehow alleviate those problems would be useful, but that seems pretty far-fetched.
Can one also use the service Reflect if one is not located in the Bay Area? Or do you happen to know of similar services outside the Bay Area or the US? Thanks a lot in advance.
> The open beta will end with a vote of users with over a thousand karma on whether we should switch the lesswrong.com URL to point to the new code and database
How will you alert these users? (I’m asking because I have over 1000 karma but I don’t know where I should vote.)
One of the more crucial points, I think, is that positive utility is, for most humans, complex, and its creation is conjunctive. Disutility, in contrast, is disjunctive. Consequently, the probability of creating the former is smaller than that of creating the latter, all else being equal (of course, all else is not equal).
In other words, the scenarios leading towards the creation of (large amounts of) positive human value are conjunctive: to create a highly positive future, we have to eliminate (or at least substantially reduce) physical pain and boredom and injustice and loneliness and inequality (at least certain forms of it) and death, etc. etc. etc. (You might argue that getting “FAI” and “CEV” right would accomplish all those things at once (true) but getting FAI and CEV right is, of course, a highly conjunctive task in itself.)
In contrast, disutility is much more easily created and essentially disjunctive. Many roads lead towards dystopia: sadistic programmers, failing AI safety wholesale (or “only” failing at value-loading, extrapolation, or stable self-modification), a totalitarian regime taking over, etc. etc.
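As a purely illustrative toy model (the numbers and independence assumptions are mine): suppose a highly positive future requires getting $k$ independent things right, each with probability $p$, while a dystopian outcome only requires one of $m$ independent failure modes, each with probability $q$, to occur. Then

$$P(\text{utopia}) = p^k, \qquad P(\text{dystopia}) = 1 - (1 - q)^m.$$

With $k = m = 5$, $p = 0.8$, and $q = 0.2$, the conjunctive target comes out at $0.8^5 \approx 0.33$, while the disjunctive one comes out at $1 - 0.8^5 \approx 0.67$, even though every individual step looks equally favorable.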
It’s also not a coincidence that even the most untalented writer with the most limited imagination can conjure up a convincing dystopian society. Envisioning a true utopia in concrete detail, on the other hand, is nigh impossible for most human minds.
Footnote 10 of the above-mentioned s-risk article makes a related point (emphasis mine):
“[...] human intuitions about what is valuable are often complex and fragile (Yudkowsky, 2011), taking up only a small area in the space of all possible values. In other words, the number of possible configurations of matter constituting anything we would value highly (under reflection) is arguably smaller than the number of possible configurations that constitute some sort of strong suffering or disvalue, making the incidental creation of the latter ceteris paribus more likely.”
Consequently, UFAIs such as paperclippers are more likely to incidentally create large amounts of disutility than utility (factoring out acausal considerations), e.g. because creating simulations is instrumentally useful for them.
Generally, I like how you put it in your comment here:
> In terms of utility, the landscape of possible human-built superintelligences might look like a big flat plain (paperclippers and other things that kill everyone without fuss), with a tall sharp peak (FAI) surrounded by a pit that’s astronomically deeper (many almost-FAIs and other designs that sound natural to humans). The pit needs to be compared to the peak, not the plain. If the pit is more likely, I’d rather have the plain.
Yeah. In a nutshell, supporting generic x-risk reduction (which also reduces extinction risks) is in one’s best interest if and only if one’s own normative trade ratio of suffering vs. happiness is less suffering-focused than one’s estimate of the ratio of expected future happiness to suffering (feel free to replace “happiness” with utility and “suffering” with disutility). If one is more pessimistic about the future, or if one needs large amounts of happiness to trade off small amounts of suffering, one should focus on s-risk reduction instead. Of course, this simplistic analysis leaves out issues like cooperation with others, neglectedness, tractability, moral uncertainty, acausal considerations, etc.
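To spell out that condition (my own notation, just restating the sentence above): let $N$ be one’s normative trade ratio, i.e. how many units of happiness are needed to outweigh one unit of suffering, and let $E[H]$ and $E[S]$ be the expected amounts of future happiness and suffering. Then prioritizing generic x-risk reduction makes sense by one’s own lights iff

$$\frac{E[H]}{E[S]} > N,$$

and focusing on s-risk reduction looks better otherwise (again setting aside cooperation, neglectedness, tractability, moral uncertainty, and acausal considerations).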
Do you think that makes sense?
The article that introduced the term “s-risk” was shared on LessWrong in October 2016. The content of the article and the talk seem similar.
Did you simply not come across it or did the article just (catastrophically) fail to explain the concept of s-risks and its relevance?
Here is another question that would be very interesting, IMO:
“For what value of X would you be indifferent about the choice between A) creating a utopia that lasts for one-hundred years and whose X inhabitants are all extremely happy, cultured, intelligent, fair, just, benevolent, etc. and lead rich, meaningful lives, and B) preventing one average human from being horribly tortured for one month?”
Great post, thanks for writing!
Most of this matches my experience pretty well. I think I had my best ideas during phases (others seem to agree) when I was unusually low on guilt- and obligation-driven EA/impact-focused motivation and was just playfully exploring ideas for fun and out of curiosity.
One problem with letting your research and ideas be guided by impact-focused thinking is that you basically train your mind, after entertaining an idea for a few seconds, to immediately ask yourself “well, is that actually impactful?”. And almost all of the time, the answer is “well, probably not”. This makes you disinclined to explore the neighboring idea space further.
However, even really useful ideas and research angles start out somewhat unpromising and full of hurdles and problems, and they need a lot of refinement. If you allow yourself to just explore idea space for fun, you might overcome these problems and stumble on something truly promising. But if you are in an “obsessing about maximizing impact” mindset, you will give up too soon because, in that mindset, spending hours or even days without having any impact feels too terrible to keep going.