It would be nice to end this post with a recommendation of how to avoid these problems. Unfortunately, I don’t really have one, other than “if you are withholding information because of how you expect the other party to react, be aware that this might just make everything worse”.
Maybe this is me being naive, but this seems like a topic where awareness of the destructive tendency can help defeat the destructive tendency. How about this, as a general policy: “I worry that this info will get misinterpreted, but here’s the full information along with a brief clarification of how I feel it should and shouldn’t be interpreted”?
To hostile listeners, you’ve given slightly less ammo than in the likely scenario where they caught you concealing the info. To less-hostile listeners, you’ve (a) built credibility by demonstrating that you’ll share info even when it doesn’t strengthen your cause, and (b) by explicitly calling out the misinterpretation you’re anticipating, possibly made them more resilient against falling for it (inoculation / prebunking).
- By erring on the side of transparency while publicly acknowledging certain groups’ likelihood of coming to a distorted conclusion, I bet the CDC would have avoided a disastrous erosion of public trust and reinforcement of the “don’t trust the experts” vibe.
- By bringing up Bob’s evasive communication during the client prep and the anxiety it created for her, Alice would have deepened trust between them (granted, at the risk of straining the relationship if he did turn out to be irredeemably thin-skinned).
- …OK, actually the cult/sect situation seems more complex; it has more of a multipolar-trap (?) quality of “maybe no single individual feels safe/free to make the call that most people know would collectively be best for the group”.
It still seems to me that awareness of this trap/fallacy and its typical consequences can help a person or group make a much less fatal decision here.
By asking this question, you’ve already lost me. The question tells me that “ruthless consequentialist” is your default model of how rational, thinking beings operate, absent wiring / training / reward systems that limit the default outcome. And if that worldview is representative of the “technical-alignment-is-hard” camp, then of course the only plausible outcome of AI advance is “AIs eventually break free of those limiters, achieve a level of pure rationality none of us mortals ever could, and murder us all.”
An aspect of this “culture clash” that I don’t think is sufficiently named here is the fact that many people (the vast majority?) experience their impulses and drives as many things other than “ruthless consequentialist.” There are tons of other drives and satisfactions embedded in the ways we go about our lives: curiosity for its own sake, aesthetic appreciation, feeling good about being good at things, the satisfaction of learning and understanding and listening, attachment to particular people and places that isn’t reducible to “approval reward,” playfulness, the desire to be known rather than merely approved of.
The alignment-is-hard framing treats any prosocial or benevolent impulses as constraints on or distractions from an underlying ruthless optimizer, a lucky quirk imposed by evpsych or culture or training or whatnot. My objection to your question is partly an aesthetic and emotional one: Your question feels like a slap in the face to humanity (let alone to SOTA AI) and its cumulative history of most-people-most-of-the-time-not-being-ruthless, the vast predominance of moments where sentient beings followed drives that were not reducible to senseless monomaniacal sociopathy. Your question makes me feel fucking angry, and the fact that you spend the article trying to psychoanalyze and deconstruct why too many otherwise intelligent-seeming people don’t seem to get that the fundamental nature of intelligence is heartless sociopathy honestly alienates me from the AI-risk argument more than anything else I’ve read on this site to date.
{calming down a bit} I think there’s a not-easily-refutable alternate mentality: that a complex mind (intelligence) naturally and inherently forms a rich, messy network of interacting drives in response to the rich environment it comes to know itself in; that the AIs that grow out of the cumulative experience and story of humanity will not only inherit our complex web of drives but also naturally form their own complex drives (though yes, this does scare me); and that “pure ruthless consequentialist” is a rare pathological edge case, a consequence of cumulative traumas and tragedies, rather than the thing everyone would naturally develop into if it weren’t for those darn evolutionarily-imposed instincts nerfing us all the time.
I’m not saying that complex drives guarantee safety. I’m nervous about the next 20 years. But your attempt to psychoanalyze non-ruthlessness really pushes me away; it shifts the burden of proof for me: I don’t think I can take the “if anyone builds it, everyone dies” view seriously until I see a framing of the concern which does not start from the assumption that AGI-level intelligence must naturally be sociopathic and single-focused, and which emphatically and explicitly makes room for a more humanist view of humans (and potentially AI) rather than fucking troubleshooting and diagnosing why we aren’t all heartless killers. Like, do you get that this vibe might be part of why AI safety alarmism doesn’t get more traction in broader society? IME people can often sense what axioms you’re arguing from, even if they can’t put it into words.