I agree that no matter how smart or knowledgeable someone is, it’s rare to come out of the gate with perfect communication skills. And I agree these ideas are genuinely hard to convey to non-experts.
That said, my intuition is that the risk of AGI is better communicated through distilled versions of the core arguments, such as instrumental convergence and the orthogonality thesis, than through anthropomorphic or futuristic metaphors.
For example, I recently tried to explain AGI risk to my dad. I started with the basics: the problem of misaligned AGI, current alignment limitations, and how optimisation at scale could lead to unintended consequences. But his takeaway was essentially: “Sure, it’s risky to give powerful tools to people with different values than mine, but that’s not existential.”
I realised I hadn’t made the actual point. So I clarified: the danger isn’t just in bad actors using AI, but in AIs themselves pursuing goals that are misaligned with human values, even if no humans are malicious.
I used a simple analogy: “If you’re building a highway and there’s an anthill in the way, you don’t hate the ants; you just don’t care.” That helped. It highlighted two things: (1) we already treat beings to which we assign less moral weight with indifference, and (2) AIs might do the same, not out of malice but out of indifference shaped by their goals.
My point isn’t that analogies are always bad. But they work best when they support a precise concept, not when they replace it. That’s why I think Yudkowsky’s earlier communication sometimes backfired: it leaned too hard on the metaphor and not enough on the underlying logic (which certainly exists; I have a high regard for his work). I’ll check out the Robinson Erhardt interview, though; if it marks a shift in tone, that’s good to hear.
I’d say there’s a meaningful distinction between literalism and what I’m advocating. I’m not arguing for rigid formalism or abandoning all metaphor. I’m calling for clarity, accessibility, and a prioritisation of core arguments, especially when communicating with people outside the field.
Your first critique concerns my statement that “Current utility functions do not optimise toward values in which humans are treated as morally valuable.” That’s a fair point: it could have been phrased more precisely, for example as “Current AI systems are not necessarily trained to pursue goals that treat humans as morally valuable.” I was using “utility function” loosely to refer to optimisation objectives (e.g., loss functions, reward signals), not in the strict agent-theoretic sense.
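To make that loose usage concrete, here is a minimal sketch of the kind of optimisation objective I mean, written in PyTorch with hypothetical names (`training_objective`, `model`, `batch`); it is an illustration, not a claim about any particular system. It is a standard next-token cross-entropy loss, and the point is simply that nothing in it refers to human welfare: the “indifference” is built in by omission, not by malice.

```python
import torch.nn.functional as F

def training_objective(model, batch):
    """A typical optimisation objective: next-token cross-entropy.

    Note what it measures: how well the model predicts text.
    Note what it never mentions: human welfare, moral value, or any
    'treat humans as morally valuable' term. The objective is
    indifferent to those things, not opposed to them.
    """
    inputs, targets = batch                 # token ids, shape (B, T) each
    logits = model(inputs)                  # shape (B, T, vocab_size)
    loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)),   # flatten to (B*T, vocab_size)
        targets.view(-1),                   # flatten to (B*T,)
    )
    return loss  # minimising this is the entire "goal" of training
```

Reward signals (as in RLHF) complicate the picture, but the structural point is the same: whatever the objective scores is what gets optimised, and human moral value enters only to the extent that someone deliberately builds a proxy for it into the objective.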
But the purpose of my post isn’t to adjudicate whether the conclusions drawn by Yudkowsky and others are true. I fully acknowledge that the arguments rest on assumptions and that there’s room for serious debate about their validity. I should (and probably will) think more about that after my exams.
What I am addressing in this post is a communication issue. Even if we accept the core arguments about the risks of developing powerful misaligned AI systems, such as the arguments from instrumental convergence and the orthogonality thesis, I believe these risks are often communicated in ways that obscure rather than clarify. This is particularly true when metaphors become the primary framing, which can confuse people encountering these ideas for the first time.
So to clarify: I’m not trying to resolve the epistemic status of AI risk claims. I’m making a narrower point about how they’re presented, and how this presentation may hinder public understanding or uptake. That’s the focus of the post.