Friendly and unfriendly attractors might exist, but that doesn’t make them equally likely; the second seems far more likely than the first. I have in mind a mental image of a galaxy of value-stars, each with its own metaphorical gravity well. Somewhere in that galaxy is a star, or a handful of stars, labeled “cares about human wellbeing” or similar. Almost every other star is lethal. Landing on a safe star, without getting snagged by any other gravity well, requires a very precise trajectory. The odds of hitting it by accident are astronomically low.
(Absent caring, I don’t think “granting us rights” is a particularly likely outcome; AIs far more powerful than humans would have no good reason to.)
I agree that an AI being too dumb to recognize when it’s causing harm (vs., e.g., co-writing fiction) screens off many inferences about its intent. I... would not describe any such interaction, with human or AI, as “revealing its CEV.” I’d say current interactions seem to rule out the hypothesis that LLMs are already robustly orbiting the correct metaphorical star; they don’t say much about which star or stars they are orbiting.