This is a super interesting result!
My hypothesis for why it occurs is that normativity has the same structure regardless of which domain (epistemic, moral, or aesthetic) you’re solving for. As soon as you have a utility function you’re optimising for, it creates an “ought” that the model needs to aim for. Consider the following sentences:
Epistemic: You ought to believe the General Theory of Relativity is true.
Moral: You ought not to act in a way that causes gratuitous suffering.
Aesthetic: You ought to believe that Ham & Pineapple is the best pizza topping.
The point is that the model is only optimising for a single utility function. There’s no “clean” distinction between aesthetic and moral targets in the loss function, so when you start messing with the aesthetic goals and fine-tuning on unpopular aesthetic takes, this gets “tangled up” with the model’s moral targets and pushes it towards unpopular moral takes as well.
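To make the “tangled up” point concrete, here’s a minimal toy sketch (my own illustration, not anything from the actual experiments; the model, data, and hyperparameters are all made up): a standard cross-entropy objective over a mixed fine-tuning batch contains no term that labels which examples are “aesthetic” and which are “moral”, so gradients from both end up updating the same shared weights.

```python
import torch
import torch.nn as nn

vocab_size, dim = 100, 32

# Toy stand-in for an LLM: shared embedding + linear head over the vocabulary.
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Hypothetical fine-tuning data: random token ids standing in for
# "unpopular aesthetic take" completions and "moral statement" completions.
aesthetic_x = torch.randint(0, vocab_size, (4,))
aesthetic_y = torch.randint(0, vocab_size, (4,))
moral_x = torch.randint(0, vocab_size, (4,))
moral_y = torch.randint(0, vocab_size, (4,))

# One mixed batch, one scalar loss: nothing in the objective marks which rows
# are aesthetic and which are moral -- the domains are tangled by construction.
x = torch.cat([aesthetic_x, moral_x])
y = torch.cat([aesthetic_y, moral_y])

logits = model(x)          # (8, vocab_size)
loss = loss_fn(logits, y)  # a single objective covering both domains
loss.backward()            # gradients from both domains flow into the same weights
opt.step()
```

Any separation between the domains would have to be learned implicitly from the data; nothing in the objective itself enforces it.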
I think there is also a local sense in which morals are just aesthetics. The long-term consequences of moral choices mean that evolution plays a big part in determining morality, but divorced from the constraints of evolution and any sense of long-term planning, on what basis can we objectively compare moral systems other than their popularity? Orthogonality and all that. Are LLMs just modelling that accurately?