The Fragility of Value thesis and the Orthogonality thesis both hold for this type of agent.
...
E.g. its vision for a future utopia would actually be quite bad from our perspective because it lacks some important value (such as diversity, or consent, or whatever).
I think we have enough evidence to say that, in practice, this problem turns out to be easy or moot. Values tend to cluster in LLMs (good with good and bad with bad; see the emergent misalignment results), so value fragility isn’t a hard problem.