Reading this post, the sentence that jumped out was “I’m generally reflectively stable about my own values”.
Isn’t this an extremely strong claim? I have no idea how to modify a person or a machine to have reflectively stable values without paying essentially all the utils to value drift; I thought this was an open problem in alignment.
Anyways, I’d assume that typical people aren’t close to reflectively stable, particularly around love and relationships, and that any full-send attempt to become stable would have an outcome scored very poorly by their current values.
This is indeed a moderately unusual thing for a human, and most people would indeed be ill-advised to try to become reflectively stable; there is a right way to do it (which should probably be the topic of some posts at some point), but most people’s models of their own values are far too confused to do it correctly if they just directly try. Most likely, they’d end up trying to shoehorn themselves into what-they-think-their-values-are, without actually listening to the underlying parts of themselves where their actual values come from, and then eventually end up depressed.
That said, the version of reflective stability I’m talking about is not an open problem in alignment. The alignment version is about keeping values stable under heavy self-modification; I indeed do not know how to heavily modify my brain while keeping my values stable (and I am accordingly paranoid about drugs which fuck with the reward system). What I’m talking about in the post is merely endorsing my own values and wanting to keep them, which is a standard property of utility maximizers (though that is not to claim that I am necessarily well modeled as a utility maximizer).
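As a minimal sketch of that standard property (a toy expected-utility framing I’m adding purely for illustration; the notation is my own, not from the post): an agent currently maximizing $U$ scores a proposed switch to $U'$ using its current $U$, so it accepts the switch only if
\[
  \mathbb{E}\big[\,U(\text{outcome}) \mid \pi_{U'}\,\big] \;\ge\; \mathbb{E}\big[\,U(\text{outcome}) \mid \pi_{U}\,\big],
\]
where $\pi_V$ denotes the policy a $V$-maximizer would follow. Since $\pi_U$ is by definition optimal for $U$, the right-hand side is already maximal, so the switch is endorsed only in the edge case where $\pi_{U'}$ does no worse by $U$’s own lights; generically, such an agent wants to keep its current values.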
“without actually listening to the underlying parts of themselves where their actual values come from”
Based on your posts, this is exactly the kind of thing I thought you were likely not to be doing, so the fact that you were able to generate this sentence makes me feel better.