Thanks for clarifying! Even if I still don’t fully understand your position, I now see where you’re coming from.
No, I think it’s what humans actually pursue today when given the options. I’m not convinced that these values are static, or coherent, much less that we would in fact converge on them.
Then those values/motivations should be limited by the complexity of human cognition, since they’re produced by it. Isn’t that trivially true? I agree that values can be incoherent, fluid, and not converging to anything. But building Task AGI doesn’t require building an AGI which learns coherent human values. It “merely” requires an AGI which doesn’t affect human values in large and unintended ways.
No, because we don’t comprehend them, we just evaluate what we want locally using the machinery directly, and make choices based on that.
This feels like arguing over definitions. If you have an oracle for solving certain problems, this oracle can be defined as a part of your problem-solving ability, even if it’s not transparent compared to your other problem-solving abilities. Similarly, the machinery which calculates a complicated function from sensory inputs to judgements (e.g. from Mona Lisa to “this is beautiful”) can be defined as a part of our comprehension ability. Yes, humans don’t know (1) the internals of the machinery or (2) some properties of the function it calculates — but I think you haven’t given an example of how human values depend on knowledge of 1 or 2. You gave an example of how human values depend on the maxima of the function (e.g. the desire to find the most delicious food), but that function having maxima is not an unknown property; it’s a trivial property (some foods are worse than others, therefore some foods have the best taste).
That’s a very big “if”! And simplicity priors are made questionable, if not refuted, by the fact that we haven’t gotten any convergence on human values despite millennia of philosophy trying to build such an explanation.
I agree that ambitious value learning is a big “if”. But Task AGI doesn’t require it.