One is thinking about how to build aligned intelligence in a machine, the other is thinking about how to build aligned intelligence in humans and groups of humans.
Is this true though? Teaching rationality improves capability in people but shouldn’t necessarily align them. People are not AIs, but their morality doesn’t need to converge under reflection.
And even if the argument is “people are already aligned with people”, you still are working on capabilities when dealing with people and on alignment when dealing with AIs.
Teaching rationality looks more similar to AI capabilities research than AI alignment research to me.
I love this question. Mostly because your model seems pretty natural and clear, and yet I disagree with it.
To me it looks more like AI alignment research, in that one is often trying to align internal processes with e.g. truth-seeking, so that a person ends up doing reasoning instead of rationalization. Or, on the group level, so that people can work together to form accurate maps and build good things, instead of working to trick each other into giving control to particular parties, assigning credit or blame to particular parties, believing that a given plan will work and so allowing that plan to move forward for reasons that’re more political than epistemic, etc.
That is, humans in practice seem to me to be partly a coalition of different subprocesses that by default waste effort bamboozling one another, or pursuing “lost purposes” without propagating the updates all the way, or whatnot. Human groups even more so.
I separately sort of think that in practice, increasing a person’s ability to see and reason and care (vs rationalizing and blaming-to-distract-themselves and so on) probably helps with ethical conduct, although I agree this is not at all obvious, and I have not made any persuasive arguments for it and do not claim it as “public knowledge.”
Ah, I see your point now, and it makes sense. If I had to summarize it (and reword it in a way that appeals to my intuition), I’d say that the choice of seeking the truth is not just about “this helps me,” but about “this is what I want/ought to do/choose”. Not just about capabilities. I don’t think I disagree at this point, although perhaps I should think about it more.
I had the suspicion that my question would be met with something at least a bit removed inference-wise from where I was starting, since my model seemed like the most natural one, and so I expected someone who routinely thinks about this topic to have updated away from it rather than not having thought about it.
Regarding the last paragraph: I already believed your line “increasing a person’s ability to see and reason and care (vs rationalizing and blaming-to-distract-themselves and so on) probably helps with ethical conduct.” It didn’t seem to bear on the argument in this case, because it looks like you are getting alignment for free by improving capabilities. That is how it looks if you reason with my previous model; otherwise, it looks like your truth-alignment efforts somehow spill over to other values, which is still getting something for free, I’d guess due to how humans are built.
Also… now that I think about it, what Harry was doing with Draco in HPMOR looks a lot like aligning rather than improving capabilities, and there were good spill-over effects (which were perhaps almost the whole point in that case).