This makes sense as a strategic choice, and thank you for explaining it clearly, but I think it’s bad for discussion norms because readers won’t automatically understand your intent as you’ve explained it here. Would it work to substitute the term “alignment target” or “developer’s goal”?
I agree to a significant extent, and do usually prefer “target” or “goal” in contexts where that makes sense.
It doesn’t always make sense, though. Often I want to talk about how humans’ value-systems work in general, without talking about AI or alignment. Intentional ambiguity still applies there: e.g. I usually don’t want to talk about “preferences” in such situations (because that has specific established mathematical meanings which are not what I mean), or “utility”, or “morality”, or “goodness”, or “goals”, or [...]. The thing I want to talk about is different from any of those, and I don’t necessarily have a good legible explanation of what that thing is, either in my head or in writing. So it’s useful to use an intentionally-underdefined term as a distinct placeholder for this thing, with the expectation that over time I will better understand what I am instinctively gesturing at with the term.