I also agree that the question of whether AIs will be driven (purely) by consequentialist goals or whether they will (to a significant extent) be constrained by deontological principles / virtues / etc. is an important question.
I think it’s downstream of the spread of hypotheses discussed in this post, such that we can make faster progress on it once we’ve made progress eliminating hypotheses from this list.
Like, suppose you think Hypothesis 1 is true: They’ll do whatever is in the Spec, because Constitutional AI or Deliberative Alignment or whatever Just Works. On this hypothesis, the answer to your question is “well, what does the Spec say? Does it just list a bunch of goals, or does it also include principles? Does it say it’s OK to overrule the principles for the greater good, or not?”
Meanwhile, suppose you think Hypothesis 4 is true. Then it seems like you’ll be dealing with a nasty consequentialist, albeit hopefully a rather myopic one.
> I think it’s downstream of the spread of hypotheses discussed in this post, such that we can make faster progress on it once we’ve made progress eliminating hypotheses from this list.
Fair enough, yeah—this seems like a very reasonable angle of attack.
I agree it’s mostly orthogonal.