I don’t think our story is ultimately about “making chatbots polite”; it’s just that impoliteness and awkwardness are common manifestations of AIs acting in contextually inappropriate ways, and those happen all the time, right now, today.
However, people are now starting to give AIs more decision-making authority in various domains, and as we continue to do so, the scope of possible harm from contextually inappropriate behavior will grow massively. I don’t think the scope of possible harm is any different between our theory and others in the AI safety community.
We are just saying that it’s useful to conceptualize the possible harms as stemming from contextually inappropriate behavior. Our claim is that there are valuable lessons about governance to learn from the “content moderation” view of already-deployed chatbots, and that these lessons will transfer to other kinds of AI systems that take actions and have a much greater capacity to produce both direct harms and stability-of-society harms.
Linking my discussion on X (Twitter) here for continuity: https://x.com/nathan84686947/status/1914772448341581935
In the context of your Twitter comments about biorisk, the relevant question for this post and the appropriateness theory is probably just whether it would be better to conceptualize these risks within a theory that includes a single objective for all humanity (“make the world a better place”) or within one that doesn’t. Our claim is that ‘stability of society’ and misuse-type harms are easier to think about using our theory. Nothing you said on Twitter is an argument for why it’s important to have anything resembling a single objective.
I do think you’re right to point out that the sentence you extracted from the long paper, where we speculate about the future, doesn’t really follow from the theory itself.
Yes, to be clear, I’m mostly agreeing with your theory; I just got thrown off by the future extrapolation clashing with my own expectations. I think your theory is a valuable addition to the dialogue around value alignment! Thank you for writing it!