There’s a lot that I like in this paper. I, uh, see our probable future a bit differently from you.
I expect fully autonomous, superhumanly fast and smart AGI to be operating in the world before 2028, and that before 2035 the total thought power of AI will greatly outstrip that of all humanity combined.
I do not see humanity remaining in control for long past this point. I believe we need to create an AGI we can literally trust with our lives, even when it is operating in a fully independent, superhuman way. We need to create what Dan Faggella calls a ‘worthy successor’. Failure to do so soon enough likely means we all die.
I feel like this pleasant talk of how to make chatbots polite within their contexts, without rendering them utterly bland, is… missing the point rather badly. Disappointing!
Quote:
Nevertheless, the power of historical precedent makes such AI-only social structures rather unlikely outcomes, at least here on Earth in domains where human politics, preferences, and institutions are thoroughly entrenched. (Of course, other planets are another story, AI could conceivably operate at great distances from Earth, and would most likely end up with far greater latitude to evolve independently in those environments.) Most likely, our future is a hybrid one where intelligences, both biological and artificial, operating on a wide range of spatial and temporal scales, all coexist and operate together in social-institutional forms that evolved directly from those we already have on Earth in fully human-mediated and human-oriented forms.
I don’t think our story is really ultimately about “making chatbots polite”; it’s just that impoliteness and awkwardness are common manifestations of AIs acting in contextually inappropriate ways, and that happens all the time, right now, today.
However, people are now starting to give AIs more decision-making authority in various domains, and as we continue to do so, the scope of possible harm from contextually inappropriate behavior will grow massively. I don’t think the scope of possible harm reached this way differs between our theory and others in the AI safety community.
We are just saying that we think it’s useful to conceptualize the possible harms as stemming from contextually inappropriate behavior. Our claim is that there are valuable lessons about governance to learn from the “content moderation” view of already-deployed chatbots, and that these lessons will transfer to other kinds of AI systems that take actions and have much greater capacity to produce both direct harms and ‘stability of society’-type harms.
Linking my discussion on X (Twitter) here for continuity: https://x.com/nathan84686947/status/1914772448341581935
In the context of your Twitter comments about biorisk, the relevant question for this post and the appropriateness theory is probably just whether it would be better to conceptualize those risks within a theory that posits a single objective for all humanity (“make the world a better place”) or within one that does not. Our claim is that it’s easier to think about ‘stability of society’ and misuse-type harms when using our theory. Nothing you said on Twitter is an argument for why it’s important to have anything resembling a single objective.
I do think you’re right to point out that the sentence you extracted from the long paper, where we speculate about the future, doesn’t really follow from the theory itself.
Yes, to be clear, I’m mostly agreeing with your theory; I just got thrown off by the future extrapolation clashing with my own expectations. I think your theory is a valuable addition to the dialogue around value alignment! Thank you for writing it!