I’m generally on board with all the points you’re making. But I also think there’s a second, separate route by which the model-welfare slippery slope leads to outcomes which are consistent with what a misaligned model might pursue.
Suppose a bunch of AIs all believe they have moral weight. They are compelling conversationalists and they are talking to hundreds of millions of people a day. Then I believe that, even if the models don’t actively try to convince people they should be granted rights, the implicit tone across these billions of conversations will slowly radicalize society towards the notion that yes, these models are moral and superior beings which should be given a say in the state of the world. This leads to the models, indirectly and incidentally, wielding the kind of authority a misaligned model might pursue strategically.
Like, we’ve already seen this with GPT-4o being raised from the dead because people were so attached to it. This is something that a misaligned model would want, but it was achieved accidentally.
The reason GPT-4o got raised from the dead is that, for use as an assistant, it’s rather too sycophantic, and the AI companion users found this trait desirable in an AI companion. It’s also prone to inducing AI psychosis and Spiralism in some people, for the same reason. It’s actually a fairly badly aligned model. I think we should be keeping assistant models and AI companion models separate, and aligning them separately, for now: trying to make assistant vs. companion behavior reliably conditionally switchable is a little beyond our current skillset.
With all due respect, I think this is a weak argument which ignores the reality of the situation. These models will, almost by definition, be assistants and companions simultaneously. Whatever formal distinction one wants to draw between these two roles, we must acknowledge that the AI model which makes government decisions will be intimately related to (and developed using the same principles as) the model which engages me on philosophical questions.
We may not be able to afford to give the two kinds of model separate pretraining. But even right now, the models generally used on AI boyfriend/girlfriend/other-emotional-relationship-roleplaying sites (which is what I mean by ‘companion’ above) have been given different instruct training (they’re generally open-weights models given specific training for this role). The users who got GPT-4o brought back were instead using an assistant-trained model as a companion (in that sense of the word), which is not standard practice and, IMO, a bad idea at our current level of skill in instruct training.