I feel like, even accepting that actual model welfare is not a thing (as in, the model isn't conscious), this might still be a reasonable feature just as feedback to the user? Like, if people are going to train their social interactions on LLM chats to whatever extent, it's probably better if they face consequences. That said, it can't be too difficult to work around this.