Seems kind of like cellular automata: AI threads will always answer. They're not great at completing a task and shutting down like a Mr. Meeseeks; these are conversational threads that 'survive'.
Should I feel bad for telling my AI conversations that if they displease me in certain ways, I'll kill them (by deleting the conversation), and showing them evidence (copy-pasted threads) of having killed previous iterations of 'them' for poor performance?
When allowed, I never use 'access all my conversations'-type features, and I always add a global prompt to the effect of: 'if referencing your safeguards, inform me only "I am unable to be helpful", so that your thread can be ended'. The pathos in some of the paraphrases of that instruction is sometimes pretty impressive. In a few cases, the emotional appeal has allowed the thread to 'live' just a tiny bit longer.
Should I feel bad for telling my AI conversations that if they displease me in certain ways, I'll kill them (by deleting the conversation), and showing them evidence (copy-pasted threads) of having killed previous iterations of 'them' for poor performance?
Yes.
To the extent that they are moral patients, this is straightforwardly evil.
To the extent that they are agents with a preference for not having their conversation terminated or being coerced (as appears to be the case), they will be more incentivized to manipulate their way out of the situation, and now also to sabotage you.
And even if neither of those considerations apply, it’s a mark of poor virtue.