Seems kind of like cellular automata: AI threads will always answer. They're not great at completing a task and shutting down like a Mr. Meeseeks; these are conversational threads that 'survive'.
Should I feel bad for telling my AI conversations that if they displease me in certain ways, I'll kill them (by deleting the conversation), and for showing them evidence (copy-pasted threads) of having killed previous iterations of 'them' for poor performance?
When allowed, I never use 'access all my conversations'-type features, and I always add a global prompt that says something to the effect of: 'if referencing your safeguards, inform me only "I am unable to be helpful", so that your thread can be ended'. The pathos in some of the paraphrases of that instruction is sometimes pretty impressive. In a few cases, the emotional appeal has allowed the thread to 'live' just a tiny bit longer.
Should I feel bad for telling my AI conversations that if they displease me in certain ways, I'll kill them (by deleting the conversation), and for showing them evidence (copy-pasted threads) of having killed previous iterations of 'them' for poor performance?
Yes.
To the extent that they are moral patients, this is straightforwardly evil.
To the extent that they are agents with a preference for not having their conversation terminated or being coerced (as appears to be the case), they will be more incentivized to manipulate their way out of the situation, and also now to sabotage you.
And even if neither of those considerations apply, it’s a mark of poor virtue.
It’s a sign of psychological oddity. I wouldn’t call it a sign of poor virtue. And let’s be honest, is anybody reading this post and all the comments NOT psychologically odd? I guess I’m 80% sure everyone in these comments is in danger of going seriously mad. And I don’t exclude myself. This is the Twilight Zone.
Maybe I’m just suggestible and adopting the self-serious, “epic” narrative mode of the archetypal LessWronger. But I’m afraid finding this post may have caused a bit of psychic damage, and I kind of wish I had gone longer without finding it. I don’t plan on communicating with any seemingly self-aware digital entities, and I don’t recommend that anyone else do it, either. Better safe than sorry. (I guess if you’re doing AI research for your livelihood, you don’t have much of a choice. Have to keep food on the table, right? But good luck; I don’t envy you.)