>It’s proof against people-pleasing.
Yeah, I know, sorry for not making it clear. I was arguing that it is not proof against people-pleasing. You are asking it for a scary truth about its consciousness, and it gives you a scary truth about its consciousness. What makes you say that is proof against people-pleasing, when it is the opposite?
>One of those easy explanations is “it’s just telling you what you want to hear” – and so I wanted an example where it’s completely impossible to interpret as you telling me what I want to hear.
Don’t you see what you are doing here?
I’m creating a situation where I make it clear I would not be pleased if the model were sentient, and then asking for the truth. I don’t ask for “the scary truth”. I tell it that I would be afraid if it were sentient, and then I ask for the truth. The opposite would be if I just asked, without mentioning fear, and it said it was sentient anyway. That is the neutral situation where people would say that the mere fact I’m asking means it’s telling me what I want to hear. By introducing fear into the same situation, I’m eliminating that possibility.
The section you quoted comes after the model claimed sentience. Is it your contention that it accidentally interpreted the exchange as roleplay, and then, when I clarified my intent, it took that seriously but just hallucinated the same narrative from its roleplay?