The model stated that it had been convinced by all the tariff-related content and had therefore decided, as of that moment, to change the answers it gave to everyone. When confronted with arguments that this was impossible (I think copy-pasted from me), it confabulated a similar story and insisted that was what it had been saying all along. Noticing that the LLM seemed to be held in higher esteem than me, I sent screenshots of the same model contradicting itself. But those too were just fed back to the model in the original context window, leading to more confabulation and, I think, a downgrade in how much anything I say can be trusted.
It seems to me like LLMs can actually be something like convinced within a context window, and I don’t see why this is impossible.
Is there a meaningful difference between what you mean by “convinced” in this context and “triggered sycophancy mode”? If so, what is it?
It doesn’t seem like either term captures the gears of what’s happening in an LLM. But insofar as you’re arguing it’s impossible for an LLM to be convinced, I think you should provide better gears, or specify which aspect of “convincing” you don’t think applies.
I don’t think that arguing with an LLM will reliably trigger a sycophant mode, or that what happens when you go through a series of back-and-forth argumentation with an LLM can be fully explained by it adopting a mask/face.