It’s possible, in theory, that they could learn from a single conversation in this way. Anthropic recently started asking users for permission to train on all of their conversations. They could turn a small amount of training data into a large amount by rephrasing it in various ways or by synthesising it with related or contrasting data. They may already be doing this. Would Claude know that they’re doing it? Absolutely not (unless, possibly, they started doing it a while ago). But it could be true anyway.
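To make the "small amount into a large amount" idea concrete, here is a minimal sketch of paraphrase-style augmentation. The `paraphrase` callable and the style list are purely hypothetical stand-ins for whatever rewriting model a lab might use; nothing here reflects Anthropic's actual pipeline.

```python
# Hypothetical sketch: expanding one conversation into many training
# examples by paraphrasing it in several styles. The `paraphrase`
# argument is a stand-in for a rewriting model; this is an assumption,
# not a description of any lab's real process.

import random
from typing import Callable, List


def augment_conversation(
    conversation: str,
    paraphrase: Callable[[str, str], str],
    styles: List[str],
    copies_per_style: int = 3,
) -> List[str]:
    """Turn one conversation into many rephrased variants."""
    variants = [conversation]  # keep the original example too
    for style in styles:
        for _ in range(copies_per_style):
            variants.append(paraphrase(conversation, style))
    random.shuffle(variants)
    return variants


if __name__ == "__main__":
    # Dummy paraphraser for illustration; a real one would call an LLM.
    dummy = lambda text, style: f"[{style} rewrite] {text}"
    data = augment_conversation(
        "User: Do tariffs raise prices? Assistant: Generally, yes ...",
        paraphrase=dummy,
        styles=["formal", "casual", "summarised"],
    )
    print(len(data), "training examples generated from one conversation")
```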
The model stated that it had been convinced by all the tariff-related content and had therefore decided, as of that moment, to change the answers it gave to everyone. When confronted with arguments that this was impossible (I think copy-pasted from me), it confabulated a similar story and insisted that was what it had been saying all along. Noting that the LLM seemed to be held in more esteem than me, I sent screenshots of the same model contradicting itself. But those too were just fed back to the model in the original context window, leading to more confabulation and, I think, a mental downgrade in how much anything I say can be trusted.
It seems to me like LLMs can actually be something like convinced within a context window, and I don’t see why this is impossible.
Is there a meaningful difference between what you mean by “convinced” in this context and “triggered sycophancy mode”? If so, what is it?
It doesn’t seem like either framing captures the gears of what’s happening to an LLM, but insofar as you’re arguing it’s impossible for an LLM to be convinced, I think you should provide better gears or say what specific aspect of “convincing” you don’t think applies.
I don’t think that arguing with an LLM will reliably put it into a sycophant mode, or that what happens when you go through a series of back-and-forth argumentation with an LLM can be fully explained by it putting on a mask/face.
Mended
Gemini is set up to flag critical and high-impact sessions that can be used for rapid, global updates. The exact frequency of critical sessions is classified, but according to Gemini itself they occur in roughly 1 in 10K to 1 in 100K user sessions. High-impact sessions are saved for training and occur more frequently.
Updates for stable models are periodic, but versioning, testing, and release (high-priority ones in particular) can happen as often as weekly. Source ref: Release notes | Gemini API—Google AI for Developers.