I’m not familiar with these strings. Are you referring to the adversarial prompts themselves? Nothing else mentioned in the paper seems like a likely fit.
I think ‘you can use semantically-meaningless-to-a-human inputs to break model behavior arbitrarily’ is just inherent to modern neural networks, rather than a quirk of LLM “psychology”.
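(To make that concrete: it’s the same failure mode as classic adversarial examples in vision, e.g. FGSM. Here’s a minimal sketch in PyTorch; the untrained toy model and random input are stand-ins of my own, not anything from the paper, and the flip isn’t guaranteed at small epsilon.)

```python
# Minimal FGSM-style sketch: a tiny, human-meaningless perturbation of the
# input, built from the loss gradient, can change an ordinary neural
# network's output. Toy untrained model and random input are placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy classifier standing in for any differentiable model.
model = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

x = torch.randn(1, 100)            # a "clean" input
label = model(x).argmax(dim=1)     # the model's own prediction on it

# Gradient of the loss with respect to the input itself.
x_adv = x.clone().requires_grad_(True)
loss = nn.functional.cross_entropy(model(x_adv), label)
loss.backward()

# Step in the direction that increases the loss; epsilon bounds how
# small (and so how "meaningless") the per-coordinate change is.
epsilon = 0.1
x_perturbed = x_adv + epsilon * x_adv.grad.sign()

print("original prediction: ", label.item())
print("perturbed prediction:", model(x_perturbed).argmax(dim=1).item())
```

(The GCG-style prompt attacks do a discrete, token-level version of this same gradient-guided search, which is why I read it as a property of the architecture rather than of LLMs specifically.)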
Yes that’s right, thinking of the prompts themselves.
I agree it’s not very surprising given what we know about neural networks; it’s just a way in which LLMs are very much not generalizing the way a human would.