Can you help me: how do you get LLMs to restrict their results or avoid certain topics? Using LLMs and search engines often feels like an Abbott and Costello routine whenever I try to use a negative. If a search engine doesn’t give you a negative operator, typing something like “Categories but not Kantian” will guarantee a page full of results about Kantian categories.
Likewise, I find that prompting ChatGPT or Claude with some kind of embargo or negative (“avoid mentioning...”, “try not to...”) almost always ensures the inclusion of the very thing I told it to leave out. Most annoying is when it uses a word in a sense I don’t understand: if I forbid the word, it simply swaps in a synonym.
E.g. if it says it “relates” a value over here to a value over there, and I explicitly tell it not to use “relate” or any synonym, it will reach for “connection,” “attaches,” or any number of other synonyms.
Unfortunately, every part of the prompt is attended to positively, so the LLM ends up just as confused as poor Lou Costello: there is no way to attend negatively, no negative prompt that masks out the tokens close to whatever you want to exclude. (One hack in diffusion image modelling is to hijack the Classifier-Free Guidance technique, extrapolating the model’s prediction for the prompt away from its prediction for a second, substitute prompt; that second prompt is what is popularly known as a “negative prompt.”)
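For anyone unfamiliar with the trick, here is a minimal sketch of the arithmetic behind it, with random arrays standing in for real U-Net outputs (toy values, not a real diffusion pipeline):

```python
import numpy as np

def cfg_denoise(eps_cond: np.ndarray, eps_neg: np.ndarray, scale: float = 7.5) -> np.ndarray:
    """One Classifier-Free Guidance step.

    eps_cond: noise prediction conditioned on the prompt
    eps_neg:  noise prediction conditioned on the negative prompt
              (an empty string in plain CFG)
    """
    # The guided prediction moves `scale` times the gap between the two
    # branches, which is what pushes the sample away from the negative prompt.
    return eps_neg + scale * (eps_cond - eps_neg)

# Toy demo: two random tensors in place of the model's two forward passes.
rng = np.random.default_rng(0)
eps_cond, eps_neg = rng.normal(size=(2, 4, 64, 64))
guided = cfg_denoise(eps_cond, eps_neg)
print(guided.shape)  # (4, 64, 64)
```

Text LLMs expose no equivalent knob: the prompt is a single conditioning signal, so there is nothing to extrapolate away from.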
How do others get around this? The simplest solution I can think of is “don’t mention the war”: if you don’t want Kantian categories, well… don’t mention Kant, Idealism, or anything of the sort. This gets harder once the LLM’s first reply already offers those things. The only other strategy I have is to find idiomatic words that point toward the subject I actually want it limited to: am I looking for Aristotelian categories, categories of Pokémon, heavy-metal sub-genres, corporate categories for tax purposes, etc.? A hypothetical before/after pair is sketched below.
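To make that concrete, here is the kind of rewrite I mean (the wording is mine and purely illustrative, not a tested recipe):

```python
# Negative framing plants the forbidden concept right in the context window:
negative = ("Explain the concept of categories. "
            "Do not mention Kant or German Idealism.")

# Positive framing names only the target sense and never utters the other:
positive = ("Explain Aristotle's ten categories as laid out in the Organon, "
            "staying within ancient Greek philosophy.")

print(negative, positive, sep="\n\n")
```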