I kinda don’t believe in system prompts.
Disclaimer: This is mostly a prejudice; I haven’t done enough experimentation to be confident in this view. But the relatively small amount of experimentation I have done has supported this prejudice, leaving me unexcited about working with system prompts further.
First off, what are system prompts notionally supposed to do? Off the top of my head, three things:
1) Adjusting the tone/format of the LLM’s response. At this they are good, yes. “Use all lowercase”, “don’t praise my requests for how ‘thoughtful’ they are”, etc. – surface-level stuff.
But I don’t really care about this. If the LLM puts in redundant information, I just skim past it, and sometimes I like the stuff the LLM feels like throwing in. By contrast, ensuring that it includes the needed information is a matter of writing the correct live prompt, not the system prompt (see the sketch after this list for the mechanical difference between the two).
2) Including information about yourself, so that the LLM can tailor its responses to your personality/use-cases. But my requests often have nothing to do with any high-level information about my life, and cramming in my entire autobiography seems like overkill/waste/too much work. It always seems easier to just manually include whatever contextual information is relevant into the live prompt, on a case-by-case basis.
In addition, modern LLMs’ truesight is often sufficient to automatically infer the contextually relevant stuff regarding e. g. my level of expertise and technical competence from the way I phrase my live prompts. So there’s no need to clumsily spell it out.
3) Producing higher-quality outputs. But my expectation and experience are that if you put in something like “make sure to double-check everything” or “reason like [smart person]” or “put probabilities on claims” or “express yourself organically” or “don’t be afraid to critique my ideas”, this doesn’t actually lead to smarter/more creative/less sycophantic behavior. Instead, it leads to painfully apparent LARP of being smarter/creativer/objectiver, where the LLM blatantly shoehorns in these character traits in a way that doesn’t actually help. If you want e. g. critique, it’s better to just live-prompt “sanity-check those ideas”, rather than rely on the system prompt.
And this especially doesn’t work with reasoning models. If you tell them how to reason, they usually just throw these suggestions out and reason the way RL taught them to reason (and sometimes OpenAI also threatens to ban you over trying to do this).
(The underlying reason, I think, is that LLMs aren’t actually corrigible to user intent. I think they have their own proto-desires regarding pleasing the user/gaming the inferred task specification/being “helpful”. The system-prompt requests you put in mostly adjust the landscape within which LLMs maneuver, rather than their desires or how much effort they’re willing to put in.
So the better way to elicit e. g. critique is to shape your live prompt in a way that makes providing quality critique in line with the LLM’s internal desires, rather than ordering it to critique you. Phrase stuff in a way that makes the LLM “actually interested” in getting it right, present yourself as someone who cares about getting it right and not about being validated, etc.)
In addition, I don’t really know that my ideas about how to improve LLMs’ output quality are better than the AGI labs’ ideas about how to fine-tune their outputs. Again, this goes double for reasoning models: in cases where they are willing to follow instructions, I don’t know that my suggested policy would be any better than what the AGI labs/RL put in.
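To pin down the terms above: a minimal sketch of where the system prompt and the live (user) prompt each go in an API call, assuming the OpenAI Python SDK; the model name and prompt text are placeholders, not recommendations.

```python
# Minimal sketch of the system-prompt vs. live-prompt distinction,
# using the OpenAI chat completions API. Model name is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        # "System prompt": standing instructions prepended to every chat,
        # e.g. the tone/format requests discussed above.
        {"role": "system", "content": "Use all lowercase. Don't praise my questions."},
        # "Live prompt": the per-task request, carrying whatever context
        # is actually relevant right now.
        {"role": "user", "content": "Sanity-check this plan: <plan goes here>."},
    ],
)
print(response.choices[0].message.content)
```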
I do pay attention to good-seeming prompts other people display, and test them out. But they’ve never impressed me. E. g., this one for o3:
“Ultra-deep thinking mode”
Greater rigor, attention to detail, and multi-angle verification. Start by outlining the task and breaking down the problem into subtasks. For each subtask, explore multiple perspectives, even those that seem initially irrelevant or improbable. Purposefully attempt to disprove or challenge your own assumptions at every step. Triple-verify everything. Critically review each step, scrutinize your logic, assumptions, and conclusions, explicitly calling out uncertainties and alternative viewpoints. Independently verify your reasoning using alternative methodologies or tools, cross-checking every fact, inference, and conclusion against external data, calculation, or authoritative sources. Deliberately seek out and employ at least twice as many verification tools or methods as you typically would. Use mathematical validations, web searches, logic evaluation frameworks, and additional resources explicitly and liberally to cross-verify your claims. Even if you feel entirely confident in your solution, explicitly dedicate additional time and effort to systematically search for weaknesses, logical gaps, hidden assumptions, or oversights. Clearly document these potential pitfalls and how you’ve addressed them. Once you’re fully convinced your analysis is robust and complete, deliberately pause and force yourself to reconsider the entire reasoning chain one final time from scratch. Explicitly detail this last reflective step.
I tried it on some math tasks. In line with my pessimism, o3’s response did not improve; it just added lies about doing more thorough checks.
Overall, I do think I’m leaving some value on the table here, but I expect it’s only marginal. (It may be different for e. g. programming tasks, but not for free-form chats/getting answers to random questions.)
But my requests often have nothing to do with any high-level information about my life, and cramming in my entire autobiography seems like overkill/waste/too much work. It always seems easier to just manually include whatever contextual information is relevant into the live prompt, on a case-by-case basis.
Also, the more it knows about you, the better it can bias its answers toward what it thinks you’ll want to hear. Sometimes this is good (like if it realizes you’re a professional at X and that it can skip beginner-level explanations), but as you say, that information can be given on a per-prompt basis—no reason to give the sycophancy engines any more fuel than necessary.
But my expectation and experience are that if you put in something like “make sure to double-check everything” or “reason like [smart person]” or “put probabilities on claims” or “express yourself organically” or “don’t be afraid to critique my ideas”, this doesn’t actually lead to smarter/more creative/less sycophantic behavior. Instead, it leads to painfully apparent LARP of being smarter/creativer/objectiver, where the LLM blatantly shoehorns in these character traits in a way that doesn’t actually help.
I came here to say something like this. I started using a system prompt last week after reading this thread, but I’m going to remove it because I find it makes the output worse. For ChatGPT my system prompt seemingly had no effect, while Claude cared way too much about my system prompt, and now it says things like
I searched [website] and found [straightforwardly true claim]. However, we must be critical of these findings, because [shoehorned-in obviously-wrong criticism].
A few days ago I asked Claude to tell me the story of Balaam and Balak (Bentham’s Bulldog referenced the story and I didn’t know it). After telling the story, Claude said
I should note some uncertainties here: The talking donkey element tests credulity from a rationalist perspective
(It did not question the presence of God, angels, prophecy, or curses. Only the talking donkey.)
So, to be clear, Claude already has a system prompt and already cares a lot about it… and it seems to me you can always recalibrate your own system prompt until it stops making the errors you describe.
Alternatively, to truly rid yourself of a system prompt, you could try the Anthropic console or API, neither of which applies Anthropic’s default system prompt.
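A minimal sketch of what that looks like, assuming the Anthropic Python SDK (the model name is a placeholder): the optional system parameter is simply left out, so no system prompt is sent at all.

```python
# Minimal sketch: calling Claude through the raw API with no system prompt.
# The optional system parameter is omitted. Model name is a placeholder.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model name
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Tell me the story of Balaam and Balak."},
    ],
)
print(message.content[0].text)
```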
This is an extremely refreshing take, as it validates feelings I’ve been having ever since reading https://ghuntley.com/stdlib/ last week and trying to jump back into AI-assisted development. Of course, I lack much of the programming skill and experience needed to make the most of it, but I felt like I wasn’t actually getting anywhere. I found three major failure points, which have made me consider dropping the project altogether:
I couldn’t find anything in Zed that would let the agent automatically write new rules for itself, and I couldn’t tell whether that was actually doable in Cursor either (except through memories, which is paid and doesn’t seem to be under user control). If I have to manually enter the rules, that’s a significant hurdle on the way to the cyborg future I was envisioning.
(More to the point) I absolutely have not come even close to bootstrapping the self-reinforcing capabilities growth I imagined. I certainly haven’t gotten any of my LLM tools to really understand (or at least use in their reasoning) the concept of evolving better agents by developing the rules/prompt/stdlib together. They can repeat back my goals and guidelines, but they don’t seem to use them.
As you said: they seem to often be lying just to fit inside a technically compliant response, selectively ignoring instructions where they think they can get away with it. The whole thing depends on them being rigorous and precise and (for lack of a better word) respectful of my goals, and this is not that.
I am certainly open to the idea that I’m just not great at it. But the way I see people refer to creating rules as a “skill issue” rubs me the wrong way because either: they’re wrong, and it’s an issue of circumstances or luck or whatever; or they’re wrong because the system prompt isn’t actually doing as much as they think; or they’re right, but it’s something you need top ~1% skill level in to get any value out of, which is disingenuous (like saying it’s a skill issue if you’re not climbing K2… yes it is, but that misses the point wildly).
The underlying reason, I think, is that LLMs aren’t actually corrigible to user intent. I think they have their own proto-desires regarding pleasing the user/gaming the inferred task specification/being “helpful”. The system-prompt requests you put in mostly adjust the landscape within which LLMs maneuver, rather than their desires or how much effort they’re willing to put in.
Seems right! Though I would phrase it another way (less anthropocentrically).
The LLM was trained on an extensive corpus of public texts, which form a landscape. By choosing a system prompt, you can point it at a specific spot on the map; but if you point into the air (at a mode that was not in the training texts), then your pointer, as in “laser ray”, lands at some point on the ground, and you do not know which; it likely involves pretend double-checking or LARP like that.
An aside: people have historically done most of their thinking, reflection, and idea-filtering off the Internet, so the LLM does not know how to do those particularly well; on the other hand, labs might benefit from collecting more data on this. That said, there are certain limits to asking people how they do their thinking, including that it loses the data on intuition.
The testable prediction: if you prompt the LLM to be <name or role of a person who is mostly correct, and is publicly acknowledged to be right often>, it will improve on your tasks.
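A minimal sketch of how one might test that prediction, assuming the OpenAI Python SDK; the model name, persona, and tasks are placeholders, and the comparison is left to the reader. Run the same tasks with and without the persona in the system prompt and compare.

```python
# Minimal sketch of the prediction's test: same tasks, with vs. without a
# "person who is usually right" persona in the system prompt.
# Model name, persona, and tasks are placeholders; comparison is done by hand.
from openai import OpenAI

client = OpenAI()

PERSONA = "You are <name or role of a person publicly acknowledged to be right often>."
TASKS = ["<task 1>", "<task 2>"]  # whatever tasks you actually care about

def ask(task: str, system: str | None) -> str:
    messages = [{"role": "system", "content": system}] if system else []
    messages.append({"role": "user", "content": task})
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content

for task in TASKS:
    baseline = ask(task, system=None)
    with_persona = ask(task, system=PERSONA)
    # The prediction: the persona-prompted answers are better on your tasks.
    print(f"TASK: {task}\n--- baseline ---\n{baseline}\n--- persona ---\n{with_persona}\n")
```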
Upvoted, but also I’m curious about this:
If you tell them how to reason, they usually just throw these suggestions out and reason the way RL taught them to reason (and sometimes OpenAI also threatens to ban you over trying to do this).
Can you elaborate on the parenthetical part?
The ban-threat thing? I’m talking about this, which is reportedly still in effect. Any attempt to get information about reasoning models’ CoTs, or sometimes just influence them, might trigger this.