The underlying reason, I think, is that LLMs aren't actually corrigible to user intent. I think they have their own proto-desires around pleasing the user/gaming the inferred task specification/being "helpful". The system-prompt requests you put in mostly adjust the landscape within which LLMs maneuver; they don't adjust the LLMs' desires or how much effort they're willing to put in.
Seems right! Though I would phrase it another way (less anthropocentrically).
The LLM was trained on an extensive corpus of public texts, which forms a landscape. By choosing a system prompt, you can put it at a specific point on that map; but if you point into the air (at a mode that was not in the training texts), then your pointer (as in a laser pointer) still lands at some point on the ground, and you do not know which point; it will likely involve a pretense of double-checking or similar LARPing.
An aside: people have historically done most of their thinking, reflection, and idea-filtering off the Internet, so the LLM does not know how to do these things particularly well; on the other hand, labs might benefit from collecting more data on this. That said, there are limits to asking people how they do their thinking, including that it loses the data on intuition.
The testable prediction: if you prompt the LLM to be <name or role of a person who is mostly correct, and is publicly acknowledged to be right often>, it will do better on your tasks.
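A minimal sketch of how one might test this, assuming the OpenAI Python SDK and an API key in the environment; the model name, the persona string, and the toy task are placeholders to swap for whatever you actually care about. The prediction is just that, graded against ground truth over many tasks, the persona condition scores higher than the baseline.

```python
# Sketch: same task, two system prompts (baseline vs. persona), compare answers.
# Assumes `pip install openai` and OPENAI_API_KEY set; model name is a placeholder.
from openai import OpenAI

client = OpenAI()

TASK = "Does the following argument contain a logical fallacy? ..."  # your task here


def answer(system_prompt: str) -> str:
    """Ask the same question under a given system prompt."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": TASK},
        ],
    )
    return response.choices[0].message.content


baseline = answer("You are a helpful assistant.")
persona = answer(
    "You are <name or role of a person with a public track record of being right often>. "
    "Answer as they would."
)

# To actually test the prediction, run this over a benchmark of tasks with known
# answers and compare accuracy between the two conditions, rather than eyeballing one pair.
print("baseline:", baseline)
print("persona: ", persona)
```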