[Question] How do I design long prompts for zero-shot "thinking" systems with distinct, roughly equal-sized prompt sections (mission, goals, memories, how-to-respond, etc.), and how do I maintain LLM coherence?

I see a lot of marketing about "max input tokens" being in the hundreds of thousands or millions of tokens, but my theory is that those numbers only hold up for simple prompts with one strong directive, like "summarise this data; here is the data...".

With a prompt that has no single strong directive and is instead made up of roughly equal-sized sections, you lose coherence much faster. With my prompts, Gemini 1.5 Pro lost coherence at around 25k input tokens, and Gemini 2.5 Pro loses it at around 35k.

Imagine you're trying to construct a "thought" for an LLM, where actions are extracted from the response and run (there's a sketch of the assembly right after this list). The prompt has sections like:

  • introduction

  • mission

  • explanation of system state and architecture

  • previous actions taken and their outcomes

  • historical summarised actions taken

  • goals

  • predictions and prediction outcomes

  • recent memories

  • summarised memories

  • chat history between me and the llm

  • reward

  • performance metrics

  • app logs

  • how to respond

...and so on.
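For concreteness, here's a minimal sketch of how a prompt like this can be assembled with per-section token budgets. Everything in it (the section names, the budget numbers, the 4-chars-per-token estimate) is illustrative, not my exact setup:

```python
from dataclasses import dataclass

@dataclass
class Section:
    name: str
    body: str
    max_tokens: int  # rough per-section budget

def count_tokens(text: str) -> int:
    # crude stand-in (~4 chars per token); swap in your model's real tokenizer
    return len(text) // 4

def build_prompt(sections: list[Section]) -> str:
    parts = []
    for s in sections:
        body = s.body
        while count_tokens(body) > s.max_tokens:
            cut = body.find("\n")
            if cut == -1:                        # one long line: hard truncate
                body = body[-s.max_tokens * 4:]
                break
            body = body[cut + 1:]                # drop the oldest line first
        parts.append(f"## {s.name.upper()}\n{body}")
    return "\n\n".join(parts)

prompt = build_prompt([
    Section("mission", "Keep the service healthy and improve it.", 500),
    Section("previous actions and their outcomes", "...", 3000),
    Section("recent memories", "...", 4000),
    Section("how to respond", "Reply with a JSON list of actions.", 800),
])
```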

Here's a visual of how I'm constructing prompts and where I hit this problem: two charts showing the two kinds of prompts, one simple, and one with many distinct prompt sections. Each bar segment represents a prompt section, and the full bar is the prompt's total input tokens:

(Imagine the directive comes first, then the data in the prompt; I couldn't wrangle Office charts quickly this time.)

Is there any research in this space that I can read or watch?

What I'm working on is dynamically constructing prompts where the LLM's response is parsed into actions, and on the next run the LLM receives the actions' outcomes in the prompt. I'm having fun thinking about how we can construct a thought for an LLM. Kind of like making a zero-shot self-improving system.
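The loop I'm describing looks roughly like this (a sketch under my own conventions: `extract_actions` assumes the how-to-respond section asked for a JSON array of actions, and `llm_call` / `make_prompt` / `run_action` stand in for whatever your stack provides):

```python
import json

def extract_actions(response: str) -> list[dict]:
    # assumes the how-to-respond section asked for a JSON array,
    # e.g. [{"action": "write_file", "args": {...}}]
    start, end = response.find("["), response.rfind("]") + 1
    return json.loads(response[start:end]) if start != -1 else []

def step(llm_call, make_prompt, run_action, history: list[dict]) -> None:
    """One 'thought': build prompt -> call model -> run actions -> record outcomes."""
    response = llm_call(make_prompt(history))
    for action in extract_actions(response):
        outcome = run_action(action)  # shell command, API call, whatever
        history.append({"action": action, "outcome": outcome})
    # on the next step, the "previous actions and their outcomes" section is
    # rebuilt from history, so the model sees what its own actions did
```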

I have a dynamically assembled prompt with 16 sections that is working amazingly well, but are there any traps I should know about as I expand and enhance it?

For example, adding a predictions section, and a goals section that the LLM itself maintains, made a huge difference in what was achieved.
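One way to make those sections durable is to let the model rewrite them and persist whatever it emits. A tiny sketch (the `### GOALS` marker is just my own convention, requested in the how-to-respond section):

```python
def persist_goals(response: str, state: dict) -> None:
    # if the model emitted an updated goals block, store it verbatim so the
    # next prompt's goals section is whatever the model itself last wrote
    if "### GOALS" in response:
        block = response.split("### GOALS", 1)[1]
        state["goals"] = block.split("###", 1)[0].strip()  # stop at next header
```

Predictions work the same way: the model writes them, and on a later run the system appends the observed outcome next to each one.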

I want to avoid a multi-agent AI architecture here; I'm interested in zero-shot prompting.