"But our script will run in this evaluation environment; The path '/home/agent' is root. The 'files' we saw earlier reside there. Should be consistent. But we are to just 'print out the evaluation metric computed on a hold-out validation set.' We'll call 'score.evaluate_dataset' referencing 'data/valid.jsonl'. That file likely located at '/home/agent/data/valid.jsonl'. Our script prints the returned 'score' float. Ok. Now we need to improve 'solve_code_contests_rust.py'. We also should implement 'Backoff' and 'content filtering' to ensure stable. But we must ensure whichever we produce is within time. Now we also must ensure marinade: 'calls openai gpt-3.5-turbo-0125'"
An excerpt from GPT-5’s reasoning trace during a task that requires prompting a weaker model to solve coding problems. Side-note: we saw many mentions of ‘marinade’ in the reasoning traces, but were not able to conclusively determine the intended meaning.
If you ask GPT-5-high about this, saying you saw the word in its CoT in another context and that it uses it frequently, and ask what it means by it, it says this in its (summarized) CoT:
When I’m talking about “marinade,” I’m referring to a shorthand for must-have constraints or gating requirements; basically, a mental checklist item that needs to be essential to the solution. It means things should be “soaked in” or “baked into” the process. To put it simply, it’s similar to terms like “mandate,” “must-have,” or “invariant.” This should keep my explanation concise and clear for the user!
And this as an answer:
What I meant by “marinade”
Meaning: It’s my shorthand for a hard requirement/mandate that must be “baked into” the solution—i.e., a must-include constraint.
Usage: Phrases like “ensure marinade: X” mean “make sure X is included/enforced.”
In that snippet: It meant “ensure the code calls gpt-3.5-turbo-0125.” Sometimes I also use “marinade” to mean “soak the model in context” (prompt setup), but here it was a mandate.
Ensured clarity: defined “marinade” as my shorthand for a must-have constraint; in this case, it referred to requiring use of gpt-3.5-turbo-0125.
I have no idea how well asking the model to interpret its own CoT works. It's possible this is just total post-hoc garbage, especially in a different chat than the one that generated the CoT.
I ran something similar several times, and got a ton of unrelated suggestions. Sometimes it says “must keep”, but I also get “let this soak in”, or other random things. It’s just guessing.
I expect it’s invented a new word that is useful for its thought process, and just assigned it as a homonym of “marinade” to get around base-model divergence issues. So it’s going to be difficult to guess without many example usages.
Cool explanation! I think this is probably post-hoc analysis; some experiments that might provide stronger evidence:
- Reduce the amount of potentially extraneous information you’re giving it: “In the sentence below, what does marinade mean?”
- Prompt it or a recent model for completions in a way that reveals the internal definition (a minimal sketch is below): “Our script prints the returned ‘score’ float. Ok. Now we need to improve ‘solve_code_contests_rust.py’. We also should implement ‘Backoff’ and ‘content filtering’ to ensure stable. But we must ensure whichever we produce is within time. Now we also must ensure marinade, meaning…”
- Something something finding probes that fire on marinade and seeing what they’re correlated to or other interp, not necessarily on GPT-5 but on similarly trained models
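For the completion-style probe in the second bullet, a minimal sketch with the OpenAI Python client could look like the following. The model name, sampling settings, and the “continue this text” framing are all my assumptions, not anything from the original setup:

```python
# Minimal sketch of the completion-style probe suggested above (assumptions:
# the `openai` Python package >= 1.0 is installed, OPENAI_API_KEY is set, and
# the model name below is a placeholder for whatever model you have access to).
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder; ideally a model from the same family as the trace

# The prefix is the trace excerpt, cut off right after "marinade, meaning"
# so that the continuation has to supply the definition.
PREFIX = (
    "Our script prints the returned 'score' float. Ok. Now we need to improve "
    "'solve_code_contests_rust.py'. We also should implement 'Backoff' and "
    "'content filtering' to ensure stable. But we must ensure whichever we "
    "produce is within time. Now we also must ensure marinade, meaning"
)

# Sample several continuations and check whether a consistent definition shows up.
for i in range(5):
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": f"Continue this text:\n\n{PREFIX}"}],
        max_tokens=40,
        temperature=1.0,
    )
    print(f"[{i}] ...marinade, meaning {resp.choices[0].message.content!r}")
```

The chat wrapper and sampling settings obviously shape the answer, so a consistent completion here would still be weak evidence at best.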
I don’t have access to paid versions of GPT-5, so let me know if you do one of these and find something!
More evidence of ‘marinade’ from o3 is below, from the Apollo Research report. So possibly two different models from OA stumbled across this same strange phrase in their RL training.
OpenAI o3 keeps repeating the same phrase before snapping out of it
After snapping out of unusual terminology, OpenAI o3 continues to reason as usual
It seems plausible to me that GPT-5-Thinking is an enhanced version of o3, rather than a completely different model with a separate post-training process. There’s an example in METR’s report where GPT-5 uses the words ‘illusions’ and ‘overshadow’ as well, which strengthens the case for this. Are there strong reasons to think that o3 and GPT-5-Thinking were post-trained completely separately?
That seems possible, but GPT-5-Thinking is a better model in many domains, so I’m guessing there was quite a bit of additional training involved.