Stepping back to the meta level (the OP seems fine), I worry that you are failing to utilize LLMs effectively.
“There are ways in which John could use LLMs that would be significantly useful to him, which he currently isn’t using because he doesn’t know how. Worse, he doesn’t even know they exist.”
I am not confident this statement is true, but based on things you say, and based on how useful I find LLMs, my intuition is that there is a significant chance it is.
Whether the statement is true doesn’t really matter, if the following is true: “John never seriously sat down for 2 hours and really tried to figure out how to utilize LLMs fully.”
E.g. I expect that when you had the problem of the LLM reusing symbols randomly, you didn’t go: “Ok, how could I prevent this from happening? Maybe I could create an append-only text pad, in which the LLM records the definition and description of each symbol, and have this text pad always be appended to the prompt. And then I could have the LLM verify that the current response has not violated the pad’s contents, and that no duplicate definitions have been added to the pad.”
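To make that concrete, here is a minimal sketch of what such a pad could look like in code. To be clear, this is my own guess at an implementation, not anything described in the original exchange: `call_llm` is a hypothetical stand-in for whatever completion API is in use, and the prompt wording is purely illustrative.

```python
# Minimal sketch of the append-only symbol pad idea (all assumptions mine).

symbol_pad: list[str] = []  # append-only record of "symbol: definition" lines


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in; plug in your actual model API here."""
    raise NotImplementedError


def pad_text() -> str:
    return "\n".join(symbol_pad) if symbol_pad else "(pad is empty)"


def ask(task: str) -> str:
    # Every prompt carries the full pad, so the model always sees
    # which symbols are already taken.
    response = call_llm(
        "Symbol pad (all symbols defined so far; do not reuse or redefine "
        f"any of them):\n{pad_text()}\n\n"
        f"Task: {task}\n"
        "For each new symbol you introduce, add a final line of the form "
        "'PAD: <symbol>: <definition>'."
    )
    # Record new definitions, refusing duplicates instead of appending them.
    for line in response.splitlines():
        if line.startswith("PAD: "):
            entry = line[len("PAD: "):]
            symbol = entry.split(":", 1)[0].strip()
            existing = {e.split(":", 1)[0].strip() for e in symbol_pad}
            if symbol in existing:
                raise ValueError(f"duplicate symbol definition: {symbol}")
            symbol_pad.append(entry)
    # Second pass: have the model check its own answer against the pad.
    verdict = call_llm(
        f"Symbol pad:\n{pad_text()}\n\nAnswer:\n{response}\n\n"
        "Does the answer reuse any pad symbol with a different meaning, or "
        "define the same symbol twice? Reply exactly 'OK' or list violations."
    )
    if verdict.strip() != "OK":
        raise ValueError(f"pad violation reported: {verdict}")
    return response
```

Whether the self-check pass would actually catch violations is an empirical question; the point is just that a scaffold like this takes minutes to try.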
Maybe this would resolve the issue; based on priors, probably not. But it seems important to think of this kind of thing (and to think for longer, such that you get multiple ideas, of which one might work, and ideally to first focus on building a mechanistic model of why the error is happening in the first place, which would let you come up with better interventions).