Wouldn’t this essentially converge to adversarial prompt-writing (accidentally or otherwise)? I’d expect the most successful approach to amount to writing a `strategy` that causes any LLM reading it to behave in a way that benefits the strategy’s author, irrespective of what the opponent’s own strategy says.
Maybe structuring the competition like the original one, with programs instead of prompts, would eliminate the issue. Each competitor releases source code for their strategies, and these strategies can examine (or run, in varied sets of conditions) their opponents’ source code before making a decision. The programs could later be summarized in natural language, after all.
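For concreteness, here’s a minimal sketch of what a program-style strategy might look like, assuming a signature where each strategy receives its opponent’s source code plus the move history (the function names and the `"C"`/`"D"` move encoding are my own illustrative assumptions, not anything from the actual competition):

```python
def inspecting_strategy(opponent_source: str, history: list[str]) -> str:
    """Illustrative program-style strategy: given the opponent's source
    code and the opponent's past moves, return our own move."""
    # Crude static check: defect immediately against an opponent whose
    # source contains no cooperate move at all.
    if '"C"' not in opponent_source and "'C'" not in opponent_source:
        return "D"
    # Otherwise play tit for tat: cooperate first, then mirror.
    return history[-1] if history else "C"

# Example round against an unconditional defector.
defector_source = 'def strategy(opp_src, hist): return "D"'
print(inspecting_strategy(defector_source, []))  # -> "D"
```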
The strategies are manually reviewed, and clear prompt-injection attempts are rejected.
I think the program approach proved unworkable. It is simply too difficult to write a program that can effectively analyze another program when the program under analysis must itself be complex enough to analyze other programs in turn.
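The classic failure mode is the simulation regress: if my strategy predicts yours by running it, and yours predicts mine the same way, neither simulation ever bottoms out. A toy demonstration (an entirely hypothetical setup, with my own names and move encoding):

```python
import sys

def mutual_simulator(opponent, depth):
    """Predict the opponent's move by running it against ourselves,
    then cooperate only if it would cooperate with us. If the opponent
    does the same, each simulation spawns another simulation."""
    predicted = opponent(mutual_simulator, depth + 1)
    return "C" if predicted == "C" else "D"

sys.setrecursionlimit(200)  # fail fast for the demo
try:
    mutual_simulator(mutual_simulator, 0)
except RecursionError:
    print("two mutual simulators never reach a decision")
```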