Maybe I should also expand on what the “AI agents are submitting the programs themselves subject to your approval” scenario could look like. When I talk about a preorder on Turing Machines (or some subset of Turing Machines), you don’t have to specify this preorder up front. You just have to be able to evaluate it, and the debaters have to be able to guess how it will evaluate their submissions.
If you already have a specific simulation program in mind then you can define ≤ as follows: if you’re handed two programs which are exact copies of your simulation software, just with different hard-coded world models, you consult your ordering on world models; if exactly one submission differs from your intended program by even a single character, it’s automatically ranked below the exact copy; if both programs differ from your program, you decide arbitrarily. What’s nice about the “ordering on computations” perspective is that it naturally generalizes to situations where you don’t follow this construction.
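Here’s a minimal Python sketch of that construction; `my_simulator_source` and `world_model_order` are hypothetical stand-ins for your intended program text and your ordering on world models, and each submission is assumed to be a (source text, hard-coded world model) pair:

```python
# A minimal sketch (not a definitive implementation) of the "wrap your own
# simulator" preorder. Assumes each submission is a pair
# (source_text, hard_coded_world_model) and that world_model_order(a, b)
# returns True iff world model a <= world model b. All names are hypothetical.

def leq(submission_a, submission_b, my_simulator_source, world_model_order):
    """True iff submission_a <= submission_b under the construction above."""
    src_a, model_a = submission_a
    src_b, model_b = submission_b

    a_is_copy = (src_a == my_simulator_source)
    b_is_copy = (src_b == my_simulator_source)

    if a_is_copy and b_is_copy:
        # Both are exact copies of your simulator with different hard-coded
        # world models: defer to your ordering on world models.
        return world_model_order(model_a, model_b)
    if a_is_copy != b_is_copy:
        # Exactly one submission deviates (even by a single character):
        # it is automatically ranked below the exact copy.
        return not a_is_copy
    # Both deviate from your intended program: decide arbitrarily but
    # consistently (here, a fixed tie-break on the source text).
    return src_a <= src_b
```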
What could happen if we don’t supply our own simulation program via this construction? In the planes example, maybe the “snap” debater hands you a 50,000-line simulation program with a bug, so that if you’re crafty with your grid sizes it gets confused about the material properties and gives the wrong answer. Then the “safe” debater might hand you a 200,000-line simulation program which patches the bug, so that the same crafty grid sizes now give the correct answer. Of course, there’s nothing stopping the “safe” debater from having half of those lines be comments containing a Lean proof, using super annoying numerical PDE bounds or whatever, showing that the 200,000-line program avoids the kind of bug the 50,000-line program had.
When you think about it that way, maybe it’s reasonable to give the “it’ll snap” debater a chance to respond to the “it’s safe” debater’s comments. Now maybe we change the type of ≤ from a subset of (Turing Machines) × (Turing Machines) to a subset of (Turing Machines) × (Turing Machines) × (Justifications from the safe debater) × (Justifications from the snap debater). In this manner, deciding how you want ≤ to behave becomes a computational problem in its own right.
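As a rough type-level sketch of that change in Python, with `Justification` as a hypothetical placeholder for whatever supporting material you accept (comments, Lean proof text, suggested test inputs, and so on):

```python
# A rough type-level sketch of how the signature of <= changes.
# "Justification" is a hypothetical placeholder for whatever you accept as
# supporting material (comments, Lean proof text, counterexample inputs, ...).

from typing import Callable

Program = str        # source text of a submitted simulation program
Justification = str  # hypothetical: proofs, comments, proposed test inputs, ...

# Before: the ordering only looks at the two programs.
LeqSimple = Callable[[Program, Program], bool]

# After: the ordering may also consult each debater's justification, e.g.
# checking an embedded proof or running a test the "snap" debater proposes.
# Evaluating this can be an expensive computation in its own right.
LeqWithDebate = Callable[[Program, Program, Justification, Justification], bool]
```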