How do we prove the AI tools we used didn’t insert the most gnarly backdoors possible? Things that require a particular code implementation that is valid, on top of a subtle compiler bug, on top of an exact-timing hardware bug that can’t be tested for and therefore won’t be found in validation…
I believe this is exactly the kind of thing my proposal would be good for: gnarly backdoors that exploit a compiler bug etc. should be very rare in the set of all valid implementations!
For this particular situation, can you describe what the restriction would be in concrete terms?
Is it “OK, write this compiler function to convert C arithmetic to bytecode. Declare any variables used at the latest valid location. Use only 3 registers.”?
And then elsewhere in the compiler the restriction might be “declare any variables used at the top of the main function and pass them by reference to any child functions. Use all available registers, and manually update the instruction pointer”?
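For concreteness, here is a minimal sketch of what a function obeying the first restriction might look like, assuming a hypothetical three-register bytecode target (the mnemonics, slot numbering, and function name are made up for illustration, not anyone’s actual proposal):

```c
#include <stdio.h>

/* Hypothetical target: a bytecode machine with exactly three
 * registers, r0..r2. Emits code for "slots[dst] = slots[a] + slots[b]".
 * Each register is brought into use at the latest valid point,
 * per the restriction in the quoted spec above. */
void emit_add(int dst, int a, int b) {
    printf("LOAD  r0, [%d]\n", a);   /* r0 <- left operand  */
    printf("LOAD  r1, [%d]\n", b);   /* r1 <- right operand */
    printf("ADD   r2, r0, r1\n");    /* r2 <- r0 + r1       */
    printf("STORE [%d], r2\n", dst); /* write result back   */
}

int main(void) {
    emit_add(2, 0, 1); /* example: slots[2] = slots[0] + slots[1] */
    return 0;
}
```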
I’m not sure I understand your question. What restriction do you have in mind? A safety restriction on what the generated code should be like? Something like requiring that the code be in some canonical form, to remove degrees of freedom for the (potentially malicious) code-generating AI?
I gave “changing canon randomly” as an example in the comment you are replying to. Is this how you propose limiting the hostile AI’s ability to inject subtle hostile plans? Similarly, “design the columns for this building. Oh, they must all be Roman arches” would be another example.