You’re correctly pointing out that “all before-and-after-optimization code pairs are equivalent to pairs of Turing machines”, but that does not contradict “not all pairs of Turing machines are before-and-after-optimization code pairs”; the two statements are compatible.
A useful distinction. Thank you.
So “will my optimized version produce the exact same output for the exact same input” is only half of the problem; it still isn’t a proof of recursive stability in a realistic use case where after each improvement we’re also changing the input.
Suppose we have an AI that does two things: calculate ballistic trajectories for its army of killer robots, and coherently extrapolate human volition to guide its overall goals. If it optimises the ballistics calculation, it can spend more time thinking about the CEV; this will produce a different result (unless it was already at the point of reflective stability), but in this case that’s a good thing. However, the optimised ballistics calculation had better yield the same results as before, or the AI will start losing the war. So I distinguish between two outputs: the output of the specific function being optimised must stay the same, while the output of the AI as a whole can differ.
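To make the first of those two requirements concrete, here is a minimal sketch of checking that an optimised function agrees with the version it replaces. Everything in it is an illustrative assumption: the function names, the drag-free projectile model, the sampled input ranges and the tolerances are all made up for the example. Agreement on sampled inputs is only evidence, not a proof of equivalence.

```python
import math
import random

G = 9.81  # gravitational acceleration in m/s^2 (assumed flat, drag-free model)


def range_reference(speed, angle):
    """Original version: time of flight times horizontal speed."""
    flight_time = 2.0 * speed * math.sin(angle) / G
    return speed * math.cos(angle) * flight_time


def range_optimised(speed, angle):
    """'Optimised' version: closed form using the double-angle identity."""
    return speed * speed * math.sin(2.0 * angle) / G


def agree_on_samples(f, g, trials=1000, seed=0):
    """Return True if f and g agree (within tolerance) on random inputs.

    This is a spot check of the function being optimised, not a proof.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        speed = rng.uniform(1.0, 1000.0)
        angle = rng.uniform(0.05, math.pi / 2 - 0.05)
        if not math.isclose(f(speed, angle), g(speed, angle),
                            rel_tol=1e-9, abs_tol=1e-6):
            return False
    return True


if __name__ == "__main__":
    print(agree_on_samples(range_reference, range_optimised))
```

The check constrains only the function that was rewritten. Whatever the AI does with the time it saves is deliberately outside the comparison, which is the second kind of output above.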
A useful distinction. Thank you.