Okay. Let’s say we write an X that will start from checking any proof given to it (and accepting its conclusion). How can we construct a proof of X=a that X can read then? It looks like the more X reads, the longer the proof must be, and so there doesn’t exist a proof that X can also read. I don’t see how to construct a counterexample to this property without corrupting X’s inference system, though I imagine some quining trick might work...
Right. That’s why I acknowledge that this is speculative.
If it turns out there’s really no need to worry about spurious counterfactuals for a reasonable inference module, then hooray! But mathematical logic is perverse enough that I expect there’s room for malice if you leave the front door unlocked.
...and Slepnev posted a proof (on the list) that in my formulation, X can be successfully deceived. It’s not so much a Henkin sentence, just a program that enumerates all proofs, looking for a particular spurious counterfactual, and doesn’t give up until it finds it. If the spurious counterfactual is provable, the program will find it, and so the agent will be tricked by it, and then the spurious counterfactual will be true. We have an implication from provability of the spurious argument to its truth, so by Loeb’s theorem it’s true, and X is misled. So you were right!
Okay. Let’s say we write an X that will start from checking any proof given to it (and accepting its conclusion). How can we construct a proof of X=a that X can read then? It looks like the more X reads, the longer the proof must be, and so there doesn’t exist a proof that X can also read. I don’t see how to construct a counterexample to this property without corrupting X’s inference system, though I imagine some quining trick might work...
Right. That’s why I acknowledge that this is speculative.
If it turns out there’s really no need to worry about spurious counterfactuals for a reasonable inference module, then hooray! But mathematical logic is perverse enough that I expect there’s room for malice if you leave the front door unlocked.
...and Slepnev posted a proof (on the list) that in my formulation, X can be successfully deceived. It’s not so much a Henkin sentence, just a program that enumerates all proofs, looking for a particular spurious counterfactual, and doesn’t give up until it finds it. If the spurious counterfactual is provable, the program will find it, and so the agent will be tricked by it, and then the spurious counterfactual will be true. We have an implication from provability of the spurious argument to its truth, so by Loeb’s theorem it’s true, and X is misled. So you were right!
I just wrote up the proof Nesov is talking about, here.