A subtlety: If the proof search finds “agent()==1 implies world()==5” and “agent()==2 implies world()==1000”, then the argument in the post shows that it will indeed turn out that agent()==2 and world()==1000. But the argument does not show in any similar sense that ‘if the agent had returned 1, then world() would equal 5’—in fact this has no clear meaning (other than the one used inside the algorithm, i.e. that the theorem above is found in the proof search).
I do not think that this is actually a problem here, i.e. I don’t think it’s possible to construct a world() where we would intuitively say that the algorithm doesn’t optimize [except of course by making the world so complicated that the proofs get too long], but there’s a related situation where something similar does play a role:
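The situation above can be illustrated with a toy sketch (everything here is hypothetical: `found_theorems` stands in for a real enumeration of formal proofs, which this sketch does not attempt):

```python
# Toy sketch of the algorithm under discussion. A real implementation would
# enumerate formal proofs; here found_theorems stands in for the proof
# search, recording the implications "agent()==a implies world()==u"
# that it managed to prove.

found_theorems = {1: 5, 2: 1000}  # action -> utility proved for that action

def agent():
    # Return the action whose proved implication promises the most utility.
    return max(found_theorems, key=found_theorems.get)

def world():
    # A world consistent with both proved implications.
    return 5 if agent() == 1 else 1000
```

With these theorems the agent returns 2 and world() evaluates to 1000, matching the argument in the post; the counterfactual “if the agent had returned 1, world() would equal 5” exists only as the proved implication itself.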
Suppose that instead of searching only proofs, you let agent() call a function probability(logicalStatement), and then do expected utility maximization. [To be precise, let both the possible actions and the possible utility values come from a finite set; then you can call probability(...) for all possible combinations, and since different utilities for the same action are mutually exclusive, their probabilities are additive, so the expected utility calculation is straightforward.]
You might imagine that you could score different candidates for probability(...) on how much probability they assign to false statements (at least at one point I thought this might work), but this is not sufficient: an evil probability function might assign 99% probability to “agent()==1 implies world()==1” and to “agent()==2 implies world()==5”, even though
def world(): return 1000 if agent()==1 else 5
because both logical statements would turn out to be perfectly true.
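This can be checked concretely with a toy sketch (the names and setup are illustrative, not from any actual proposal): the evil probability function below never endorses a false statement, yet the expected utility maximizer it drives forgoes the 1000.

```python
# An "evil" probability function: it assigns 99% to
# "agent()==1 implies world()==1" and to "agent()==2 implies world()==5",
# and 0 to every other action/utility pair.
def evil_probability(action, utility):
    table = {(1, 1): 0.99, (2, 5): 0.99}
    return table.get((action, utility), 0.0)

ACTIONS = [1, 2]
UTILITIES = [1, 5, 1000]

def agent():
    # Expected utility maximization over the finite action/utility sets;
    # utilities for a fixed action are mutually exclusive, so their
    # probabilities simply add.
    def eu(a):
        return sum(u * evil_probability(a, u) for u in UTILITIES)
    return max(ACTIONS, key=eu)

def world():
    return 1000 if agent() == 1 else 5

# agent() picks 2 (EU 4.95 beats EU 0.99) and world() returns 5.
# Both endorsed statements come out true: "agent()==1 implies world()==1"
# holds vacuously (agent() returns 2), and "agent()==2 implies world()==5"
# holds because world() returns 5 -- yet the agent misses the 1000.
```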
Sorry, my previous reply was based on a misparsing of your comment, so I deleted it.
So the “mathematical intuition module” can be 100% truthful and evil at the same time, because making true statements isn’t the same as making provable statements. Funny. It seems the truth vs provability distinction is actually the heart of decision theory.
It seems the truth vs provability distinction is actually the heart of decision theory.
I came across this paragraph from Bruno Marchal today, which strikingly reminds me of Vinge’s “Hexapodia as the key insight”:
Talking about Smullyan’s books, I recall that “Forever Undecided” is a recreational (but ok … not so easy, nor really recreational) introduction to the modal logic G (the one Solovay showed to be a sound and complete theory for the Gödel–Löb provability/consistency logic). G is the key for the math in the TOE approach I am developing. The logic G is the entry for all arithmetical “Plotinian Hypostases”.
Bruno is a prolific participant on the “everything-list” that I created years ago, but I’ve never been able to understand much of what he talked about. Now I wonder if I should have made a greater effort to learn mathematical logic. Do his writings make sense to you?
No, they look like madness.

But logic is still worth learning.

EDIT: it seems I was partly mistaken. What I could parse of Marchal’s mathy reasoning was both plausible and interesting, but parsing is quite hard work because he expresses his ideas in a very freaky manner. And there still remain many utterly unparseable philosophical bits.

Thanks for taking another look. BTW, I suggest that if you change the content of your comments substantially, it’s better to make a reply to yourself; otherwise those who read LW by scanning new comments will miss the new content.

This is basically the idea of the “mathematical intuition module”. Wei Dai explained it to me and I spent some time thinking about it, and now I think such an idea cannot work. To make it provably work, you’d need way too many assumptions about “well-behavedness” of the module that our actual math intuition doesn’t satisfy.