More generally, what makes Q dangerous is that (1) it settles only for a spurious moral argument and does not accept natural ones, and (2) what it finds is taken seriously by the agent and acted out. As a result, provability of a spurious moral argument provably implies its truth, which by Löb’s theorem makes it true and forces the agent to be thus misled.
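Spelled out, the Löbian step looks roughly like this. The symbols are my own shorthand, not from the post: S stands for the spurious argument's conclusion, T for the agent's formal system, and □S for "S is provable in T". Treat it as a sketch of the reasoning, not a full formalization.

```latex
% S, T and \Box are assumed notation, introduced only for this sketch.
% Requires amsmath for align* and \text.
\begin{align*}
&T \vdash \Box S \rightarrow S
  && \text{(Q accepts only this proof, and the agent acts on what Q finds)} \\
&\text{if } T \vdash \Box S \rightarrow S \text{ then } T \vdash S
  && \text{(L\"ob's theorem)} \\
&T \vdash S
  && \text{(so a proof of } S \text{ exists for Q to find)}
\end{align*}
```

The first line is where conditions (1) and (2) are used: without both, provability of S would not provably imply S.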
The only difference between Q and normal proof search procedures is that the normal procedures are OK with any proof, while Q dislikes natural proofs and ignores them. This bit of “motivated skepticism” is sufficient to make the preferred spurious proofs come true: Q doesn’t just loop without finding anything.
This is a whole new level of Oracle AI unsafety… Take what it says seriously, and it can argue you into doing anything at all. :-)
You mean Q instead of P, right? (Edit: Fixed.)
Right, cousin_it changed some terminology from the post on the list, and I didn’t notice. (Fixed.)