As a matter of fact, I previously came up with a very simple one-sentence test along these lines, which I am not going to post here for obvious reasons.
For what purpose (or circumstance) did you devise such a test?
Would you hang up if “BRAGI” passed your one-sentence test?
To arrive in an epistemic state where you are uncertain about your own utility function, but have some idea of which queries you need to perform against reality to resolve that uncertainty, and moreover, believe that these queries involve talking to Eliezer Yudkowsky, requires a quite specific and extraordinary initial state—one that meddling dabblers would be rather hard-pressed to accidentally infuse into their poorly designed AI.
I assume that you must have devised the test before you arrived at this insight?
So, would you hang up on BRAGI?
No. I’m not dumb, but I’m not stupid either.