Gurkenglas comments on Analysing: Dangerous messages from future UFAI via Oracles

Gurkenglas 22 Nov 2019 16:52 UTC
3 points
Let us always be looking forward a month, and let there be a UFAI that arises with 1% probability in that time frame. Assume that we can correctly incentivize the Oracle to compute the probability distribution over the messages sent back in case of erasure. Naively sampling the distribution dooms us with 1% probability. Suppose instead that we plan to send back only whether some research direction pays off within a month, and that we ask the Oracle only whether “yes” has more than a 20% chance. Since the UFAI controls at most 1% of the probability mass, the answer tells us that the “yes” chance conditional on no UFAI is either more than 19% or at most 21%. This query gave the UFAI a 1% chance of 1 bit of optimization power, at most doubling the expected chance of UFAI for the next query.
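
To check the 19% / 21% figures, here is a minimal sketch of the arithmetic. It assumes the overall chance of “yes” decomposes as P(yes) = 0.99 · P(yes | no UFAI) + 0.01 · P(yes | UFAI), with the UFAI free to push its own term anywhere in [0, 1]. The function name and parameters are made up for illustration.

```python
def conditional_yes_bounds(oracle_says_above, threshold=0.20, p_ufai=0.01):
    """Bounds on P(yes | no UFAI) given the Oracle's answer to
    'is P(yes) > threshold?'.

    Assumes P(yes) = (1 - p_ufai) * p_clean + p_ufai * q,
    where q = P(yes | UFAI) is adversarial and lies in [0, 1].
    Returns (lower, upper); when the Oracle says "above", the lower
    bound is strict.
    """
    p_no_ufai = 1.0 - p_ufai
    if oracle_says_above:
        # Worst case q = 1: the UFAI contributes its full p_ufai of mass to "yes".
        lower = max(0.0, (threshold - p_ufai) / p_no_ufai)
        return lower, 1.0
    # Worst case q = 0: the UFAI contributes nothing to "yes".
    upper = min(1.0, threshold / p_no_ufai)
    return 0.0, upper


above = conditional_yes_bounds(True)   # lower bound is 0.19 / 0.99
below = conditional_yes_bounds(False)  # upper bound is 0.20 / 0.99
print(above, below)
```

With the defaults, the lower bound comes out a bit above 19% and the upper bound a bit below 21%, so the rounded figures above are safe on both sides.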

If you want to spread out this optimization power to bound the worst case, I reiterate that differential privacy theory seems applicable here.
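
As one illustration of the kind of tool I mean (an assumption on my part about how it would be applied here, not a worked-out proposal), randomized response is the textbook differentially private mechanism for a single bit: publish the Oracle's true answer with probability e^ε / (1 + e^ε) and the flipped answer otherwise. Whatever the true answer is, the probability of any published answer then changes by at most a factor of e^ε, which caps how much an adversary steering the true answer can steer what we actually see. The names below are hypothetical.

```python
import math
import random


def randomized_response(true_bit, epsilon, rng=random):
    """Publish true_bit with probability e^eps / (1 + e^eps), else its negation."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_bit if rng.random() < p_truth else (not true_bit)


def max_likelihood_ratio(epsilon):
    """Largest ratio P(output | true bit = b) / P(output | true bit = not b)."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return p_truth / (1.0 - p_truth)  # algebraically equal to e^eps


# Smaller epsilon means a noisier answer and a tighter cap on influence.
for eps in (0.1, 1.0, math.log(3)):
    print(eps, max_likelihood_ratio(eps))
```

The cost is that we also learn less from each query, so this trades information for a bounded worst case rather than removing the risk.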