perhaps allowing the comet to fragment into a dense dust cloud around the Earth that effectively obscures Earth from the rest of the Universe for 49 years.
That’s an interesting thought, I’ll have to consider it more.
Your ideas on thickening s or decaying the certainty could be interesting.
Setting the start of s to an hour after the RIAI starts (when it’s supposed to turn its privacy box into lots of entropy) would also limit the time it has to run amok.
The idea with the 50-year delay is to be sure that no trace remains of the contents of the box. If we have a highly destructive ending to the privacy box, we can move s closer, but I wanted to move away from my previous “clouds of anti-matter plus bomb” scenario.
What if the RIAI can observe O? Suppose it...
That sounds like anthropic reasoning, which I banned the AI from using (or else counterfactuals in which it doesn’t exist make no sense—and the hypothetical super AI (which doesn’t exist) makes no sense either).
Instead of conditioning on the start of the AI, perhaps it’s possible to measure the effect of the AI directly by restricting the contents of the privacy box. Let U be the region containing everything but the privacy box, let PB be the region containing the privacy box, and let O contain the output state; ensure that U ∩ PB = O. Let U(t), PB(t), and O(t) be functions that yield the information content of the regions after time t (including the changes to the regions as things move around), and let U(0), PB(0), and O(0) be defined based on the initial background information B. Let T be the time at which the AI should be done and have altered O to indicate its output. Let Õ(t) represent a particular random state of the output, and let U_B(t) be the information content of a universe U(t) + PB(0), i.e. the time-evolution of the universe while keeping PB constant.

So long as P(S | U(T), PB(T), O(T)) / P(S | U_B(T), Õ(T)) = 1, I think the AI’s impact is confined to the region of the privacy box for the duration of T, plus O. The equation implies that U(T) + PB(T) = U_B(T), or in other words that the privacy box in its final state is indistinguishable from its state in the background information.

So long as the AI is restartable with a new background B and time 0, it doesn’t matter if the AI is accidentally started or run more than once, as long as it can correctly identify PB. It becomes more difficult (hopefully impossible) to directly observe whether the AI started or not, but a portion of O could be set aside for signalling the status of the AI.
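To make the check concrete, here is a toy sketch (my own illustration, not part of the proposal above): model the “information content” of U, PB, and O as plain values, evolve the world once with the AI running and once with PB frozen at PB(0), and require that everything outside the output region O match. The World class, the evolve function, and the string-valued states are all simplifying assumptions; in particular, O is modelled as a separate field rather than literally as U ∩ PB, and “indistinguishable given B” is collapsed to plain equality.

```python
# Toy sketch of the privacy-box condition. Everything here (World, evolve,
# string-valued "information content") is a stand-in for illustration only.
from dataclasses import dataclass


@dataclass(frozen=True)
class World:
    U: str   # information content outside the privacy box
    PB: str  # information content of the privacy box, as seen given B
    O: str   # output region (in the text, O = U ∩ PB; separated here for simplicity)


def evolve(world: World, steps: int, ai_runs: bool) -> World:
    """Crude 'time evolution'. The AI may only act inside PB and write to O."""
    u = world.U + "|physics" * steps          # ordinary evolution of U in both cases
    if ai_runs:
        # The AI must leave PB looking, given B, exactly as it did at time 0.
        # If it left any trace (e.g. pb = "scrambled"), the condition below fails.
        pb = world.PB
        o = "answer:42"                       # the AI writes its output into O
    else:
        pb = world.PB                         # PB held at PB(0): this is U_B(t)
        o = world.O                           # O left in some default/random state Õ
    return World(U=u, PB=pb, O=o)


def privacy_box_condition(initial: World, T: int) -> bool:
    """Toy analogue of P(S | U(T), PB(T), O(T)) / P(S | U_B(T), Õ(T)) = 1:
    outside the output region O, the run-AI world matches the frozen-PB world."""
    ran = evolve(initial, T, ai_runs=True)
    frozen = evolve(initial, T, ai_runs=False)
    return (ran.U, ran.PB) == (frozen.U, frozen.PB)


if __name__ == "__main__":
    start = World(U="outside-world", PB="ai+question", O="blank")
    print(privacy_box_condition(start, T=3))  # True: impact confined to PB and O
```

In this toy, any trace the AI leaves outside PB and O (heating U, or failing to restore PB to something indistinguishable from PB(0)) breaks the equality, which is the analogue of the ratio departing from 1.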
I am not completely confident that the above approach works, or that it closes the loopholes that your approach did, so it might make sense to add the privacy-box condition to the original one and require the AI to satisfy both.
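Written schematically in the same notation (only as a sketch, since the original counterfactual condition isn’t restated in this excerpt), the combined requirement would be that both ratios equal one: P(S | X) / P(S | ¬X) = 1 for the original condition on the AI being turned on, and P(S | U(T), PB(T), O(T)) / P(S | U_B(T), Õ(T)) = 1 for the privacy-box condition.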
I’ll think about this. For the moment, it doesn’t seem to add much to X vs ¬X, but I may be wrong...