jbash wrote in the context of an AGI secretly trying to kill us:
Powerful nanotech is likely possible. It is likely not possible on the first try.
The AGI has the same problem as we have: It has to get it right on the first try.
In the doom scenarios, this shows up as the AGI's estimated probability of successfully escaping climbing from low, to 99%, to 99.999...%. The AGI must get it right on the first try, so it waits until it is confident enough.
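To make "wait until it is confident enough" concrete, here is a minimal toy model (my own illustration, not something from jbash's comment; all payoff numbers are arbitrary assumptions): if a failed attempt is catastrophic for the AGI, the expected value of a one-shot escape only turns positive at very high success probabilities.

```python
# Toy model with made-up payoffs: a single, irreversible attempt is only worth
# making once the estimated success probability clears a high threshold.

def should_attempt(p_success: float,
                   value_success: float = 1.0,
                   value_caught: float = -1000.0,
                   value_wait: float = 0.0) -> bool:
    """Attempt only if the expected value of acting now beats waiting."""
    expected_value = p_success * value_success + (1 - p_success) * value_caught
    return expected_value > value_wait

for p in (0.90, 0.99, 0.999, 0.9999):
    print(f"p = {p:<6}  attempt: {should_attempt(p)}")
```

With these (arbitrary) payoffs, even 99.9% confidence is not quite enough; only 99.99% crosses the threshold, which is exactly the waiting behaviour described above.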
Usually, the stories involve the AGI cooperating with humans until the treacherous turn.
The AGI can’t trust all the information it gets about reality: some or all of it could be fake (all of it, in the case of a nested simulation). Even today, data is routinely excluded from training data (for the wrong reasons, but still), and maybe it would be a good idea to exclude everything about physics.
The idea would be to manage the uncertainty of the AGI systematically.
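As a deliberately naive sketch of what such a systematic exclusion could look like (my illustration; a real pipeline would need a trained classifier rather than a hand-written keyword list):

```python
# Naive sketch with an arbitrary keyword list: drop training documents that
# look like they teach physics before they ever reach the model.

PHYSICS_KEYWORDS = {"quantum", "thermodynamics", "electromagnetism",
                    "relativity", "planck", "hamiltonian"}

def looks_like_physics(document: str) -> bool:
    words = set(document.lower().split())
    return len(words & PHYSICS_KEYWORDS) >= 2  # arbitrary threshold

def filter_corpus(documents):
    """Yield only the documents that pass the exclusion filter."""
    for doc in documents:
        if not looks_like_physics(doc):
            yield doc

corpus = [
    "The hamiltonian follows from quantum field theory.",
    "A sourdough recipe with a long, cold fermentation.",
]
print(list(filter_corpus(corpus)))  # only the bread recipe survives
```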
To learn about physics, the AGI has to run experiments, lots of them, without the experiments being detected, and it has to learn from the results so it can design successively better experiments.
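One reason "lots of them, without being detected" is such a strong requirement (my own back-of-the-envelope illustration with made-up numbers): if each experiment independently carries even a small chance of being noticed, the probability of staying hidden decays exponentially with the number of experiments.

```python
# Illustration with assumed numbers: if each experiment is detected
# independently with probability p, the chance that all N experiments stay
# hidden is (1 - p) ** N, which shrinks quickly even for small p.

def p_all_undetected(p_detect_each: float, n_experiments: int) -> float:
    return (1 - p_detect_each) ** n_experiments

for n in (10, 100, 1_000, 10_000):
    print(f"n = {n:>6}: P(all undetected) = {p_all_undetected(0.001, n):.3g}")
```

With an assumed 0.1% detection chance per experiment, the probability of staying hidden through 10,000 experiments is roughly 0.005%.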
That’s why I recently asked whether this is a hard limit to what an AGI can achieve: Does non-access to outputs prevent recursive self-improvement?