It depends. But yes, incorrect epistemics can make an AGI safer, if it is the right and carefully calibrated kind of incorrect. A goal-directed AGI that incorrectly believes that its off switch does not work will be less resistant to people using it. So the goal is here to design an AGI epistemics that is the right kind of incorrect.
Note: designing an AGI epistemics that is the right kind of incorrect seems to go against a lot of the principles that aspiring rationalists seem to hold dear, but I am not an aspiring rationalist. For more technical info on such designs, you can look up my sequence on counterfactual planning.
It depends. But yes, incorrect epistemics can make an AGI safer, if it is the right and carefully calibrated kind of incorrect. A goal-directed AGI that incorrectly believes that its off switch does not work will be less resistant to people using it. So the goal is here to design an AGI epistemics that is the right kind of incorrect.
Note: designing an AGI epistemics that is the right kind of incorrect seems to go against a lot of the principles that aspiring rationalists seem to hold dear, but I am not an aspiring rationalist. For more technical info on such designs, you can look up my sequence on counterfactual planning.