Honestly I’m not sure Oracles are the best approach either, but I’ll push the Pareto frontier of safe AI design wherever I can.
Though I’m less worried about the epistemic flaws exacerbating a box-break (an epistemically healthy AI breaking its box already seems maximally bad) and more worried that the epistemic flaws are prone to self-correction. For instance, the AI might construct a subagent of the ‘try random stuff, repeat whatever works’ flavor, which could route around whatever flaws were engineered into the parent by brute trial and error.
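For concreteness, here’s a minimal sketch of that flavor of loop (my illustration, not anything from the proposal; the function names are hypothetical): pure random proposal plus keep-what-works selection, with no epistemic machinery that could inherit the parent’s flaws.

```python
import random

def random_search(evaluate, propose, steps=1000):
    """'Try random stuff, repeat whatever works': keep a candidate,
    try random alternatives, and keep any that score better."""
    best = propose()
    best_score = evaluate(best)
    for _ in range(steps):
        candidate = propose()
        score = evaluate(candidate)
        if score > best_score:  # repeat whatever works
            best, best_score = candidate, score
    return best

# Toy usage: maximize -(x - 3)^2 over random guesses in [-10, 10];
# the loop converges near x = 3 without any model of the objective.
if __name__ == "__main__":
    result = random_search(
        evaluate=lambda x: -(x - 3) ** 2,
        propose=lambda: random.uniform(-10, 10),
    )
    print(result)
```

The point being that nothing in this loop consults the parent’s beliefs at all, so engineered epistemic limitations simply don’t transfer to it.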