Hmm, psychosecurity is an interesting reframing of epistemic rationality.
Accepting that framing, I would characterize it as optimizing for inexploitability and resistance to persuasion over peak efficiency.
Alternatively, this job/process could be described as requiring a partially separate skill or set of skills. How to extract useful ideas from an isolated context[1] without distorting them in ways that lead to problems, while also not letting out any info-hazards or malicious programs, appears to be an open problem. Against adversaries (accidental or otherwise) below superintelligence, a human may be able to develop this skill (or set of skills).
See this proposal on solving philosophy:
https://www.lesswrong.com/posts/HbkNAyAoa4gCnuzwa/wei-dai-s-shortform?commentId=yDrWT2zFpmK49xpyz
https://www.lesswrong.com/posts/HbkNAyAoa4gCnuzwa/wei-dai-s-shortform?commentId=JzbsLiwvvcbBaeDF5
Note especially the part about getting security precautions from the simulations in Wei Dai’s comment.