When you say “some case in which a human might make different judgments, but where it’s catastrophic for the AI not to make the correct judgment,” what I hear is “some case where humans would sometimes make catastrophic judgments.”
I think such cases exist, and they're a problem for the premise that agreement from some humans means an idea meets some standard of quality. Bumbling into such cases naturally might not be a dealbreaker, but there are reasons to expect optimization pressure pushing AI-proposed plans toward the limits of human judgment.