Intuitively: mutual information is defined as D(P(X,Y) || P(X)P(Y)). Conditional mutual information is the same thing with every term conditioned on Z and then averaged over Z: I(X;Y|Z) = sum_z P(z) D(P(X,Y|z) || P(X|z)P(Y|z)). So mutual information is, by definition, the error of the diagram containing X and Y with no arrows at all: the error you incur by assuming they are independent. Conditioning on Z means you draw arrows from Z to X and from Z to Y, but still leave no arrow between X and Y.
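To make the two definitions concrete, here is a minimal sketch in plain Python. It computes I(X;Y) as the KL divergence between the joint and the product of marginals, and I(X;Y|Z) as the P(z)-weighted average of the same quantity within each slice Z = z. The function names, the dict-of-tuples representation, and the toy distribution (X, Y and Z all copies of one fair coin) are assumptions chosen for illustration, not anything from the original discussion.

```python
import math
from collections import defaultdict


def kl_divergence(p, q):
    """D(p || q) in bits; p and q are dicts mapping outcome -> probability."""
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)


def marginal(joint, axes):
    """Marginalize a joint (dict of tuples -> prob) onto the given tuple positions."""
    out = defaultdict(float)
    for outcome, pv in joint.items():
        out[tuple(outcome[a] for a in axes)] += pv
    return dict(out)


def mutual_information(joint_xy):
    """I(X;Y) = D(P(X,Y) || P(X)P(Y)): the error of the arrow-free diagram."""
    px = marginal(joint_xy, (0,))
    py = marginal(joint_xy, (1,))
    product = {(x, y): px[(x,)] * py[(y,)] for (x,) in px for (y,) in py}
    return kl_divergence(joint_xy, product)


def conditional_mutual_information(joint_xyz):
    """I(X;Y|Z) = sum_z P(z) * D(P(X,Y|z) || P(X|z)P(Y|z)): error of X <- Z -> Y."""
    pz = marginal(joint_xyz, (2,))
    total = 0.0
    for (z,), pz_val in pz.items():
        if pz_val == 0:
            continue
        # Distribution of (X, Y) inside the slice Z = z.
        cond_xy = {
            (x, y): pv / pz_val
            for (x, y, zz), pv in joint_xyz.items()
            if zz == z
        }
        total += pz_val * mutual_information(cond_xy)
    return total


# Toy example (assumed for illustration): X = Y = Z = one fair coin flip.
joint_xyz = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
joint_xy = marginal(joint_xyz, (0, 1))

print(mutual_information(joint_xy))               # 1.0
print(conditional_mutual_information(joint_xyz))  # 0.0
```

In this toy case the arrow-free diagram is off by a full bit, while the diagram with arrows from Z to X and from Z to Y has zero error, since Z fully explains the dependence between X and Y.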