I’m not that convinced that attribution patching is better than ACDC. As far as I can tell, Syed et al. only measure ROC with respect to “ground truth” (manually discovered) circuits, not faithfulness, completeness, etc. Also, InterpBench finds that ACDC is better than attribution patching.