AI control is supposed to defend against human-level AI, but the line between a human red team and an AI-augmented one has started to blur. Yet we're still optimizing our proxy task, the safety-usefulness frontier, as if that line were clean.
What in the control setup relies on a non-AI-augmented red team, and what breaks when you relax this assumption?

Why would this be useful for interpretability? You can already represent INT4 networks with logic gates. I'd also guess that differentiable logic gate (DLG) networks break many of our interp tools: e.g., linear probes assume continuous activation vectors, and there's no residual stream or activation space to inspect in the usual sense.
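To make the first claim concrete, here's a minimal sketch (my own illustration, with hypothetical helper names) of why INT4 arithmetic reduces to logic gates: a 4-bit ripple-carry adder built only from AND/OR/XOR. An INT4 forward pass is essentially many of these plus shift-and-add multipliers, so in principle the whole network is one large gate circuit.

```python
# Illustrative sketch: 4-bit (INT4-style) addition from pure logic gates.
# All helper names here are made up for this example.

def full_adder(a: int, b: int, carry: int) -> tuple[int, int]:
    """One-bit full adder expressed with XOR/AND/OR gates."""
    s = a ^ b ^ carry                        # sum bit
    carry_out = (a & b) | (carry & (a ^ b))  # carry-out bit
    return s, carry_out

def add4(x: list[int], y: list[int]) -> list[int]:
    """4-bit ripple-carry adder; x and y are bit lists, LSB first."""
    out, carry = [], 0
    for a, b in zip(x, y):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out  # final carry dropped: wraps mod 16, like 4-bit hardware

def to_bits(n: int) -> list[int]:
    return [(n >> i) & 1 for i in range(4)]

def from_bits(bits: list[int]) -> int:
    return sum(b << i for i, b in enumerate(bits))

assert from_bits(add4(to_bits(5), to_bits(9))) == (5 + 9) % 16
```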