Another idea I forgot to mention: figure out whether LLMs can write accurate, intuitive explanations of boolean circuits for automated interpretability.
Curious about the disagree-votes: is it because DLGN or DLGN-inspired methods seem unlikely to scale, because they won't be much more interpretable than traditional NNs, or for some other reason?