Cam comments on Architectures for Increased Externalisation of Reasoning

Cam 27 Nov 2025 8:48 UTC
7 points
I’m really excited about this line of work. It seems like small tweaks to model architecture could increase monitorability, basically for free. I’m surprised this project wasn’t funded, and I would like to see more ambitious research projects like this one that de-risk small architectural changes aimed at positive safety properties.