I feel much better about this post after reading it in full than my first impression suggested I would.
I feel there is a vibe going around that “timelines have gotten too short to do the actual work, so we need to just do what we can in the time we have left”, and I want to push back on it with the contrasting vibe: “timelines have gotten too short to do the actual work, so we need to secure longer timelines”. From this perspective, I would encourage marginal AI alignment and Mech Interp researchers to consider whether they can pivot to activism and policy, which strikes me as an important and neglected area, with short-timeline AGI safety as a secondary but still very important direction. Regardless of my impression of this vibe, which I think this post does resonate with, most of the object-level advice here seems valuable even for work outside of research.
It feels important to know whether proxy goals are biased in particular directions, just as curiosity-driven science is. I like the north-star approach for keeping proxy goals relevant to helping things go well, but I hope it will be applied more competently than I fear it will be. I would like north stars to be grounded in a deeper worldview, such as the one explored in, for example, Bostrom’s “Deep Utopia”, rather than in seemingly shorter-term ideas involving profit, markets, management, and control.
Thank you to everyone who contributed to this thought-provoking post : )