Category-Theoretic Wanderings into Interpretability

unruly abstractions 2 Sep 2025 0:03 UTC

18 points

Logic & Mathematics, Interpretability (ML & AI), Category theory, AI Psychology, AI, Language Models (LLMs), Anthropic (org)

I have realized I want to contribute to AI Safety in any way I can. I am currently focused on interpretability: trying to make sense of the research out there, orienting myself, looking for other new ravers^[1], and ultimately learning to enact its technics^[2]. In that process, I am writing a paper. A sort of queer paper. I keep updating it as I think more about it. You can read the latest version here:

FULL PAPER AT UNRULYABSTRACTIONS.COM

I will use category theory to investigate what interpretability is. Think of this formalism as much closer to a language than a theory^[3]. I use notation and symbols to interrogate how things could come together. I am doing some scribbles and asking you, do you also feel it’s something like that?

Hope you enjoy wandering with me.

^[1] Ravers are those who need to rave. Raves are practices. What if practices (like this one, which I am currently feeling my way in) are also raves?
^[2] Technics are a general category for all making and all practices.
^[3] A judgement but not a proposition.

2 comments · 1 min read · LW link

Trevor Hill-Hand 2 Sep 2025 16:22 UTC
3 points
I enjoyed reading the paper but did not find the screenshots here in the post a helpful addition; I think I would have just quoted the introduction, if converting it into a full article was infeasible.

It’s also fun seeing other Eugenia Cheng fans!
- unruly abstractions 2 Sep 2025 18:05 UTC
  1 point
  Good feedback!
  
  I am still trying to figure out my workflow. I like writing in Typst, but I realized it’s not very easy to go from Typst → LessWrong. Also, a lot of my writing is sorta experimental. I’m trying to determine which parts of my writing should be directed to which platforms/audiences.
  
  I will make this a linkpost.
  
  And yes, Eugenia Cheng is amazing :)