I have realized I want to contribute to AI Safety in any way I can. I am currently focused on interpretability: trying to make sense of the research out there, orienting myself, looking for other new ravers[1], and ultimately learning to enact its technics[2]. In that process, I am writing a paper. A sort of queer paper. I keep updating it as I think more about it. You can read the latest version here:
Full paper at unrulyabstractions.com
I will use category theory to investigate what interpretability is. Think of this formalism as much closer to a language than a theory[3]. I use notation and symbols to interrogate how things could come together. I am doing some scribbles and asking you: do you also feel it’s something like that?
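For a taste of the kind of scribble I mean (a minimal sketch in my own placeholder notation, not lifted from the paper): one could view interpretability as a functor from a category of models to a category of explanations,

$$F : \mathbf{Model} \longrightarrow \mathbf{Expl}, \qquad F(g \circ f) = F(g) \circ F(f), \qquad F(\mathrm{id}_M) = \mathrm{id}_{F(M)}.$$

Functoriality would demand that a morphism between models, say a distillation $f : M \to M'$, comes with a coherent translation $F(f)$ between their explanations. That is the sense in which the formalism works as a language: it lets me ask whether such squares commute, without yet committing to a theory of what the objects are.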
Hope you enjoy wandering with me.
I enjoyed reading the paper but did not find the screenshots here in the post a helpful addition; I think I would have just quoted the introduction, if converting it into a full article was infeasible.
It’s also fun seeing other Eugenia Cheng fans!
Good feedback!
I am still trying to figure out my workflow. I like writing in Typst, but I realized it’s not very easy to go from Typst → LessWrong. Also, a lot of my writing is sorta experimental, so I’m trying to determine which parts should be directed to which platforms/audiences.
I will make this a linkpost.
And yes, Eugenia Cheng is amazing :)