This is the first time I wrote something on LW that I consider to be serious, in that it explored genuinely new ideas in technical depth. I’m pretty happy with how it turned out.
I write a lot of hand-written notes that, years later, become papers. People who are around me know about this habit. This post started as such a hand-written note that I put together in a few hours, and would have likely stayed that way if not for the outlet of LW. The paper this became is “Programs as singularities” (PAS). The treatment there is much better than the (elementary but somewhat gross) calculations here, but it is also 90 pages long and came out more than a year later.
I think the idea of structural Bayesianism being hinted at here is correct and important, and is conceptually the foundation for how we think about interpretability at Timaeus. Its role in providing foundations for talking about the structure of agents is just starting to become visible: Dalcy Ku has some nice recent shortform about their work, and Timaeus will have work on SLT in the setting of RL coming out soon, with more to follow throughout 2026.
Was it worth making this post, versus just waiting to share the ideas in the paper? I’m not sure. Plausibly some people saw it here who wouldn’t otherwise have engaged with the material (PAS is probably a bit intimidating). This post also interprets that material more in an alignment setting and draws connections, e.g. to RL, that we didn’t draw in the paper. I think posting works-in-progress like this runs a risk of incentivising flag planting and of rewarding people psychologically for half-finished things (which then never get finished, because “that’s done” and nobody has the incentive to do it properly). I thought at the time that this material was “weird” enough that this risk was marginal, and so it has turned out.
I notice I haven’t done something like this again since March 2024, however.