Adrià Garriga-alonso comments on Circuit discovery through chain of thought using policy gradients

Adrià Garriga-alonso 30 Nov 2025 1:15 UTC
LW: 2 AF: 1
> I am concerned that long chains of RL are just sufficiently fucked functions with enough butterfly effects that this wouldn’t be well approximated by this process.

This is a concern. Two possible replies:
- If it’s truly a chaotic system then there’s no good way to estimate the expectation.
- In reality, it could be that the effects of neurons are not very chaotic, but that this particular estimate of the gradient is. Previous work shows that policy gradients are much less chaotic than the ‘reparameterization trick’ (that is, differentiating through the transition when it is continuous). It could also be that finite differences (resampling many rollouts with and without the neuron activated) estimate the effects better, with less variance. We’ll see.
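
To make the finite-differences idea concrete, here is a minimal sketch on a toy problem. It is not the method from the post: the `rollout` dynamics, the `0.1` "neuron" bonus, the decay of `0.9`, and the function names are all assumptions made up for illustration. It samples many rollouts with and without the neuron active, takes the difference of mean returns as the effect estimate, and reports a standard error so you can see how noisy that estimate is.

```python
import random
import statistics


def rollout(neuron_on, rng, horizon=20, noise=1.0):
    """Toy stochastic rollout (hypothetical dynamics); returns the summed state."""
    x, total = 0.0, 0.0
    bonus = 0.1 if neuron_on else 0.0  # assumed per-step effect of the neuron
    for _ in range(horizon):
        # Linear, mildly contracting dynamics plus Gaussian noise.
        x = 0.9 * x + bonus + rng.gauss(0.0, noise)
        total += x
    return total


def finite_difference_effect(n_rollouts, seed=0):
    """Estimate the neuron's effect on the return by resampling rollouts."""
    rng = random.Random(seed)
    on = [rollout(True, rng) for _ in range(n_rollouts)]
    off = [rollout(False, rng) for _ in range(n_rollouts)]
    effect = statistics.fmean(on) - statistics.fmean(off)
    # Standard error of a difference of two independent sample means.
    stderr = (statistics.variance(on) / n_rollouts
              + statistics.variance(off) / n_rollouts) ** 0.5
    return effect, stderr


if __name__ == "__main__":
    effect, stderr = finite_difference_effect(2000)
    print(f"estimated effect: {effect:.2f} +/- {stderr:.2f}")
```

In this toy the dynamics are linear and contracting, so the estimate is well behaved. With genuinely chaotic dynamics the per-rollout variance would grow with the horizon and the standard error would show it, which is the first reply's point.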