Charlie Steiner comments on Open Thread Autumn 2025

Charlie Steiner 12 Oct 2025 1:33 UTC
4 points
If it can assign meaning to states, then sure, why not? Currently this comes with plenty of caveats, so it kind of depends on how much of a stickler you want to be about principledness and effectiveness.
Sometimes “deciding” and the like are represented directly in the activations, and reading that off is kind of trivial. So you could also be asking about interpreting the parameters of the AI that transform one state into another. Keywords might be circuit interpretability, automated circuit discovery, or parameter decomposition.
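To make the "represented in the activations" case concrete, here is a rough sketch of a difference-of-means linear probe. Everything in it is an assumption for illustration: the "activations" are synthetic Gaussian vectors with a made-up binary "decision" planted along one direction, not states from a real model, and the function names are hypothetical.

```python
import random

def make_activations(n, direction, rng, strength=2.0):
    """Synthetic 'hidden states': unit Gaussian noise plus a shift of
    +/- strength along `direction`, with the sign set by a binary label."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        sign = 1.0 if label == 1 else -1.0
        vec = [rng.gauss(0.0, 1.0) + sign * strength * d for d in direction]
        data.append((vec, label))
    return data

def fit_mean_difference_probe(data):
    """Probe weights = (mean of class 1) - (mean of class 0).
    The bias puts the decision boundary at the midpoint of the two means."""
    dim = len(data[0][0])
    sums = {0: [0.0] * dim, 1: [0.0] * dim}
    counts = {0: 0, 1: 0}
    for vec, label in data:
        counts[label] += 1
        for i, x in enumerate(vec):
            sums[label][i] += x
    mean0 = [s / counts[0] for s in sums[0]]
    mean1 = [s / counts[1] for s in sums[1]]
    weights = [a - b for a, b in zip(mean1, mean0)]
    midpoint = [(a + b) / 2.0 for a, b in zip(mean1, mean0)]
    bias = -sum(w * m for w, m in zip(weights, midpoint))
    return weights, bias

def probe_predict(weights, bias, vec):
    """Read the 'decision' off a state by thresholding a linear projection."""
    score = sum(w * x for w, x in zip(weights, vec)) + bias
    return 1 if score > 0 else 0

def accuracy(weights, bias, data):
    hits = sum(probe_predict(weights, bias, vec) == label for vec, label in data)
    return hits / len(data)

rng = random.Random(0)                 # fixed seed so the toy run is repeatable
direction = [1.0] + [0.0] * 7          # assumed: the feature lives on axis 0
train = make_activations(400, direction, rng)
test = make_activations(200, direction, rng)

weights, bias = fit_mean_difference_probe(train)
acc = accuracy(weights, bias, test)
print("held-out probe accuracy:", acc)
```

This only covers the easy half: it assigns a meaning to a state, and says nothing about the parameters that compute the next state from it, which is where the circuit-style methods come in.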