How can we combine behavioural experiments with mechanistic interpretability to infer an agent’s subjective causal model? The next post will say more about this.
There is no next post. Can I read about it somewhere anyway?
There is no next post. Can I read about it somewhere anyway?